[Python] pandas Week 8 - 1: Environment Setup and Basic Concepts

news 2026/6/4 16:35:49

I. Learning Objectives

Set up a Python environment
Understand DataFrame and Series
Learn to read and inspect data

II. pandas vs. SQL Syntax Comparison

SQL concept	pandas equivalent	Focus
`SELECT * FROM table`	`df` or `df.head()`	View data
`SELECT col1, col2`	`df[['col1', 'col2']]`	Select columns
`WHERE`	`df[df['col'] > 100]`	Filter rows by condition
`GROUP BY`	`df.groupby('col')`	Group and aggregate
`JOIN`	`pd.merge(df1, df2)`	Join tables
`ORDER BY`	`df.sort_values('col')`	Sort
`SUM/AVG/COUNT`	`df['col'].sum()`, `.mean()`, `.count()`	Aggregate functions
`CASE WHEN`	`df.apply()` or `np.where()`	Conditional logic
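
To make the table concrete, here is a minimal sketch that runs each mapping on two small tables. The `orders` and `customers` tables and all their values are invented for this illustration; the SQL in the comments is only an approximate equivalent of each pandas call.

```python
import numpy as np
import pandas as pd

# Hypothetical sample tables, invented for this illustration
orders = pd.DataFrame({
    'order_id': [1, 2, 3, 4, 5],
    'customer_id': [10, 10, 20, 30, 30],
    'amount': [50, 150, 200, 80, 130],
})
customers = pd.DataFrame({
    'customer_id': [10, 20, 30],
    'city': ['Beijing', 'Shanghai', 'Shenzhen'],
})

# SELECT order_id, amount FROM orders
selected = orders[['order_id', 'amount']]

# SELECT * FROM orders WHERE amount > 100
large = orders[orders['amount'] > 100]

# SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id
totals = orders.groupby('customer_id', as_index=False).agg(total=('amount', 'sum'))

# SELECT * FROM orders JOIN customers USING (customer_id)
joined = pd.merge(orders, customers, on='customer_id', how='inner')

# SELECT * FROM orders ORDER BY amount DESC
ranked = orders.sort_values('amount', ascending=False)

# SELECT *, CASE WHEN amount > 100 THEN 'large' ELSE 'small' END AS size FROM orders
sized = orders.assign(size=np.where(orders['amount'] > 100, 'large', 'small'))

print(totals)
print(sized)
```

Note that `groupby` alone only defines the groups; an aggregation such as `sum` has to follow it, just as `GROUP BY` needs an aggregate function in the `SELECT` list.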

III. Course Content

1. Environment Setup

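    # Install Anaconda, or install the packages with pip
    # (openpyxl is needed for the Excel read/write in section 3)
    pip install pandas numpy sqlalchemy pymysql jupyter openpyxl

    # Start Jupyter
    jupyter notebook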
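
After installing, it can help to confirm that the packages actually import. The sketch below is one way to do that; the helper name `check_packages` is made up for this example, and `openpyxl` is included on the assumption that you will read and write `.xlsx` files as in section 3.

```python
import importlib

# Packages used in this lesson; openpyxl is assumed for Excel support
packages = ['pandas', 'numpy', 'sqlalchemy', 'pymysql', 'openpyxl']

def check_packages(names):
    """Return a dict mapping each package name to its version, or None if missing."""
    result = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Not every module exposes __version__, so fall back to 'unknown'
            result[name] = getattr(module, '__version__', 'unknown')
        except ImportError:
            result[name] = None
    return result

for name, version in check_packages(packages).items():
    print(f'{name}: {version if version else "NOT INSTALLED"}')
```

Any package reported as missing can be installed with `pip install <name>` before continuing.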

2. A First pandas Program

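    import pandas as pd

    # Create a DataFrame (similar to a SQL table)
    df = pd.DataFrame({
        'product_id': [1, 2, 3, 4, 5],
        'product_name': ['Product A', 'Product B', 'Product C', 'Product D', 'Product E'],
        'price': [100, 200, 150, 300, 250],
        'quantity': [10, 20, 15, 5, 8]
    })

    # Inspect the data (similar to SELECT in SQL)
    print('All data:\n', df)                         # all rows
    print('First 3 rows (LIMIT 3):\n', df.head(3))   # first 3 rows
    print('Last 2 rows:\n', df.tail(2))              # last 2 rows
    print('Shape (rows, columns):\n', df.shape)      # dimensions

    # info() prints its summary itself and returns None,
    # so it is called directly instead of being wrapped in print()
    print('Data info (similar to DESC):')
    df.info()

    print('Summary statistics:\n', df.describe())    # descriptive statistics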
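
The learning objectives mention both DataFrame and Series, but the program above only builds a DataFrame. As a rough sketch of how the two relate, the example below uses a cut-down version of the product table (three rows, chosen only for illustration): each column of a DataFrame is a Series, and the kind of brackets you use decides which type you get back.

```python
import pandas as pd

# Smaller, illustrative version of the product table
df = pd.DataFrame({
    'product_id': [1, 2, 3],
    'price': [100, 200, 150],
    'quantity': [10, 20, 15],
})

price = df['price']        # single brackets: a one-dimensional Series
price_df = df[['price']]   # list of names: a one-column DataFrame

print(type(price).__name__, price.shape)
print(type(price_df).__name__, price_df.shape)

# Arithmetic between two Series is element-wise and aligned on the index
df['revenue'] = df['price'] * df['quantity']
print(df)
```

This distinction matters later: methods such as `sum()` return a single value on a Series but one value per column on a DataFrame.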

3. Reading External Data

    import pandas as pd
    from sqlalchemy import create_engine

    # Write data (df is the DataFrame created in the previous section)
    df.to_csv('products.csv', index=False)     # index=False: do not write the index
    df.to_excel('products.xlsx', index=False)  # requires openpyxl

    # Read data
    df1 = pd.read_csv('products.csv')      # read a CSV file
    df2 = pd.read_excel('products.xlsx')   # read an Excel file
    print(df1)
    print(df2)

    # Read from SQL
    engine = create_engine(
        'mysql+pymysql://@10.200.13.59:9031/ads?charset=utf8',
        connect_args={
            'user': 'your_user',          # replace with your own credentials
            'password': 'your_password',  # do not publish real passwords
            'port': 9031,
        }
    )
    df = pd.read_sql('SELECT * FROM ads.ads_dim_site', engine)
    print(df)

    # Save data
    df.to_csv('output.csv', index=False)
    df.to_excel('output.xlsx', index=False)
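
The SQL part above only runs if you can reach that MySQL-compatible server. To try `read_sql` without one, the sketch below swaps in an in-memory SQLite database from Python's standard library. This is a stand-in for practice, not the setup used in this lesson, and the `products` table and its values are invented for the example.

```python
import sqlite3
import pandas as pd

# Hypothetical table contents, invented for this illustration
products = pd.DataFrame({
    'product_id': [1, 2, 3],
    'price': [100, 200, 150],
})

conn = sqlite3.connect(':memory:')
try:
    # Write the DataFrame to a SQL table named 'products'
    products.to_sql('products', conn, index=False)

    # Read it back with a parameterized query (sqlite3 uses ? placeholders)
    expensive = pd.read_sql(
        'SELECT product_id, price FROM products WHERE price > ? ORDER BY price',
        conn,
        params=(120,),
    )
finally:
    conn.close()

print(expensive)
```

Passing values through `params` instead of formatting them into the SQL string keeps the query safe from injection; the placeholder syntax depends on the database driver.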

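IV. This Week's Exercises

Exercise 1: Create a DataFrame with the following columns

Employee ID, name, department, salary, hire date
At least 10 rows
Inspect it with head(), tail(), info(), and describe()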

    import pandas as pd

    df = pd.DataFrame({
        'employee_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'name': ['Sasa', 'Ami', 'Kimi', 'Jason', 'Sara', 'Tom', 'Jim', 'Phill', 'Zoe', 'Mike'],
        'department': ['HR', 'HR', 'IT', 'IT', 'FIN', 'FIN', 'OPS', 'OPS', 'OCT', 'OCT'],
        'salary': [10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000],
        'hire_date': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01', '2022-05-01',
                      '2022-06-01', '2022-07-01', '2022-08-01', '2022-09-01', '2022-10-01'],
    })

    print('df.head():\n', df.head())
    print('df.tail():\n', df.tail())

    # info() prints directly and returns None, so it is not wrapped in print()
    print('df.info():')
    df.info()

    print('df.describe():\n', df.describe())
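
In the exercise data the hire dates are plain strings, so pandas does not treat them as dates. As an optional follow-up, the sketch below converts such a column with `pd.to_datetime`. The three rows and the column names `name`, `hire_date`, and `hire_month` are illustrative only.

```python
import pandas as pd

# A few hypothetical rows in the same shape as the exercise data
df = pd.DataFrame({
    'name': ['Sasa', 'Ami', 'Kimi'],
    'hire_date': ['2022-01-01', '2022-02-01', '2022-03-01'],
})

print(df['hire_date'].dtype)  # stored as text at this point

# Parse the strings into real timestamps
df['hire_date'] = pd.to_datetime(df['hire_date'], format='%Y-%m-%d')
print(df['hire_date'].dtype)

# Date-aware operations now work
df['hire_month'] = df['hire_date'].dt.month
after_mid_jan = df[df['hire_date'] > pd.Timestamp('2022-01-15')]
print(after_mid_jan)
```

After the conversion, `info()` reports a datetime column, and date filters behave like date comparisons in SQL rather than string comparisons.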

Exercise 2: Read a table from your work database into pandas

    from sqlalchemy import create_engine
    import pandas as pd

    engine = create_engine(
        'mysql+pymysql://@10.200.13.59:9031/ads?charset=utf8',
        connect_args={
            'user': 'your_user',          # note: the default_cluster: prefix must be added here
            'password': 'your_password',  # replace with your own credentials
            'port': 9031
        }
    )
    df = pd.read_sql("SELECT * FROM ads.ads_dim_site LIMIT 100", engine)
    # print(df)
    print(df.head())
    print(df.shape)
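
Hard-coding a user name and password in a notebook is easy to leak. One possible alternative, sketched below, reads them from environment variables and builds the connection URL with SQLAlchemy's `URL.create`, which also escapes special characters such as `@` in a password. The variable names `DB_USER` and `DB_PASSWORD`, the helper `build_mysql_url`, and the host `db.example.internal` are assumptions made up for this example.

```python
import os
from sqlalchemy.engine import URL

def build_mysql_url(host, port, database):
    """Build a connection URL, reading credentials from environment variables."""
    return URL.create(
        drivername='mysql+pymysql',
        username=os.environ['DB_USER'],      # hypothetical variable name
        password=os.environ['DB_PASSWORD'],  # hypothetical variable name
        host=host,
        port=port,
        database=database,
        query={'charset': 'utf8'},
    )

# Demo values only, so the sketch runs; set real values in your shell instead
os.environ.setdefault('DB_USER', 'demo_user')
os.environ.setdefault('DB_PASSWORD', 'p@ss:word')

url = build_mysql_url('db.example.internal', 9031, 'ads')

# Print the URL with the password masked
print(url.render_as_string(hide_password=True))
```

The resulting `url` object can be passed to `create_engine(url)` in place of the connection string, and the engine is then used with `pd.read_sql` as before.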


http://www.jsqmd.com/news/650346/