Using Pandas

news 2026/7/11 21:35:41

1. value_counts()

Counts how many times each unique value occurs in a column (a Series).
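As a minimal sketch, assuming a small made-up Series (the values below are invented for illustration and are not from the original data), value_counts() returns one count per unique value, sorted from most to least frequent:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration
levels = pd.Series(["low", "high", "low", "low", "high", "mid"])

# Occurrences of each unique value, most frequent first
counts = levels.value_counts()
print(counts)

# normalize=True returns relative frequencies instead of raw counts
shares = levels.value_counts(normalize=True)
print(shares)
```

Missing values are left out of the result by default; passing dropna=False includes them as their own entry.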

2. ffill()

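    # Fill missing values with a forward fill followed by a backward fill.
    # Assign the result back to the column: ffill(inplace=True) on data['Value']
    # acts on a selected column, which recent pandas versions warn about and
    # which does not update `data` when Copy-on-Write is enabled.
    data['Value'] = data['Value'].ffill().bfill()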
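To see what each step does, here is a minimal sketch on a made-up column (the DataFrame below is an assumption for illustration). A forward fill cannot fill a gap at the very start of the column, which is why a backward fill follows it:

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps at the start, in the middle, and at the end
data = pd.DataFrame({"Value": [np.nan, 1.0, np.nan, 3.0, np.nan]})

# Forward fill: each gap takes the last valid value before it.
# The leading NaN stays because nothing precedes it.
forward = data["Value"].ffill()

# Backward fill: the remaining leading gap takes the next valid value
filled = forward.bfill()

# Assign the result back to the column
data["Value"] = filled
print(data["Value"].tolist())
```

Assigning the filled Series back to the column is one way to update the DataFrame without relying on inplace=True on a selected column.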

1. Computing proportions

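    # 1. Total number of patients
    total = len(data)

    # 2. Count high-risk ('高风险患者') and low-risk ('低风险患者') patients
    high_count = (data['RiskLevel'] == '高风险患者').sum()
    low_count = (data['RiskLevel'] == '低风险患者').sum()

    # 3. Compute the proportions
    high_rate = high_count / total
    low_rate = low_count / total

    # Print as percentages, which are easier to read
    print(f"High-risk patient share: {high_rate:.2%}")
    print(f"Low-risk patient share: {low_rate:.2%}")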

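    # Share of every RiskLevel value at once:
    # the result is a Series indexed by RiskLevel value, holding fractions
    risk_ratio = data['RiskLevel'].value_counts(normalize=True)

    # Pick out the high-risk and low-risk shares
    print("High-risk patient share:", risk_ratio['高风险患者'])
    print("Low-risk patient share:", risk_ratio['低风险患者'])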

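    # Proportion via a boolean mean: True counts as 1 and False as 0,
    # so mean() of the comparison is the share
    high_rate = (data['RiskLevel'] == '高风险患者').mean()
    low_rate = (data['RiskLevel'] == '低风险患者').mean()
    print(f"High-risk patient share: {high_rate:.2%}")
    print(f"Low-risk patient share: {low_rate:.2%}")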
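The three approaches above should give the same number when the column has no missing values. A minimal sketch that checks this, assuming a made-up four-row DataFrame (the label strings are the same ones used above):

```python
import pandas as pd

# Hypothetical data: one high-risk patient out of four
data = pd.DataFrame(
    {"RiskLevel": ["高风险患者", "低风险患者", "低风险患者", "低风险患者"]}
)

is_high = data["RiskLevel"] == "高风险患者"

# Approach 1: count divided by total rows
rate_by_count = is_high.sum() / len(data)

# Approach 2: normalized value counts
rate_by_vc = data["RiskLevel"].value_counts(normalize=True)["高风险患者"]

# Approach 3: mean of the boolean mask
rate_by_mean = is_high.mean()

print(f"{rate_by_count:.2%} {rate_by_vc:.2%} {rate_by_mean:.2%}")
# Output: 25.00% 25.00% 25.00%
```

If RiskLevel contains missing values, the denominators differ: len(data) and the boolean mean count every row, while value_counts(normalize=True) leaves missing values out by default.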

2. The NumPy where function

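    import numpy as np

    # 1. A plain numeric array (no dataset needed)
    arr = np.array([18, 25, 30, 16])

    # Use np.where to classify each BMI value against a threshold
    result = np.where(arr >= 28, "Obese", "Normal")
    print(result)  # Output: ['Normal' 'Normal' 'Obese' 'Normal']

    # Create a new 'RiskLevel' column from the length of stay:
    # '高风险患者' (high-risk patient) if over 7 days, else '低风险患者' (low-risk patient)
    data['RiskLevel'] = np.where(data['DaysInHospital'] > 7, '高风险患者', '低风险患者')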
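One detail worth keeping in mind with np.where: a comparison against a missing value is False, so rows with a missing DaysInHospital land in the "else" branch. The sketch below illustrates this on made-up data and shows one possible way to leave such rows unlabeled; the Series and the masking step are assumptions for illustration, not part of the original workflow.

```python
import numpy as np
import pandas as pd

# Hypothetical lengths of stay; the last one is missing
days = pd.Series([3.0, 10.0, np.nan], name="DaysInHospital")

# NaN > 7 evaluates to False, so the missing row gets the low-risk label
naive = np.where(days > 7, "高风险患者", "低风险患者")
print(naive)

# One option: blank out the label wherever the input was missing
masked = pd.Series(naive, index=days.index).where(days.notna())
print(masked.tolist())
```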

3. The cut function (similar to CASE WHEN)

Continuous values → binned and labeled by range

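    import numpy as np
    import pandas as pd

    # 1. Define the bin edges (5 edges)
    bmi_bins = [0, 18.5, 24, 28, np.inf]

    # 2. Define the labels (4 labels, one per interval)
    bmi_labels = ['Underweight', 'Normal', 'Overweight', 'Obese']

    # 3. Bin the values: assign each BMI the label of the interval it falls in
    data['BMIRange'] = pd.cut(
        data['BMI'],
        bins=bmi_bins,
        labels=bmi_labels,
        right=False,  # left-closed, right-open: [0, 18.5) [18.5, 24) ...
    )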
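The effect of right=False is easiest to see on values that sit exactly on a bin edge. A minimal sketch, assuming made-up BMI values chosen for that purpose, compares it with the default right=True:

```python
import numpy as np
import pandas as pd

# Hypothetical BMI values placed on and around the bin edges
data = pd.DataFrame({"BMI": [17.0, 18.5, 23.9, 24.0, 28.0, 35.2]})

bmi_bins = [0, 18.5, 24, 28, np.inf]
bmi_labels = ["Underweight", "Normal", "Overweight", "Obese"]

# Left-closed intervals: [0, 18.5) [18.5, 24) [24, 28) [28, inf)
closed_left = pd.cut(data["BMI"], bins=bmi_bins, labels=bmi_labels, right=False)

# Default right=True gives right-closed intervals: (0, 18.5] (18.5, 24] ...
closed_right = pd.cut(data["BMI"], bins=bmi_bins, labels=bmi_labels)

print(pd.DataFrame({
    "BMI": data["BMI"],
    "right=False": closed_left,
    "right=True": closed_right,
}))
```

With right=False a BMI of exactly 24.0 is labeled Overweight; with the default it would be labeled Normal. Which convention is appropriate depends on how the thresholds are defined for the data at hand.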