
Downloading the UCI "Secondary Mushroom" Dataset

Description:

1. Title: Primary mushroom data

2. Sources:
    (a) Mushroom species drawn from source book:
        Patrick Hardin. Mushrooms & Toadstools. Zondervan, 1999
    (b) Inspired by this mushroom data:
        Jeff Schlimmer. Mushroom Data Set. Apr. 1987.
        url: https://archive.ics.uci.edu/ml/datasets/Mushroom.
    (c) Repository containing the related Python scripts and all the data sets:
        https://mushroom.mathematik.uni-marburg.de/files/
    (d) Author: Dennis Wagner
    (e) Date: 05 September 2020

3. Relevant information:
    This dataset includes 173 species of mushrooms with caps from various families and one
    entry for each species. Each species is identified as definitely edible, definitely
    poisonous, or of unknown edibility and not recommended (the latter class was combined
    with the poisonous class).
    Of the 20 variables, 17 are nominal and 3 are metrical. The values of each nominal
    variable are a set of possible values and for the metrical variables a range of
    possible values.

4. Data generation:
    The related Python project (Sources (c)) contains a Python module
    primary_data_generation.py used to generate a first version of this data from the HTML
    version of the book (Sources (a)), found in primary_data_generated.csv.
    The primary data is cleaned and enriched by going through the book manually, resulting
    in primary_data_edited.csv (used for the simulation of the secondary data).

5. Class information:
    1. family    String of the name of the family of mushroom species (multinomial)
    2. name      String of the name of the mushroom species (multinomial)
    3. class     poisonous=p, edible=e (binary)

6. Variable information:
    (n: nominal, m: metrical; nominal values as sets of values)
     1. cap-diameter (m):       float number(s) in cm; two values = min max, one value = mean
     2. cap-shape (n):          bell=b, conical=c, convex=x, flat=f, sunken=s, spherical=p, others=o
     3. cap-surface (n):        fibrous=i, grooves=g, scaly=y, smooth=s, shiny=h, leathery=l,
                                silky=k, sticky=t, wrinkled=w, fleshy=e
     4. cap-color (n):          brown=n, buff=b, gray=g, green=r, pink=p, purple=u, red=e,
                                white=w, yellow=y, blue=l, orange=o, black=k
     5. does-bruise-bleed (n):  bruises-or-bleeding=t, no=f
     6. gill-attachment (n):    adnate=a, adnexed=x, decurrent=d, free=e, sinuate=s, pores=p,
                                none=f, unknown=?
     7. gill-spacing (n):       close=c, distant=d, none=f
     8. gill-color (n):         see cap-color + none=f
     9. stem-height (m):        float number(s) in cm; two values = min max, one value = mean
    10. stem-width (m):         float number(s) in mm; two values = min max, one value = mean
    11. stem-root (n):          bulbous=b, swollen=s, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r
    12. stem-surface (n):       see cap-surface + none=f
    13. stem-color (n):         see cap-color + none=f
    14. veil-type (n):          partial=p, universal=u
    15. veil-color (n):         see cap-color + none=f
    16. has-ring (n):           ring=t, none=f
    17. ring-type (n):          cobwebby=c, evanescent=e, flaring=r, grooved=g, large=l, pendant=p,
                                sheathing=s, zone=z, scaly=y, movable=m, none=f, unknown=?
    18. spore-print-color (n):  see cap-color
    19. habitat (n):            grasses=g, leaves=l, meadows=m, paths=p, heaths=h, urban=u,
                                waste=w, woods=d
    20. season (n):             spring=s, summer=u, autumn=a, winter=w
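The single-letter codes in the variable list are compact but hard to read when inspecting rows. As a minimal sketch, they could be spelled out with a lookup table like the one below. `CODEBOOK` and `decode_row` are hypothetical names introduced here for illustration, and only three of the variables are covered; the mappings themselves are copied from the list above.

```python
# Hypothetical lookup tables copied from the variable list above (subset only).
CODEBOOK = {
    "cap-shape": {
        "b": "bell", "c": "conical", "x": "convex", "f": "flat",
        "s": "sunken", "p": "spherical", "o": "others",
    },
    "season": {"s": "spring", "u": "summer", "a": "autumn", "w": "winter"},
    "class": {"p": "poisonous", "e": "edible"},
}


def decode_row(row, codebook=CODEBOOK):
    """Return a copy of `row` with known single-letter codes spelled out.

    Columns or codes that are missing from the codebook are left unchanged.
    """
    decoded = {}
    for column, value in row.items():
        decoded[column] = codebook.get(column, {}).get(value, value)
    return decoded


# Illustrative row, not taken from the real dataset.
sample = {"cap-shape": "x", "season": "a", "class": "p", "cap-diameter": "15.26"}
print(decode_row(sample))
# {'cap-shape': 'convex', 'season': 'autumn', 'class': 'poisonous', 'cap-diameter': '15.26'}
```

With a pandas DataFrame, a similar effect can probably be had per column with `Series.replace`, which also leaves unmapped values untouched.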

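Notes:

  1. You need a VPN or proxy turned on for the script to run and download the dataset.
  2. After installing the dependency, PyCharm may still report an error. It can be ignored; the script works as long as the VPN or proxy is on.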

Code:

# pip install ucimlrepo
# https://archive.ics.uci.edu/dataset/848/secondary+mushroom+dataset
from ucimlrepo import fetch_ucirepo
import os
import pandas as pd

# Create the data directory if it does not exist
data_dir = r'd:\PycharmProjects\机器学习大作业\data2'
os.makedirs(data_dir, exist_ok=True)

# Download the dataset
secondary_mushroom = fetch_ucirepo(id=848)

# Get the features and the target
X = secondary_mushroom.data.features
y = secondary_mushroom.data.targets

# Merge them and save the full dataset
full_data = pd.concat([X, y], axis=1)
full_data.to_csv(os.path.join(data_dir, 'full_dataset.csv'), index=False)
print(f"Dataset saved to {data_dir}")
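Once the file has been written, it can be worth reading it back to confirm that the download was complete. The sketch below uses only the standard library. `summarize_csv` is a hypothetical helper, and the target column name `class` is an assumption based on the class information in the description above; adjust it if the saved file uses a different header.

```python
import csv
from collections import Counter


def summarize_csv(path, target_column="class"):
    """Return (row_count, column_names, target_counts) for a CSV file.

    Raises KeyError if `target_column` is not one of the CSV headers.
    """
    counts = Counter()
    rows = 0
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        for record in reader:
            rows += 1
            counts[record[target_column]] += 1
        columns = list(reader.fieldnames or [])
    return rows, columns, dict(counts)


# Example call (the path is whatever was passed to to_csv):
# rows, columns, counts = summarize_csv(r"...\full_dataset.csv")
# print(rows, len(columns), counts)
```

The row count and the number of poisonous versus edible samples can then be compared with the figures on the UCI dataset page.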

 
