
Notes on Unsloth Studio Issues

Offline environment, official Docker image, running on Kubernetes.

1. Model export: handling the llama.cpp clone from GitHub

Manually download the source from https://github.com/ggml-org/llama.cpp and copy it into the Unsloth container.
Create the directory at the path shown in the error log:

mkdir -p /home/unsloth/.unsloth/llama.cpp

Put the llama.cpp source into that directory, then install the Python dependencies:

/opt/venv/bin/python3 -m pip install gguf protobuf sentencepiece mistral_common  # point pip at your own index, or set the environment variable

Retry the export and it will no longer try to pull from GitHub automatically. Remember to configure the pip index (or intervene by hand); run ps -ef to see which packages it is installing.


You can see the llama.cpp source build and install has completed.

Manual conversion, without depending on Unsloth:

# Convert and output an f16 GGUF
cd llama.cpp
python convert_hf_to_gguf.py ./my_unsloth_model \
    --outtype f16 \
    --outfile my_model_f16.gguf

Further on, the code also hard-codes a download from GitHub: https://github.com/ggerganov/llama.cpp/raw/refs/heads/master/convert_hf_to_gguf.py

I did not test overriding this via an environment variable; I patched the code directly and served the file from a local HTTP server instead:

unsloth@unsloth-studio-7fd9b89dcd-mjd8x:/workspace/llama.cpp$ python -m http.server 8081
Serving HTTP on 0.0.0.0 port 8081 (http://0.0.0.0:8081/) ...
127.0.0.1 - - [26/Apr/2026 13:40:37] "GET /convert_hf_to_gguf.py HTTP/1.1" 200 -
127.0.0.1 - - [26/Apr/2026 13:42:35] "GET /convert_hf_to_gguf.py HTTP/1.1" 200 -

vim /opt/venv/lib/python3.12/site-packages/unsloth_zoo/llama_cpp.py  # the failing code is around line 55
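The same patch can be scripted instead of edited by hand in vim. A minimal sketch, assuming the hard-coded GitHub URL appears verbatim in llama_cpp.py and that a local http.server is reachable on port 8081 (both are assumptions about your environment):

```python
from pathlib import Path

# Hypothetical paths/URLs -- adjust for your container.
target = Path("/opt/venv/lib/python3.12/site-packages/unsloth_zoo/llama_cpp.py")
old_url = "https://github.com/ggerganov/llama.cpp/raw/refs/heads/master/convert_hf_to_gguf.py"
new_url = "http://127.0.0.1:8081/convert_hf_to_gguf.py"

def patch_url(path: Path, old: str, new: str) -> bool:
    """Replace the hard-coded download URL in place; return True if patched."""
    text = path.read_text(encoding="utf-8")
    if old not in text:
        return False  # already patched, or the upstream source changed
    path.write_text(text.replace(old, new), encoding="utf-8")
    return True
```

Re-running the same patch is harmless: the second call finds no match and returns False.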

This converted to GGUF and quantized successfully.
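For the quantization step, llama.cpp ships a llama-quantize tool that takes an input GGUF, an output GGUF, and a quantization type (Q4_K_M matches the file name used later in this post). A sketch that only builds the command line; the binary path is an assumption that depends on how you built llama.cpp (cmake puts it under build/bin):

```python
# Assumed binary location -- depends on your llama.cpp build.
QUANTIZE_BIN = "./llama.cpp/build/bin/llama-quantize"

def build_quantize_cmd(f16_gguf: str, out_gguf: str, qtype: str = "Q4_K_M") -> list:
    """llama-quantize usage: llama-quantize <input.gguf> <output.gguf> <type>"""
    return [QUANTIZE_BIN, f16_gguf, out_gguf, qtype]

cmd = build_quantize_cmd("my_model_f16.gguf", "my_model_Q4_K_M.gguf")
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```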

2. Dataset generation errors

index-CY5egRSv.js:73 POST http://10.103.184.147:8000/api/data-recipe/validate 500 (Internal Server Error)

The persisted workspace directory must be writable by the unsloth user.


You must select Full Run.

Per Unsloth Studio's design:
Preview Run: only for quick debugging; it does not produce a persisted local dataset file, so nothing shows up in the Train page's dataset list.
Full Run: only a full run produces a persisted dataset file the training page can recognize; it then appears automatically in the Local tab's list.
In your third screenshot, every run record is marked Preview, which means only previews were run and the full dataset-generation flow never executed, so naturally the Train page finds no dataset.

3. Starting training

Expected workload: 30,000 data rows.

Training error 1:

Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/workspace/work/heretic8-glm4.7-flash.Q4_K_M.gguf'. Use repo_type argument if needed.

GGUF models are not supported here, which is why the abliteration was done as a separate step in the previous post.

Training error 2:

Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set llm_int8_enable_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.

This is simply not enough GPU memory: the 56 GB model would not start on a 1/2 MIG slice yesterday, so today I switched to an 80 GB A800.
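If you do want CPU offload instead of a bigger GPU, the error message points at a quantization-config fragment roughly like the following. This is a sketch of the transformers API the message names, not something I ran; the model path is a placeholder, and note that offloaded modules run far slower than on-GPU ones:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,  # keep CPU-dispatched modules in fp32
)
model = AutoModelForCausalLM.from_pretrained(
    "/data/GLM/GLM-4.7-Flash",        # placeholder local model path
    quantization_config=quant_config,
    device_map="auto",                # or a hand-written dict pinning layers to devices
)
```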

Once you have adjusted these configs inside the offline Unsloth pod, it is worth committing a new image from it, so you do not have to reinstall llama.cpp and reconfigure the pip index every time.

Paths that need persistence:

volumeMounts:
  - mountPath: /workspace
    name: dsdata
    subPath: unslothdata/workspace
  - mountPath: /home/unsloth/.unsloth/studio
    name: dsdata
    subPath: unslothdata/studio
  - mountPath: /data/GLM
    name: dsdata
    subPath: GLM-4.7-Flash

4. Dataset merging. Yesterday each book had its own training set, plus one 10-book mixed set; merge them all so a single training run covers everything.

1. Pack up all the dataset files and pull them out; write a Python script yourself or let an AI handle it.


2. Put the merged dataset back into the Unsloth dataset directory (you will see the merged set when selecting Local), or upload it manually on the training page.
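A minimal merge sketch for step 1, assuming the exported datasets are JSONL files (one JSON object per line; check what Unsloth Studio actually exports, and the file names here are placeholders):

```python
import json
from pathlib import Path

def merge_jsonl(inputs, output):
    """Concatenate JSONL datasets, dropping blank lines; returns the row count."""
    rows = 0
    with open(output, "w", encoding="utf-8") as out:
        for name in inputs:
            for line in Path(name).read_text(encoding="utf-8").splitlines():
                if not line.strip():
                    continue
                json.loads(line)  # validate each row before writing
                out.write(line + "\n")
                rows += 1
    return rows

# Example: merge_jsonl(["book1.jsonl", "book2.jsonl"], "merged.jsonl")
```

Validating each line with json.loads before writing means a corrupt row fails loudly at merge time rather than mid-training.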

Training parameter configuration:

My Windows box runs tiny11-core and is missing the Chinese language pack (or something along those lines); many of the novel files show encoding problems on this machine. I had Claude write a local conversion script:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
通用书籍乱码修复工具 v2.0 / Universal Book Encoding Fixer

用法 / Usage:
    # Fix a single file
    python fix_encoding.py "book.txt"
    # Fix a single file with an explicit output name
    python fix_encoding.py "input.txt" "output.txt"
    # Fix all txt files in a directory (recurses into subdirectories by default)
    python fix_encoding.py --dir "./books"
    # Fix several formats
    python fix_encoding.py --dir "./books" --ext .txt .novel .umd
    # Overwrite the original files (use with caution)
    python fix_encoding.py --dir "./books" --overwrite
    # Show help
    python fix_encoding.py --help
"""
import sys
import argparse
import re
from pathlib import Path

# Force UTF-8 for console output on Windows
if sys.platform == 'win32':
    try:
        sys.stdout.reconfigure(encoding='utf-8')
        sys.stderr.reconfigure(encoding='utf-8')
    except Exception:
        pass


class EncodingFixer:
    """Book encoding fixer"""

    def __init__(self):
        self.gbk_misread_map = self._build_gbk_map()
        self.stats = {'total': 0, 'success': 0, 'failed': 0, 'skipped': 0, 'encodings': {}}

    def _build_gbk_map(self):
        """Build GBK misread character mapping"""
        mapping = {}
        for b1 in range(0x81, 0xFE):
            for b2 in range(0x40, 0xFE):
                if b2 == 0x7F:
                    continue
                gbk_byte = bytes([b1, b2])
                try:
                    correct_char = gbk_byte.decode('gbk')
                    misread_chars = chr(b1) + chr(b2)
                    mapping[misread_chars] = correct_char
                except Exception:
                    continue
        return mapping

    def read_file(self, filepath):
        """Read a file, trying several encodings"""
        encodings = ['utf-8', 'gbk', 'gb2312', 'big5', 'utf-16-le', 'utf-16-be']
        with open(filepath, 'rb') as f:
            raw_bytes = f.read()
        # Check BOM markers
        if raw_bytes.startswith(b'\xef\xbb\xbf'):
            return raw_bytes[3:].decode('utf-8'), 'utf-8-sig'
        elif raw_bytes.startswith(b'\xff\xfe'):
            return raw_bytes[2:].decode('utf-16-le'), 'utf-16-le'
        elif raw_bytes.startswith(b'\xfe\xff'):
            return raw_bytes[2:].decode('utf-16-be'), 'utf-16-be'
        elif raw_bytes.startswith(b'\x2b\x2f\x76'):
            return raw_bytes[3:].decode('utf-7'), 'utf-7'
        # Try encodings in order; accept when under 5% replacement chars
        for enc in encodings:
            try:
                content = raw_bytes.decode(enc)
                if content.count('\ufffd') < len(content) * 0.05:
                    return content, enc
            except Exception:
                continue
        return raw_bytes.decode('gbk', errors='replace'), 'gbk-replace'

    def fix_mojibake(self, text):
        """Fix mojibake via the GBK misread map"""
        text = text.replace('\ufffd', '')
        result = []
        i = 0
        while i < len(text):
            if i + 1 < len(text):
                pair = text[i:i + 2]
                if pair in self.gbk_misread_map:
                    result.append(self.gbk_misread_map[pair])
                    i += 2
                    continue
            result.append(text[i])
            i += 1
        return ''.join(result)

    def clean_text(self, text):
        """Strip zero-width characters and trailing whitespace"""
        text = re.sub(r'[\u200b\u200c\u200d\ufeff]', '', text)
        text = re.sub(r'[ \t]+$', '', text, flags=re.MULTILINE)
        return text

    def process(self, input_path, output_path=None, overwrite=False):
        """Process a single file"""
        input_path = Path(input_path)
        if not input_path.exists():
            raise FileNotFoundError(f"File not found: {input_path}")
        # Read
        content, detected_encoding = self.read_file(input_path)
        original_len = len(content)
        # Clean
        content = self.clean_text(content)
        # Fix mojibake
        content = self.fix_mojibake(content)
        # Determine output path
        if output_path is None:
            if overwrite:
                output_path = input_path
            else:
                output_path = input_path.parent / (input_path.stem + '_fixed' + input_path.suffix)
        else:
            output_path = Path(output_path)
        # Check if output already exists
        if output_path.exists() and not overwrite:
            return {'input': str(input_path), 'output': str(output_path),
                    'status': 'skipped', 'reason': 'output exists'}
        # Write with UTF-8
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write(content)
        return {'input': str(input_path), 'output': str(output_path),
                'encoding': detected_encoding, 'original_len': original_len,
                'fixed_len': len(content), 'status': 'success'}

    def process_directory(self, dir_path, extensions, recursive=True, overwrite=False):
        """Process every matching file in a directory"""
        dir_path = Path(dir_path)
        if not dir_path.exists():
            raise FileNotFoundError(f"Directory not found: {dir_path}")
        pattern = '**/*' if recursive else '*'
        files = sorted([f for f in dir_path.glob(pattern)
                        if f.is_file() and f.suffix.lower() in extensions])
        self.stats['total'] = len(files)
        total_chars = 0
        print(f"Found {len(files)} files to process")
        print(f"Recursive: {recursive}, Overwrite: {overwrite}")
        print("-" * 60)
        for idx, filepath in enumerate(files, 1):
            rel_path = filepath.relative_to(dir_path)
            print(f"[{idx}/{len(files)}] {rel_path}", end=" ")
            try:
                result = self.process(filepath, overwrite=overwrite)
                if result['status'] == 'skipped':
                    self.stats['skipped'] += 1
                    print("[SKIPPED - output exists]")
                else:
                    self.stats['success'] += 1
                    total_chars += result['fixed_len']
                    enc = result.get('encoding', 'unknown')
                    self.stats['encodings'][enc] = self.stats['encodings'].get(enc, 0) + 1
                    print(f"[OK] {enc} -> UTF-8 ({result['fixed_len']} chars)")
            except Exception as e:
                self.stats['failed'] += 1
                print(f"[ERROR] {e}")
        return {'stats': self.stats, 'total_chars': total_chars}


def print_summary(summary):
    """Print a processing summary"""
    print("\n" + "=" * 60)
    print("PROCESSING SUMMARY / 处理摘要")
    print("=" * 60)
    print(f"Total files: {summary['stats']['total']}")
    print(f"Success:     {summary['stats']['success']}")
    print(f"Skipped:     {summary['stats']['skipped']}")
    print(f"Failed:      {summary['stats']['failed']}")
    print(f"Total chars: {summary['total_chars']:,}")
    if summary['stats']['encodings']:
        print("\nDetected encodings / 检测到的编码:")
        for enc, count in sorted(summary['stats']['encodings'].items(), key=lambda x: -x[1]):
            print(f"  {enc}: {count}")
    print("=" * 60)


def main():
    parser = argparse.ArgumentParser(
        description='Book Encoding Fixer v2.0 - Fix Chinese book encoding issues',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Fix single file
  python fix_encoding.py book.txt
  # Fix with custom output name
  python fix_encoding.py input.txt output.txt
  # Fix all files in directory (recursive by default)
  python fix_encoding.py --dir ./books
  # Fix multiple formats
  python fix_encoding.py --dir ./books --ext .txt .novel .umd
  # Overwrite original files (use with caution!)
  python fix_encoding.py --dir ./books --overwrite
  # Non-recursive mode
  python fix_encoding.py --dir ./books --no-recursive
""")
    parser.add_argument('input', nargs='?', help='Input file path')
    parser.add_argument('output', nargs='?', help='Output file path (optional)')
    parser.add_argument('--dir', '-d', help='Process all files in directory')
    parser.add_argument('--ext', '-e', nargs='+', default=['.txt'],
                        help='File extensions (default: .txt)')
    parser.add_argument('--overwrite', '-w', action='store_true', help='Overwrite original files')
    parser.add_argument('--recursive', '-r', action='store_true', default=True,
                        help='Recursively process subdirectories (default: True)')
    parser.add_argument('--no-recursive', action='store_true', help='Do not process subdirectories')
    args = parser.parse_args()
    fixer = EncodingFixer()
    # Normalize extensions to lowercase
    extensions = [e.lower() if e.startswith('.') else '.' + e.lower() for e in args.ext]
    # Determine recursive mode
    recursive = not args.no_recursive
    if args.dir:
        # Process directory
        try:
            summary = fixer.process_directory(args.dir, extensions,
                                              recursive=recursive, overwrite=args.overwrite)
            print_summary(summary)
        except Exception as e:
            print(f"Error: {e}")
            sys.exit(1)
    elif args.input:
        # Process single file
        try:
            result = fixer.process(args.input, args.output, overwrite=args.overwrite)
            if result.get('status') == 'skipped':
                print("Skipped: output file already exists")
                print(f"  Output: {result['output']}")
            else:
                print("\nProcessing complete!")
                print(f"  Input:    {Path(result['input']).name}")
                print(f"  Output:   {Path(result['output']).name}")
                print(f"  Encoding: {result.get('encoding', 'N/A')} -> UTF-8")
                print(f"  Size:     {result.get('fixed_len', 0):,} characters")
        except Exception as e:
            print(f"Error: {e}")
            sys.exit(1)
    else:
        parser.print_help()
        sys.exit(1)


if __name__ == '__main__':
    main()

It will run for 21 hours; leave it going and wait for the result.

