
[Qwen] train() Function Reference

train() function documentation

train(attn_implementation='flash_attention_2')

Runs the main training loop for Qwen VL (Qwen2-VL, Qwen2.5-VL, Qwen3-VL, or Qwen3-VL-MoE) instruction tuning.
It parses command-line arguments for the model, data, and training configuration; loads the appropriate model class and processor; optionally applies LoRA or configures which modules to tune (vision encoder, MLP merger, LLM); builds the supervised data module and the Hugging Face Trainer; runs training (with optional resume); and finally saves the model and processor to output_dir.
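
The module selection mentioned above (vision encoder, MLP merger, LLM) can be pictured as a mapping from parameter names to trainable flags. The following is a minimal sketch, not the script's actual code: the function name trainable_flags and the name prefixes "visual.", "visual.merger.", "model.", and "lm_head." are assumptions made for illustration, and the real prefixes depend on the model class.

```python
from typing import Dict, Iterable

# Hypothetical parameter-name prefixes (assumptions, not taken from the script).
MERGER_PREFIX = "visual.merger."
VISION_PREFIX = "visual."
LLM_PREFIXES = ("model.", "lm_head.")


def trainable_flags(
    param_names: Iterable[str],
    tune_mm_vision: bool,
    tune_mm_mlp: bool,
    tune_mm_llm: bool,
) -> Dict[str, bool]:
    """Map each parameter name to True (train) or False (freeze)."""
    flags = {}
    for name in param_names:
        # The merger lives under the vision prefix, so it is checked first.
        if name.startswith(MERGER_PREFIX):
            flags[name] = tune_mm_mlp
        elif name.startswith(VISION_PREFIX):
            flags[name] = tune_mm_vision
        elif name.startswith(LLM_PREFIXES):
            flags[name] = tune_mm_llm
        else:
            # Anything unrecognised stays frozen in this sketch.
            flags[name] = False
    return flags


names = [
    "visual.blocks.0.attn.qkv.weight",
    "visual.merger.mlp.0.weight",
    "model.layers.0.mlp.up_proj.weight",
    "lm_head.weight",
]
flags = trainable_flags(names, tune_mm_vision=False, tune_mm_mlp=True, tune_mm_llm=True)
```

With a PyTorch model, the resulting flags could then be applied by looping over model.named_parameters() and assigning param.requires_grad for each name.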

Parameters

Name                 Type  Default              Description
attn_implementation  str   "flash_attention_2"  Attention implementation passed to the model (e.g. "flash_attention_2" for Flash Attention 2).

Command-line arguments (parsed via HfArgumentParser)

  • ModelArguments

    • model_name_or_path (str) – Hugging Face model id or local path (e.g. Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen3-VL-8B-Instruct). Used to select the model class (Qwen2-VL, Qwen2.5-VL, Qwen3-VL, or Qwen3-VL-MoE).
    • tune_mm_llm (bool) – Whether to train the language model (and lm_head).
    • tune_mm_mlp (bool) – Whether to train the vision merger (MLP).
    • tune_mm_vision (bool) – Whether to train the vision encoder.

  • DataArguments

    • dataset_use (str) – Comma-separated dataset names, each with an optional %N sampling suffix (e.g. dataset1%50).
    • data_flatten (bool) – Whether to flatten (concatenate) the sequences in a batch.
    • data_packing (bool) – Whether to use packed data (requires preprocessing with pack_data.py).
    • max_pixels (int) – Maximum image pixels (default 28*28*576 = 451,584).
    • min_pixels (int) – Minimum image pixels (default 28*28*16 = 12,544).
    • video_max_frames, video_min_frames, video_max_pixels, video_min_pixels, video_fps – Video sampling and resolution settings.

  • TrainingArguments (extends transformers.TrainingArguments)

    • cache_dir (str, optional) – Cache directory for the model and processor.
    • model_max_length (int) – Maximum sequence length for the tokenizer.
    • lora_enable (bool) – If True, apply LoRA and ignore tune_mm_* for the base model.
    • lora_r, lora_alpha, lora_dropout – LoRA rank, alpha, and dropout.
    • mm_projector_lr, vision_tower_lr – Optional learning rates for the projector and vision tower.
    • Plus standard Trainer args: output_dir, bf16, per_device_train_batch_size, gradient_accumulation_steps, learning_rate, num_train_epochs, save_steps, gradient_checkpointing, deepspeed, etc.
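
To make the dataset_use format described under DataArguments concrete, here is a minimal sketch of how such a string could be split into names and sampling rates. It assumes that the N in %N is a percentage of the dataset to sample (so dataset1%50 means half of dataset1) and that a name without a suffix is used in full; the helper name parse_dataset_use is hypothetical and not part of the script.

```python
import re
from typing import List, Tuple


def parse_dataset_use(dataset_use: str) -> List[Tuple[str, float]]:
    """Split 'a%50,b' into [('a', 0.5), ('b', 1.0)].

    Assumption: the number after '%' is a sampling percentage.
    """
    entries = []
    for item in dataset_use.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        match = re.fullmatch(r"(.+)%(\d+)", item)
        if match:
            entries.append((match.group(1), int(match.group(2)) / 100.0))
        else:
            entries.append((item, 1.0))
    return entries


parsed = parse_dataset_use("dataset1%50,dataset2")
```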

Returns

None. The model and processor are saved under training_args.output_dir.

Notes

  • If output_dir already contains checkpoint-* directories, training is resumed with resume_from_checkpoint=True.
  • When data_flatten or data_packing is enabled, the Qwen2 VL attention class is replaced for compatibility.
  • Qwen3-VL MoE models use Qwen3VLMoeForConditionalGeneration; other Qwen3-VL models use Qwen3VLForConditionalGeneration; Qwen2.5-VL and Qwen2-VL use the corresponding classes inferred from model_name_or_path.
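
The resume rule in the first note can be sketched with the standard library alone. This is an illustration under the assumption that any directory whose name starts with checkpoint- counts as a checkpoint; the helper name has_checkpoint is hypothetical.

```python
import tempfile
from pathlib import Path


def has_checkpoint(output_dir: str) -> bool:
    """Return True if output_dir contains at least one checkpoint-* directory."""
    return any(p.is_dir() for p in Path(output_dir).glob("checkpoint-*"))


# Small self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    fresh = has_checkpoint(tmp)                # empty directory: nothing to resume
    (Path(tmp) / "checkpoint-500").mkdir()
    resumable = has_checkpoint(tmp)            # a checkpoint directory now exists

# In a training script, the result could be passed on as:
#     trainer.train(resume_from_checkpoint=has_checkpoint(training_args.output_dir))
```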

Example

# Typical usage: arguments are passed via command line (e.g. from scripts/sft_qwen3_4b.sh)
torchrun --nproc_per_node=4 qwenvl/train/train_qwen.py \
    --model_name_or_path Qwen/Qwen3-VL-8B-Instruct \
    --dataset_use my_dataset \
    --data_flatten True \
    --tune_mm_vision False --tune_mm_mlp True --tune_mm_llm True \
    --output_dir ./output \
    --bf16 --per_device_train_batch_size 4 --gradient_accumulation_steps 4 \
    --learning_rate 1e-5 --num_train_epochs 0.5

# Programmatic call (still requires sys.argv or explicit parse for HfArgumentParser)
from qwenvl.train.train_qwen import train
train(attn_implementation="flash_attention_2")
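
Because the programmatic call still reads its settings from sys.argv, one option is to build the argument list in Python before calling train(). The sketch below assumes that train() lets HfArgumentParser read sys.argv, as the comment above suggests; the helper name build_argv is hypothetical, and the final lines are left commented out because they need the qwenvl package and a GPU environment.

```python
from typing import Dict, List, Union


def build_argv(script: str, options: Dict[str, Union[str, int, float, bool]]) -> List[str]:
    """Turn {'output_dir': './output'} into ['script', '--output_dir', './output']."""
    argv = [script]
    for key, value in options.items():
        # Booleans are passed explicitly as 'True' / 'False' strings.
        argv += [f"--{key}", str(value)]
    return argv


argv = build_argv(
    "train_qwen.py",
    {
        "model_name_or_path": "Qwen/Qwen3-VL-8B-Instruct",
        "dataset_use": "my_dataset",
        "output_dir": "./output",
        "tune_mm_mlp": True,
    },
)

# Then, in an environment where the training package is installed:
#     import sys
#     from qwenvl.train.train_qwen import train
#     sys.argv = argv
#     train(attn_implementation="flash_attention_2")
```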
