
CANN Hunyuan-Video Configuration Guide

YAML Parameter Description

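cann-recipes-infer: this project provides optimization samples on the CANN platform for typical models and acceleration algorithms used in LLM and multimodal inference workloads. Project address: https://gitcode.com/cann/cann-recipes-infer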

Hunyuan-Video inference parameters are maintained in config/*.yaml. Select a config by setting YAML_FILE_NAME in infer.sh.

Default configs:

  • Single-card baseline: single.yaml
  • 8-card sequence parallel: sp8.yaml
  • Single-card FP8: single_fp8.yaml
  • Single-card sparse attention: single_sparse.yaml
  • 8-card sparse attention: sp8_sparse.yaml

model_args:
  model-base: "ckpts"  # Weight root directory. Relative paths are resolved from models/hunyuan-video/.
  prompt: "A cat walks ..."  # Text prompt for video generation.
  video-size: [720, 1280]  # Output size in [H, W].
  video-length: 129  # Output frame count. Constraint: 4n+1.
  infer-steps: 50  # Number of denoising steps.
  seed: 42  # Random seed.
  embedded-cfg-scale: 6.0  # Embedded CFG guidance scale.
  flow-shift: 7.0  # FlowMatch timestep shift.
  flow-reverse: true  # Whether to use reverse flow scheduling. Options: [false, true].
  use-cpu-offload: true  # Whether to enable CPU offload. Options: [false, true].
  extract_q_k_data: false  # Whether to extract QK data for sparse attention offline profiling. Options: [false, true].
  extract_path: "path/to/qk_dir"  # Output directory for extracted QK data. Required when extract_q_k_data is true.
  ulysses-degree: 8  # Ulysses sequence parallel degree. Multi-card configs use this field.
  ring-degree: 1  # Ring attention degree. Sparse configs currently require 1.
  use-vae-parallel: true  # Whether to enable VAE parallelism. Options: [false, true].
  fa-perblock-fp8: true  # Whether to enable FP8 FA activation quantization. Options: [false, true].
  mm-mxfp8: true  # Whether to enable MXFP8 matmul quantization. Options: [false, true].
  dit-weight: "/abs/path/ckpt.pt"  # Optional DiT checkpoint path.
  model: "HYVideo-T/2-cfgdistill"  # DiT architecture. Options: ["HYVideo-T/2", "HYVideo-T/2-cfgdistill"].
  model-resolution: "720p"  # Model resolution preset. Options: ["540p", "720p"].
  precision: "bf16"  # DiT precision. Options: ["fp32", "fp16", "bf16"].
  seed-type: "auto"  # Seed source. Options: ["file", "random", "fixed", "auto"].
model_name: "hunyuan-video"  # Model name. Options: ["hunyuan-video"].
world_size: 1  # Number of launched processes. Multi-card configs require world_size = ulysses-degree * ring-degree.
master_port: 29600  # torchrun master port.
entry_script: "sample_video.py"  # Entry script. Options: ["sample_video.py"].
dit_cache:
  method: "NoCache"  # DiT cache method. Options: ["NoCache", "FBCache", "TeaCache", "TaylorSeer"].
  params:
    # FBCache / TeaCache
    rel_l1_thresh: 0.05  # Relative L1 threshold. Larger values are faster but may reduce quality.
    # TeaCache
    coefficients: []  # TeaCache polynomial coefficients.
    warmup: 2  # Number of initial full-compute steps.
    # TaylorSeer
    n_derivatives: 3  # Taylor expansion order.
    skip_interval_steps: 4  # Full-compute interval.
    cutoff_steps: 1  # Number of final full-compute steps.
    offload: true  # Whether to offload TaylorSeer history states to CPU. Options: [false, true].
sparse:
  method: "SVG"  # Sparse attention method. Options: ["no_sparse", "TopK", "SVG"].
  block_size_Q: 128  # Q-axis block size.
  block_size_K: 512  # K-axis block size.
  model: "HunyuanVideo"  # Sparse module model type. Options: ["HunyuanVideo"].
  params:
    TopK:
      sparse_time_step: "10-49"  # Active denoising step range. Format: "start-end".
      sparsity_files_path: "./sparsity/720x1280x129/v3"  # Offline profiling sparsity file directory.
      CAC_threshold: 0.66  # TopK threshold.
    SVG:
      sparse_time_step: "14-49"  # Active denoising step range. Format: "start-end".
      sparsity: 0.8  # SVG sparsity ratio.
      sample_mse_max_row: 5000  # Maximum sampled rows for MSE.
      context_length: 256  # SVG context length.
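
The constraints stated in the comments above (a video-length of the form 4n+1, world_size equal to ulysses-degree * ring-degree for multi-card runs, and the "start-end" format of sparse_time_step) can be checked before launching. The following is a minimal Python sketch, not code from the repository. All function names are hypothetical. It assumes the YAML has already been loaded into a plain dict, that world_size sits at the top level while the parallel degrees sit under model_args, and that a missing degree counts as 1.

```python
from typing import Any, Dict, List, Tuple


def is_valid_video_length(frames: int) -> bool:
    """Return True when frames has the form 4n+1 (1, 5, 9, ..., 129, ...)."""
    return frames >= 1 and (frames - 1) % 4 == 0


def parse_step_range(text: str) -> Tuple[int, int]:
    """Split a "start-end" string such as "10-49" into two integers."""
    start, end = text.split("-")
    return int(start), int(end)


def check_launch_constraints(cfg: Dict[str, Any]) -> List[str]:
    """Collect human-readable problems; an empty list means no issue was found."""
    problems: List[str] = []
    args = cfg.get("model_args", {})

    frames = args.get("video-length")
    if frames is not None and not is_valid_video_length(frames):
        problems.append(f"video-length {frames} is not of the form 4n+1")

    # The product rule is documented for multi-card configs only.
    world_size = cfg.get("world_size", 1)
    if world_size > 1:
        # Assumption: a degree that is absent from the config counts as 1.
        expected = args.get("ulysses-degree", 1) * args.get("ring-degree", 1)
        if world_size != expected:
            problems.append(
                f"world_size {world_size} != ulysses-degree * ring-degree ({expected})"
            )
    return problems
```

Whether the end of a sparse_time_step range is inclusive is not stated in the parameter description, so parse_step_range only splits the string and leaves the interpretation to the caller.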

Notes:

  • Sparse attention and DiT cache are mutually exclusive. Keep dit_cache.method: "NoCache" in sparse configs.
  • TopK requires sparsity files that match video-size and video-length.
  • extract_q_k_data is used to generate QK data for sparse attention offline profiling. Set extract_path to a writable directory when enabling it.
  • TaylorSeer may require high host memory at large resolutions and long frame counts.
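
The first three notes can be expressed as a small pre-flight check. This is a hedged Python sketch under stated assumptions, not the project's own validation logic. The function name check_notes is hypothetical. Missing sparse or dit_cache sections are treated as disabled. The TopK check assumes the sparsity directory embeds an "HxWxFrames" tag, which is inferred only from the sample path "./sparsity/720x1280x129/v3" and may not hold for other layouts.

```python
from typing import Any, Dict, List


def check_notes(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems found against the notes above (empty if none)."""
    problems: List[str] = []
    args = cfg.get("model_args", {})
    sparse = cfg.get("sparse", {})
    # Assumption: an absent section means the feature is disabled.
    sparse_method = sparse.get("method", "no_sparse")
    cache_method = cfg.get("dit_cache", {}).get("method", "NoCache")

    # Note 1: sparse attention and DiT cache are mutually exclusive.
    if sparse_method != "no_sparse" and cache_method != "NoCache":
        problems.append(
            f"sparse method {sparse_method!r} cannot be combined with "
            f"dit_cache method {cache_method!r}"
        )

    # Note 2 (heuristic): the TopK sparsity path should match the output shape.
    if sparse_method == "TopK":
        topk = sparse.get("params", {}).get("TopK", {})
        path = topk.get("sparsity_files_path", "")
        size = args.get("video-size")
        frames = args.get("video-length")
        if size and frames:
            tag = f"{size[0]}x{size[1]}x{frames}"  # assumed naming convention
            if tag not in path:
                problems.append(f"sparsity_files_path {path!r} does not mention {tag}")

    # Note 3: extraction needs an output directory.
    if args.get("extract_q_k_data") and not args.get("extract_path"):
        problems.append("extract_q_k_data is true but extract_path is not set")

    return problems
```

The sketch does not test whether extract_path is actually writable, and it does not estimate TaylorSeer host memory; both depend on the runtime environment.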


Author's note: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.
