
CANN/AMCT OFMR Algorithm Example

AMCT Large Model Quantization

amct: AMCT is the model compression tool repository provided by CANN, built for Ascend AI processors. Project address: https://gitcode.com/cann/amct

1 Quantization Prerequisites

1.1 Install Dependencies

The dependency packages for this sample are listed in requirements.txt.

Note that the torch_npu version must match the installed Python and torch versions, and the CANN package must be installed.
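Before running the sample, it can help to print the installed versions side by side. The sketch below uses only the Python standard library; it assumes the packages are installed under the distribution names `torch` and `torch_npu`, and the helper `installed_versions` is a hypothetical name used for illustration, not part of AMCT.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print the interpreter version next to the package versions for comparison.
    print("python", sys.version.split()[0])
    for name, version in installed_versions(["torch", "torch_npu"]).items():
        print(name, version or "NOT INSTALLED")
```

The printed versions still need to be compared by hand against the torch_npu compatibility notes; this sketch only reports what is installed.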

1.2 Model and Dataset Preparation

This sample uses the Llama2-7B, Qwen2-7B, and Qwen3-8B models as examples, with pileval as the calibration data and wikitext2 as the evaluation dataset. Download the models yourself and pass the model path to the script. The datasets are loaded online.

1.3 Simple Quantization Configuration

The quantization configuration used in this sample is built into the tool and can be imported as follows:

from amct_pytorch import HIFP8_OFMR_CFG

To modify the detailed configuration, refer to the documentation and construct the required quantization configuration dict.

The OFMR algorithm supports weight-only quantization and full quantization. The supported quantization types and quantization configurations are:

| Field | Type | Description | Value Range | Notes |
| --- | --- | --- | --- | --- |
| batch_num | uint32 | Number of batches used for quantization | 1 | / |
| skip_layers | str | Layers to skip during quantization | / | Fuzzy matching is supported: if a configured string equals a layer name or is a substring of it, that layer is skipped and no quantization configuration is generated for it. The string must contain letters or digits. |
| weights.type | str | Quantized weight type | 'float8_e4m3fn' or 'hifloat8' | / |
| weights.symmetric | bool | Symmetric quantization | True | / |
| weights.strategy | str | Quantization granularity | 'tensor' or 'channel' | / |
| inputs.type | str | Quantized activation type | 'float8_e4m3fn' or 'hifloat8' | / |
| inputs.symmetric | bool | Symmetric quantization | True | / |
| inputs.strategy | str | Quantization granularity | 'tensor' | / |
| algorithm | dict | Quantization algorithm configuration used | {'ofmr'} | / |
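One way to read the fields above is as a nested dict in which dotted names such as weights.type become nested keys. The sketch below rests on that assumption and is not the tool's documented schema: `EXAMPLE_CFG`, `check_cfg`, and `should_skip` are hypothetical names, the layer names are made up, and treating skip_layers as a list of strings and a missing inputs section as weight-only quantization are also assumptions. It checks a config against the listed value ranges and imitates the substring matching described for skip_layers.

```python
# Allowed values copied from the configuration table.
ALLOWED_TYPES = {"float8_e4m3fn", "hifloat8"}
ALLOWED_STRATEGIES = {"weights": {"tensor", "channel"}, "inputs": {"tensor"}}

# Hypothetical layout: dotted field names become nested keys (an assumption).
EXAMPLE_CFG = {
    "batch_num": 1,
    "skip_layers": ["lm_head"],  # assumed here to be a list of strings
    "weights": {"type": "hifloat8", "symmetric": True, "strategy": "channel"},
    "inputs": {"type": "hifloat8", "symmetric": True, "strategy": "tensor"},
    "algorithm": {"ofmr"},  # written as listed in the table; the real value may carry parameters
}


def check_cfg(cfg):
    """Return a list of problems; an empty list means the config fits the table."""
    problems = []
    for section in ("weights", "inputs"):
        part = cfg.get(section)
        if part is None:
            # Assumption: leaving out "inputs" means weight-only quantization.
            if section == "weights":
                problems.append("weights section is required")
            continue
        if part.get("type") not in ALLOWED_TYPES:
            problems.append(f"{section}.type must be one of {sorted(ALLOWED_TYPES)}")
        if part.get("symmetric") is not True:
            problems.append(f"{section}.symmetric must be True")
        if part.get("strategy") not in ALLOWED_STRATEGIES[section]:
            allowed = sorted(ALLOWED_STRATEGIES[section])
            problems.append(f"{section}.strategy must be one of {allowed}")
    return problems


def should_skip(layer_name, skip_layers):
    """True if any configured string equals or is a substring of the layer name."""
    for pattern in skip_layers:
        if not any(ch.isalnum() for ch in pattern):
            raise ValueError(f"skip_layers entry {pattern!r} must contain a letter or digit")
        if pattern in layer_name:
            return True
    return False


if __name__ == "__main__":
    print(check_cfg(EXAMPLE_CFG))
    for name in ["model.layers.0.self_attn.q_proj", "lm_head"]:
        print(name, should_skip(name, EXAMPLE_CFG["skip_layers"]))
```

Because the match is a plain substring test, a short entry such as "proj" would skip every projection layer, so more specific strings are usually safer.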

2 Quantization Example

2.1 Calling via the Interface

Step 1. Run the following commands in the current directory to execute the sample programs. Change the model path to match your environment:

python3 src/run_llama2_samples.py --model_path=/data/Llama2_7b_hf/
python3 src/run_qwen_samples.py --model_path=/data/Qwen2-7b/
python3 src/run_qwen_samples.py --model_path=/data/Qwen3-8B/

Output like the following indicates that quantization succeeded:

Test time taken: 1.0 min 59.24865388870239 s Score: 5.477707

Here, Score is the perplexity (PPL) of the quantized model. Reference values are listed in the following table:

| Model | Calibration Set | Dataset | Pre-quantization PPL | Post-quantization PPL |
| --- | --- | --- | --- | --- |
| LLAMA2-7B | pileval | wikitext2 | 5.472 | 5.505 |
| QWEN2-7B | pileval | wikitext2 | 7.137 | 7.196 |
| QWEN3-8B | pileval | wikitext2 | 9.715 | 9.808 |
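For readers less familiar with the metric: perplexity is the exponential of the average per-token negative log-likelihood, so lower is better, and a small gap between the two PPL columns suggests little accuracy loss from quantization. The following is a minimal sketch of that definition in plain Python. The function names `perplexity` and `relative_increase` are illustrative, and this is not the evaluation code used by the sample scripts.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_increase(ppl_before, ppl_after):
    """Relative PPL change after quantization, as a fraction of the baseline."""
    return (ppl_after - ppl_before) / ppl_before


if __name__ == "__main__":
    # Pre- and post-quantization PPL from the LLAMA2-7B row of the table.
    print(f"{relative_increase(5.472, 5.505):.2%}")
```

Comparing the relative change rather than the absolute gap makes rows with different baseline PPL easier to compare.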

After inference succeeds, the quantization log file ./amct_log/amct_pytorch.log is generated in the current directory.


Authoring note: parts of this article were generated with AI assistance (AIGC) and are for reference only.
