
CANN/mat-chem-sim-pred: IPDT Batch Rollout Score Benchmark

PidIpdtBatchRolloutScore Benchmark Report

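[Free download link] mat-chem-sim-pred targets the industrial domain and focuses on two core scenarios, computational simulation and prediction. It builds a domain computing layer for process industries driven by both mechanism and data, and promotes the in-depth application of AI for Science in materials chemistry. Project address: https://gitcode.com/cann/mat-chem-sim-pred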

This document records the measured CPU/NPU behavior of PidIpdtBatchRolloutScore.

Environment

  • NPU host: node202
  • Device: Ascend910B3, device id 0
  • CANN: /usr/local/Ascend/ascend-toolkit/latest
  • CPU baseline: benchmark program in multi-thread mode
  • Build: -DCMAKE_BUILD_TYPE=Release -DSOC_VERSION=Ascend910B3 -DRUN_MODE=npu

Method

The benchmark_pid_ipdt_batch_rollout_score_aclnn program builds an in-process multi-thread CPU reference (ComputeRange, which uses the same integrator recurrence y[k+1] = y[k] + b*u[k-delay]), runs the NPU operator on the same inputs, and reports max_abs_err, max_quality_rel_err and best_idx_diff_count. The pass conditions are: npu_zero_score_count == 0; per-candidate scores match the CPU reference to float32 precision; and any best_idx differences are near-ties (the chosen candidate's metric rel-err stays small). This matches the behavior of the verified FOPDT operator.
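The recurrence and two of the reported metrics can be sketched in a few lines of Python. This is an illustration only, not the benchmark's ComputeRange code: the function names mirror the reported metric names, the zero input history before k=0 is an assumption, and so is the convention that the lowest score marks the best candidate.

```python
def ipdt_rollout(u, b, delay, y0=0.0):
    """Open-loop IPDT rollout: y[k+1] = y[k] + b*u[k-delay].

    Assumption: inputs before k=0 are zero, so the output stays at y0
    until the dead time has elapsed.
    """
    y = [y0]
    for k in range(len(u)):
        delayed_u = u[k - delay] if k >= delay else 0.0
        y.append(y[-1] + b * delayed_u)
    return y


def max_abs_err(ref, out):
    """Largest absolute element-wise difference between two flat score lists."""
    return max(abs(r - o) for r, o in zip(ref, out))


def best_idx_diff_count(ref_scores, out_scores):
    """Count loops whose best candidate index differs.

    Each argument is a list of per-loop score lists. Assumption: the best
    candidate is the one with the lowest score.
    """
    def argmin(row):
        return min(range(len(row)), key=row.__getitem__)

    return sum(argmin(r) != argmin(o) for r, o in zip(ref_scores, out_scores))


# Unit step into an integrator with gain 0.5 and a dead time of 2 samples.
step_response = ipdt_rollout([1.0] * 4, b=0.5, delay=2)
# step_response == [0.0, 0.0, 0.0, 0.5, 1.0]
```

The output holds at zero for the two dead-time samples and then ramps by b per step, which is the integrating behavior that distinguishes IPDT from a self-regulating FOPDT process.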

Correctness

The IPDT kernel differs from the verified FOPDT kernel only in the state recurrence (the decay factor a in the a*y term is dropped, leaving y[k+1] = y[k] + b*u[k-delay]). The candidate-axis SIMD width does not change the numerics (each tile is independent), so the accuracy profile matches FOPDT: NPU output equals the CPU reference within float32 rounding.
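A small Python sketch can make both claims concrete. It assumes the FOPDT update has the form a*y + b*u_delayed (inferred from the description of the dropped decay term), and it replaces the PID controller with a hypothetical proportional-only controller so that each candidate is a single gain kp. None of these names come from the kernel source.

```python
def fopdt_step(y, u_delayed, a, b):
    # Assumed FOPDT form: decayed state plus delayed input.
    return a * y + b * u_delayed


def ipdt_step(y, u_delayed, b):
    # IPDT form from the report: pure integrator, no decay multiply.
    return y + b * u_delayed


def rollout_tiled(kps, b, delay, steps, tile):
    """Closed-loop IPDT rollout for every candidate gain, `tile` at a time.

    Hypothetical simplification: P-only control u = kp * (1 - y) toward a
    setpoint of 1.0. Each candidate only reads its own state and history.
    """
    finals = []
    for start in range(0, len(kps), tile):
        lane = kps[start:start + tile]
        y = [0.0] * len(lane)
        # Delay line holding the last `delay` control vectors (zeros at start).
        history = [[0.0] * len(lane) for _ in range(delay)]
        for _ in range(steps):
            history.append([kp * (1.0 - yi) for kp, yi in zip(lane, y)])
            u_delayed = history.pop(0)
            y = [ipdt_step(yi, ui, b) for yi, ui in zip(y, u_delayed)]
        finals.extend(y)
    return finals


kps = [0.05 * (i + 1) for i in range(10)]
narrow = rollout_tiled(kps, b=0.1, delay=3, steps=50, tile=1)
wide = rollout_tiled(kps, b=0.1, delay=3, steps=50, tile=8)
```

Because no value crosses from one candidate to another, `narrow` and `wide` are identical element by element; the tile width changes only how the work is grouped. Likewise, `fopdt_step` with a = 1.0 returns exactly what `ipdt_step` returns, which is why the IPDT kernel can skip the multiply.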

Measured on node202 / Ascend910B3, B=128, sim_steps=1024, candidate_tile=C, npu_zero_score_count=0:

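| candidates | max_abs_err | max_quality_rel_err | best_idx_diff_count |
|------------|-------------|---------------------|---------------------|
| 1024       | 2.4e-4      | 1.5e-6              | 0                   |
| 4096       | 1.0         | 1.69e-3             | 1                   |
| 16384      | 1.5e-3      | 3.3e-5              | 1                   |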

The max_abs_err=1 at 4096 comes from the discrete settling-time metric crossing the settle band one sample later on NPU than on CPU for a single near-tie loop (dt=1 -> abs diff 1); the corresponding metric rel-err stays < 2e-3. The reference FOPDT operator shows the same behavior at this candidate count (max_abs_err=1, max_quality_rel_err=4.5e-3, best_idx_diff_count=1), so IPDT is within the accepted baseline.
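The one-sample jump is a property of any discrete settling-time metric, as the following sketch illustrates. The definition of `settling_index`, the 2% band and both trajectories are made up for illustration; the operator's actual metric definition is not shown in this report.

```python
def settling_index(y, setpoint, band):
    """First sample index from which |y - setpoint| stays within `band`."""
    last_outside = -1
    for k, yk in enumerate(y):
        if abs(yk - setpoint) > band:
            last_outside = k
    return last_outside + 1


# Two trajectories that differ by 2e-5 at a single sample near the band edge,
# standing in for float32 rounding differences between CPU and NPU.
cpu_traj = [0.0, 0.6, 0.9, 0.98001, 0.99, 1.0]
npu_traj = [0.0, 0.6, 0.9, 0.97999, 0.99, 1.0]

dt = 1.0
cpu_settle = settling_index(cpu_traj, setpoint=1.0, band=0.02) * dt
npu_settle = settling_index(npu_traj, setpoint=1.0, band=0.02) * dt
abs_diff = abs(npu_settle - cpu_settle)
```

Here a rounding-sized difference in the trajectory moves the settling time from 3 to 4 samples, so `abs_diff` is exactly dt. The absolute error of such a metric is quantized to whole samples, which is why it is judged together with the relative error rather than on its own.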

Measured timing

node202 / Ascend910B3, B=128, sim_steps=1024, candidate_tile=C, CPU = 64-thread parallel reference.

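| candidates | CPU parallel ms | NPU kernel ms | NPU kernel vs CPU |
|------------|-----------------|---------------|-------------------|
| 1024       | 32.5            | 7.45          | 4.36x             |
| 4096       | 122.1           | 24.7          | 4.95x             |
| 16384      | 426.6           | 93.8          | 4.55x             |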

Against a 192-thread CPU reference the speedup is 3.8-4.0x (the wider CPU pool narrows the gap).
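As a cross-check, the 64-thread speedup column can be recomputed from the rounded timings above. The throughput figure in this sketch is an illustration that assumes every loop-candidate pair runs all sim_steps updates; the report itself does not state a throughput.

```python
# (candidates, CPU parallel ms, NPU kernel ms) copied from the timing table.
rows = [
    (1024, 32.5, 7.45),
    (4096, 122.1, 24.7),
    (16384, 426.6, 93.8),
]
B, SIM_STEPS = 128, 1024  # batch of loops and rollout length from the report


def speedup(cpu_ms, npu_ms):
    """NPU kernel speedup over the CPU reference."""
    return cpu_ms / npu_ms


def state_updates_per_second(candidates, npu_ms):
    """Assumed work model: B * candidates * SIM_STEPS recurrence updates."""
    return B * candidates * SIM_STEPS / (npu_ms / 1e3)


speedups = [speedup(cpu, npu) for _, cpu, npu in rows]
throughputs = [state_updates_per_second(c, npu) for c, _, npu in rows]

for (c, _, _), s, t in zip(rows, speedups, throughputs):
    print(f"{c:>6} candidates: {s:.2f}x, {t:.3g} state updates/s")
```

The 1024 and 16384 rows reproduce the table (4.36x and 4.55x). The 4096 row recomputes to about 4.94x from the rounded values, so the reported 4.95x was presumably derived from unrounded timings.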

Notes

The kernel reuses the FOPDT wide-lane (kLane=768) and fused inner-loop optimizations unchanged; the only algorithmic difference is the integrator recurrence, which removes one vector multiply per timestep.


Authoring statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.
