
CANN/mat-chem-sim-pred: SOPDT Batch PID Candidate Scoring Algorithm

PidSopdtBatchRolloutScore Algorithm

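[Free download link] mat-chem-sim-pred targets the industrial domain and focuses on two core scenarios, computational simulation and prediction. It builds a domain computing layer for the process industry driven by both mechanism and data, advancing the deep application of AI for Science in materials chemistry. Project address: https://gitcode.com/cann/mat-chem-sim-pred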

Purpose

This operator evaluates many PID candidates for many SOPDT loops during the tuning stage and returns the best candidate for each loop.

The target workload is:

batch loops x candidate set x rollout time steps

Model

The plant model is discretized SOPDT (second-order plus dead time):

y[k+1] = a1 * y[k] + a2 * y[k-1] + b * u[k-delay]

Versus the FOPDT recurrence y[k+1] = a*y[k] + b*u[k-delay], SOPDT keeps one extra output-history state y[k-1] and a second coefficient a2, which lets it represent overdamped, critically damped, and underdamped second-order responses. For a plant built from two stable real lags with discrete-time poles p1, p2: a1 = p1+p2, a2 = -p1*p2 and b = K*(1-p1)*(1-p2), where K is the static gain. Everything else (PID law, scoring, candidate-axis SIMD, delay ring, tiling) is identical to PidFopdtBatchRolloutScore.
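The pole-to-coefficient mapping above can be checked numerically. The following is a minimal Python sketch, not the operator's code: the function names `sopdt_coeffs` and `open_loop_step` and the example values (poles 0.9 and 0.8, gain 2.0, delay 3) are assumptions chosen for illustration. It simulates an open-loop unit step and shows that the output settles at the static gain K.

```python
def sopdt_coeffs(p1, p2, gain):
    """Map two stable real discrete-time poles and a static gain K to (a1, a2, b)."""
    a1 = p1 + p2
    a2 = -p1 * p2
    b = gain * (1.0 - p1) * (1.0 - p2)
    return a1, a2, b


def open_loop_step(a1, a2, b, delay, steps, u=1.0):
    """Simulate y[k+1] = a1*y[k] + a2*y[k-1] + b*u[k-delay] for a step input."""
    y_prev, y = 0.0, 0.0
    out = []
    for k in range(steps):
        # The input is treated as zero before k = 0 (illustrative assumption).
        u_delayed = u if k - delay >= 0 else 0.0
        y_new = a1 * y + a2 * y_prev + b * u_delayed
        y_prev, y = y, y_new
        out.append(y)
    return out


# Illustrative values only.
a1, a2, b = sopdt_coeffs(p1=0.9, p2=0.8, gain=2.0)
response = open_loop_step(a1, a2, b, delay=3, steps=400)
```

The steady state follows from setting y[k+1] = y[k] = y[k-1] in the recurrence, which gives y = b / (1 - a1 - a2) = K for a unit input.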

The PID law is:

e[k] = sp - y[k]
integral += e[k] * dt
derivative = (e[k] - e[k-1]) / dt
u[k] = clamp(Kp * e[k] + Ki * integral + Kd * derivative, -10, 10)
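As a rough illustration of the PID law, here is a single-step Python sketch. The function name `pid_step`, the `(integral, e_prev)` state tuple, and the zero initial state are assumptions made for the example; the source does not state how the kernel initialises the integral or the previous error. The integral keeps accumulating while the output is clamped, because the law as written has no anti-windup term.

```python
def clamp(x, lo, hi):
    """Limit x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def pid_step(sp, y, state, kp, ki, kd, dt, u_min=-10.0, u_max=10.0):
    """One PID update. `state` is (integral, previous error); returns (u, new state)."""
    integral, e_prev = state
    e = sp - y
    integral += e * dt
    derivative = (e - e_prev) / dt
    u = clamp(kp * e + ki * integral + kd * derivative, u_min, u_max)
    return u, (integral, e)


# Example: first step from rest, with an assumed zero initial state.
u0, state0 = pid_step(sp=1.0, y=0.0, state=(0.0, 0.0), kp=2.0, ki=1.0, kd=0.0, dt=0.1)
```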

Score

For each candidate, the rollout accumulates:

  • IAE
  • ISE
  • overshoot
  • settling_time
  • control_energy

The optimization target is:

score = IAE + overshoot_weight * overshoot + settling_weight * settling_time + control_weight * control_energy

The operator returns the candidate with minimum score.
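The scoring step can be sketched in Python as follows. The source names the five metrics but does not define them, so every definition here is an assumption: IAE and ISE as time-weighted sums, overshoot as the largest excursion above the setpoint, settling time as the end of the last step outside a 2% band, and control energy as the time-weighted sum of u squared. The helper names and the lowest-index tie-break are also hypothetical.

```python
def rollout_metrics(y_trace, u_trace, sp, dt, band=0.02):
    """Accumulate the five metrics over one rollout (assumed definitions)."""
    iae = ise = energy = overshoot = 0.0
    last_outside = -1  # index of the last step outside the settling band
    for k, (y, u) in enumerate(zip(y_trace, u_trace)):
        e = sp - y
        iae += abs(e) * dt
        ise += e * e * dt
        energy += u * u * dt
        overshoot = max(overshoot, y - sp)
        if abs(e) > band * abs(sp):
            last_outside = k
    return {
        "iae": iae,
        "ise": ise,
        "overshoot": overshoot,
        "settling_time": (last_outside + 1) * dt,
        "control_energy": energy,
    }


def score(m, overshoot_weight, settling_weight, control_weight):
    """Weighted sum from the text; ISE is accumulated but not part of the score."""
    return (m["iae"]
            + overshoot_weight * m["overshoot"]
            + settling_weight * m["settling_time"]
            + control_weight * m["control_energy"])


def best_candidate(scores):
    """Index of the minimum score; ties go to the lowest index (assumption)."""
    return min(range(len(scores)), key=scores.__getitem__)


# Tiny hand-made trace, for illustration only.
metrics = rollout_metrics([0.0, 1.5, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0], sp=1.0, dt=0.5)
total = score(metrics, overshoot_weight=2.0, settling_weight=0.1, control_weight=0.01)
```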

NPU Execution Strategy

The current implementation uses a two-stage tiled structure:

  1. host splits the candidate axis into tiles
  2. local kernel evaluates one tile for all assigned loops and writes partial best results
  3. final kernel reduces all tile-local best results into one best result per loop
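The three steps above can be sketched on the host side in plain Python. This is only a model of the data flow, not the Ascend kernel code: a precomputed score table stands in for the rollout results, and the names `tile_ranges`, `local_best` and `tiled_argmin` are hypothetical. Ties are resolved toward the lowest candidate index, which is an assumption.

```python
def tile_ranges(num_candidates, tile_size):
    """Step 1: split the candidate axis into [start, stop) tiles."""
    return [(s, min(s + tile_size, num_candidates))
            for s in range(0, num_candidates, tile_size)]


def local_best(scores_for_loop, start, stop):
    """Step 2: best (score, candidate index) inside one tile for one loop."""
    best_idx = start
    for i in range(start + 1, stop):
        if scores_for_loop[i] < scores_for_loop[best_idx]:
            best_idx = i
    return scores_for_loop[best_idx], best_idx


def tiled_argmin(score_table, tile_size):
    """Steps 1-3 for a table indexed as score_table[loop][candidate]."""
    tiles = tile_ranges(len(score_table[0]), tile_size)
    # One "local kernel" pass per tile, covering every loop: partial[tile][loop].
    partial = [[local_best(row, s, e) for row in score_table] for (s, e) in tiles]
    # "Final kernel": reduce the tile-local winners to one winner per loop.
    return [min(partial[t][loop] for t in range(len(tiles)))
            for loop in range(len(score_table))]


# Two loops, five candidates, tiles of two candidates (illustrative numbers).
table = [[5.0, 3.0, 9.0, 1.0, 4.0],
         [2.0, 8.0, 2.0, 7.0, 6.0]]
winners = tiled_argmin(table, tile_size=2)
```

Because each tile reports a (score, index) pair, the final reduction needs only one value pair per tile and loop, independent of the tile size.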

This structure was chosen because the earlier single-launch (loop, tile) task mapping showed unstable coverage on node202. The current host-per-tile launch plus conservative loop-range partitioning restores correctness.

Kernel difference from FOPDT

The SOPDT kernel adds one state vector y_prev (y[k-1], placed in the previously unused scratch block 11, so the UB budget and kLane=768 are unchanged) and reads two coefficients (a1, a2) instead of one. The state update becomes:

y_new = a1*y + a2*y_prev + b*u[k-delay]
y_prev = y
y = y_new

This costs ~2 extra vector ops per timestep (one Muls + one Add) versus FOPDT; the delay ring, reduction and scoring are unchanged.
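A lane-wise Python sketch of this state update together with a delay ring is shown below. It is an illustration under assumptions, not the kernel: Python lists stand in for UB vectors, the ring is zero-initialised, and the slot indexing (write u[k] to slot k mod 32, read slot (k - delay) mod 32) is one plausible layout that the source does not confirm. The name `rollout_open_loop` is hypothetical.

```python
RING_SLOTS = 32  # matches the delay range 0..31 mentioned in the text


def rollout_open_loop(a1, a2, b, delay, u_seq, lanes):
    """Advance `lanes` candidates in lockstep; u_seq[k] holds one input per lane."""
    ring = [[0.0] * lanes for _ in range(RING_SLOTS)]  # zero-filled: no input before k = 0
    y = [0.0] * lanes
    y_prev = [0.0] * lanes
    for k, u in enumerate(u_seq):
        ring[k % RING_SLOTS] = list(u)               # store u[k]
        u_delayed = ring[(k - delay) % RING_SLOTS]   # fetch u[k-delay]
        y_new = [a1 * yi + a2 * ypi + b * ui
                 for yi, ypi, ui in zip(y, y_prev, u_delayed)]
        y_prev = y   # shift the output history: y[k] becomes y[k-1]
        y = y_new
    return y, y_prev


# With a1 = a2 = 0 and b = 1 the plant is a pure delay, which makes the
# ring indexing easy to inspect: the output is simply u[k-delay].
ramp = [[float(k)] for k in range(40)]
y_end, y_prev_end = rollout_open_loop(0.0, 0.0, 1.0, delay=5, u_seq=ramp, lanes=1)
```

With 32 slots and delays of at most 31, the slot read at step k was always written at step k - delay and has not yet been overwritten.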

Vectorization

The rollout time dimension is a serial recurrence (y[k+1] depends on y[k]) and cannot be turned into GEMM-style dense math without dropping the per-step nonlinearities (control clamp) and the nonlinear score functionals (IAE/ISE/overshoot/settling), so the kernel keeps the exact step-by-step recurrence.

The parallelism instead lives on the candidate axis: every timestep applies the same chain of vector ops to all candidates at once. Because the recurrence is serial, that chain of dependent vector ops cannot be pipelined across timesteps, so with a narrow lane the inner loop is bound by per-instruction issue/latency rather than by compute throughput. The kernel therefore evaluates the candidate axis with a wide SIMD lane (kLane=768): more candidates per vector instruction means fewer instructions for the same work, which amortises the fixed instruction latency and makes the loop throughput-bound. kLane=768 is the largest lane that keeps the 8 state vectors + scratch + the 32-slot delay ring (delay spec 0..31) + I/O queues within the 192 KB UB budget. Widening the lane is a pure layout change and leaves the output bit-identical.
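The claim that lane width is a pure layout change can be illustrated with a small Python model. This is a sketch under assumptions, not the NPU kernel: `rollout_iae_chunk` and `evaluate` are hypothetical names, only IAE is accumulated, controller state starts at zero, and a growing history list replaces the delay ring. Each candidate's arithmetic depends only on its own state, so chunking the candidate axis differently leaves every result unchanged.

```python
def rollout_iae_chunk(gains, a1, a2, b, delay, sp, dt, steps):
    """Advance one chunk of (Kp, Ki, Kd) candidates in lockstep; return IAE per candidate."""
    n = len(gains)
    y, y_prev = [0.0] * n, [0.0] * n
    integral, e_prev, iae = [0.0] * n, [0.0] * n, [0.0] * n
    history = []  # u vectors per step; stands in for the delay ring
    for k in range(steps):
        u_now = []
        for i, (kp, ki, kd) in enumerate(gains):  # models one vector op across lanes
            e = sp - y[i]
            integral[i] += e * dt
            derivative = (e - e_prev[i]) / dt
            u = kp * e + ki * integral[i] + kd * derivative
            u_now.append(max(-10.0, min(10.0, u)))  # clamp to [-10, 10]
            e_prev[i] = e
            iae[i] += abs(e) * dt
        history.append(u_now)
        u_del = history[k - delay] if k >= delay else [0.0] * n
        y_new = [a1 * y[i] + a2 * y_prev[i] + b * u_del[i] for i in range(n)]
        y_prev, y = y, y_new
    return iae


def evaluate(gains, lane, **plant):
    """Process the candidate axis in chunks of `lane` candidates."""
    out = []
    for s in range(0, len(gains), lane):
        out.extend(rollout_iae_chunk(gains[s:s + lane], **plant))
    return out


# Illustrative plant and candidate set.
plant = dict(a1=1.7, a2=-0.72, b=0.04, delay=3, sp=1.0, dt=0.1, steps=200)
candidates = [(0.5, 0.2, 0.0), (1.0, 0.5, 0.05), (2.0, 1.0, 0.1),
              (0.2, 0.1, 0.0), (3.0, 0.3, 0.2)]
narrow = evaluate(candidates, lane=1, **plant)
wide = evaluate(candidates, lane=5, **plant)
```

In this model `narrow == wide` holds exactly, since no operation mixes values across candidates; the wider lane only changes how many candidates share each pass through the timestep loop.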

Engineering Conclusion

This operator is valuable as:

  • an independent PID tuning operator sample
  • a correctness-verified NPU exploration artifact (NPU output matches the CPU reference, quality rel-err < 1e-3)
  • a single-card rollout that reuses the FOPDT wide-lane plus fused inner-loop optimizations

The inner loop was also reduced from ~37 to ~32 vector ops per timestep by reusing the response error as the next step's error and by folding the non-feedback metric accumulators (IAE/ISE/control energy) into fused multiply-accumulates; this is bit-identical to the original. The remaining single-card headroom is a cheaper settling reduction; multi-card data parallelism scales the absolute time further but is a hardware lever, not a single-card algorithmic speedup.


Disclosure: parts of this article were generated with AI assistance (AIGC) and are for reference only.
