
CANN/cannbot-skills: A5 Device Constraints Guide

a5 Device Constraints

cannbot-skills: CANNBot is a series of agents that improve development efficiency for CANN development; this repository provides their reusable Skills modules. Project address: https://gitcode.com/cann/cannbot-skills

Read this file when writing a kernel targeting a5 (easyasc.a5, device950) and the kernel has any vec-side stage. Do not use it as a substitute for the general kernel-authoring playbook.

Goal

Capture the stable a5 vec-side authoring surface so that:

  • a2-style direct vec-body patterns are not copied into a5 kernels
  • vec-side work starts on the supported authoring surfaces
  • easyasc.a5 import breadth is not mistaken for the stable kernel-writing contract

1. Stable a5 vec-side authoring rule

For a5, vec-side work should be authored through:

  • @vf() helpers for ordinary vec preprocess / postprocess
  • micro ops inside @vf() when register-level control is required
  • sort-family ops such as sort32, mergesort4, and mergesort_2seq when the kernel genuinely needs sort behavior
  • ub_to_ub for UB-local copies or layout-preserving handoff steps

Do not write generic a2-style vec UB ops directly in the a5 kernel body. If the step is elementwise, row-wise, reduction, normalization, or cast-oriented on a5, move it into @vf() first and only drop to micro when @vf() alone is not enough.

Important note:

  • the raw easyasc.a5 export surface is wider than this stable authoring rule
  • treat the authoring rule above as the repository contract for new a5 kernels
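
As a planning aid, the rule above can be sketched as a small lookup from the kind of vec-side step to the authoring surface it should start on. This is plain Python, not easyasc code: the function name, the step-kind labels, and the returned strings are hypothetical and exist only for illustration.

```python
# Hypothetical planning helper; none of these names are easyasc APIs.
VF_STEP_KINDS = {"elementwise", "row-wise", "reduction", "normalization", "cast"}
SORT_STEP_KINDS = {"sort"}
COPY_STEP_KINDS = {"ub-local copy"}


def a5_authoring_surface(step_kind: str, needs_register_control: bool = False) -> str:
    """Return the a5 authoring surface suggested by the rule above."""
    if step_kind in SORT_STEP_KINDS:
        return "sort-family op"
    if step_kind in COPY_STEP_KINDS:
        return "ub_to_ub"
    if step_kind in VF_STEP_KINDS:
        # Start in @vf(); drop to micro only when @vf() alone is not enough.
        return "micro inside @vf()" if needs_register_control else "@vf()"
    raise ValueError(f"unclassified vec-side step: {step_kind!r}")


print(a5_authoring_surface("normalization"))
print(a5_authoring_surface("cast", needs_register_control=True))
```

The helper raises on anything it cannot classify, so an unfamiliar step forces a deliberate decision instead of silently defaulting to an a2-style kernel-body op.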

2. Contrast with a2

  • a2 does not support @vf()
  • a2 does not support micro
  • a2 vec work is written directly in the kernel body on UB tensors
  • do not mirror an a2 pure-vec kernel body into a5, or an a5 @vf() flow into a2

3. Implications for common topologies

Stable a5 forms:

  • cube -> vec: GM -> L1 -> L0 -> L0C -> UB -> @vf() -> GM
  • vec -> cube: GM -> UB -> @vf() -> UB -> L1 -> L0 -> L0C -> GM
  • vec-only transform: GM -> UB -> @vf() or GM -> UB -> @vf() + micro -> GM
  • UB-local republish / copy: ub_to_ub may stay in the kernel body if it is truly just the copy step

Practical rule:

  • if you are about to call ordinary vec math on an a5 UB tensor from the kernel body, stop and move that logic into @vf()
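
The practical rule can be expressed as a tiny review check over a hand-written plan of vec-side steps. This is a sketch only: the plan format, the placement labels "kernel_body" and "vf", and op labels such as "exp" are assumptions made for illustration, and the plan is assumed to list only vec-side operations on UB tensors (not data movement between GM, L1, L0, and L0C).

```python
# Hypothetical review helper; it does not inspect real easyasc kernels.
KERNEL_BODY_OK = {"ub_to_ub"}  # the pure copy step may stay in the kernel body


def misplaced_vec_steps(plan):
    """Return the ops that sit in the kernel body but belong in @vf().

    plan is a list of (op_label, placement) pairs, where placement is
    "kernel_body" or "vf".
    """
    return [
        op
        for op, placement in plan
        if placement == "kernel_body" and op not in KERNEL_BODY_OK
    ]


plan = [
    ("ub_to_ub", "kernel_body"),  # allowed: just the copy step
    ("exp", "kernel_body"),       # ordinary vec math: should move into @vf()
    ("cast", "vf"),               # already on the supported surface
]
print(misplaced_vec_steps(plan))
```

An empty result means every vec-side step in the plan already sits on a supported surface.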

3a. Cube-side matmul dependency reuse rule

When a later cube matmul depends on the result of an earlier cube matmul, check first whether the dependency can stay on the cube-side path:

  • producer: mmad -> L0C
  • republish: l0c_to_l1
  • consumer: later l1_to_l0 -> mmad

Prefer that direct L0C -> L1 route when:

  • the intermediate value is only needed by a later cube-side matmul
  • no vec-side transform is required on the intermediate value before reuse

Avoid the detour:

  • L0C -> UB -> L1

unless the UB hop is semantically required for a real vec-side stage such as:

  • cast / normalization / elementwise transform in @vf()
  • a cube -> vec handoff that genuinely changes ownership to the vec lane

Reason:

  • l0c_to_l1 already gives you the FIX-side republish path for this dependency
  • the UB detour adds traffic, adds synchronization surface, and makes the kernel easier to overcomplicate without adding capability

Practical debugging hint:

  • if you find yourself moving a pure matmul dependency through UB only so a later matmul can read it back, stop and re-check whether l0c_to_l1 already expresses the intended dependence
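
The route choice in this section reduces to two questions about the intermediate value. A minimal sketch of that decision follows; the function and parameter names are hypothetical, and the returned strings only name the memory path, not easyasc calls.

```python
# Hypothetical decision helper for a matmul -> matmul dependency on a5.
def intermediate_route(needs_vec_transform: bool,
                       vec_lane_takes_ownership: bool = False) -> str:
    """Pick the path an earlier matmul result takes to reach a later matmul."""
    if needs_vec_transform or vec_lane_takes_ownership:
        # A real vec-side stage exists, so the UB hop is semantically required.
        return "L0C -> UB -> L1"
    # Pure cube-side dependency: republish directly from L0C to L1.
    return "L0C -> L1"


print(intermediate_route(needs_vec_transform=False))
print(intermediate_route(needs_vec_transform=True))
```

If both answers are "no" and the kernel still routes through UB, that is the detour the debugging hint above warns about.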

4. When to use micro

Use micro on a5 when the vec stage needs register-level behavior such as:

  • explicit fp8 cast control
  • pack4() / sparse-lane squeeze patterns
  • explicit mask or cast-config handling
  • custom register reductions or packing not expressible cleanly as plain Tensor <<= Reg/RegList

Prefer plain @vf() first when it already matches the contract. For example, a Reg/RegList loaded in @vf() and written back with dst[...] <<= regs is usually simpler than dropping to explicit micro cast + pack4.

Another stable case that should stay in @vf():

  • row-recursive vec kernels where each output row depends on the previous output row
  • example shape: load one GM chunk as [chunk_size, H], then compute
    • y[0, :] = x[0, :]
    • y[i, :] = x[i, :] + y[i - 1, :]
  • on a5, keep that recurrence in @vf() with Reg/RegList slices over the row width
  • do not reach for cpadd or custom micro just because the math is cumulative; cpadd is pair-wise add, not row-prefix recurrence
  • only drop to micro if the recurrence itself needs per-lane scan behavior inside one row rather than previous-row carry
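
To make the difference concrete, here is a pure-Python reference of the row recurrence next to a pair-wise add. It is a semantic model only, not a5 kernel code: the function names are hypothetical, and pairwise_add reflects one assumed reading of "pair-wise add" (summing adjacent element pairs inside a row), not the documented cpadd behavior.

```python
def row_prefix_sum(x):
    """Previous-row carry: y[0] = x[0], y[i] = x[i] + y[i - 1], per column."""
    y = []
    for i, row in enumerate(x):
        if i == 0:
            y.append(list(row))
        else:
            prev = y[i - 1]
            y.append([a + b for a, b in zip(row, prev)])
    return y


def pairwise_add(row):
    """Assumed pair-wise add: sum adjacent element pairs inside one row."""
    return [row[j] + row[j + 1] for j in range(0, len(row) - 1, 2)]


x = [[1, 2, 3, 4],
     [10, 20, 30, 40],
     [100, 200, 300, 400]]  # shape [chunk_size=3, H=4]

print(row_prefix_sum(x))   # [[1, 2, 3, 4], [11, 22, 33, 44], [111, 222, 333, 444]]
print(pairwise_add(x[0]))  # [3, 7]
```

The recurrence keeps the row width H and carries state across rows, while the pair-wise add halves the width and never looks at another row, which is why it cannot stand in for the cumulative sum.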

Files to study

  • agent/example/kernels/a5/basic_cube_vec_mix.py
  • agent/example/kernels/a5/chunk_row_cumsum.py
  • agent/example/kernels/a5/matmul_half_splitn_bias10p2_vf.py
  • agent/example/kernels/a5/matmul_kmkn_blockwise_quant128.py
  • agent/example/kernels/a5/micro_cast_fp8_pack4_dual.py
  • agent/example/kernels/a5/vec_cube_abs_sqrt_matmul.py


Authorship statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.
