
PTO-ISA Library Developer Rules

This file lists some rules and limitations on the implementation of this library for pto-isa developers.

[Free download link] pto-isa: Parallel Tile Operation (PTO) is a virtual instruction set architecture designed by Ascend CANN, focusing on tile-level operations. This repository offers high-performance, cross-platform tile operations across Ascend platforms. Project address: https://gitcode.com/cann/pto-isa

Not following the rules can result in any of the following:

  1. Compilation failures (including source-level compile errors and compiler crashes)
  2. Functionally incorrect results (e.g., precision issues)
  3. Bad performance

1 - Remember that pto::(Conv)Tile::data() returns a vector type instead of a pointer type in auto mode

The return type of the .data() member function is TileDType, which is defined differently in manual and auto mode. In manual mode it is simply a pointer, while in auto mode it is a vector type. See include/pto/common/memory.hpp for details.

Keep this in mind, and avoid using the value returned by .data() directly as a pointer outside tile functions.
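
The difference can be illustrated with a small self-contained sketch. MockTile, MockVec, and the AutoMode flag below are hypothetical stand-ins, not the library's definitions (those are in include/pto/common/memory.hpp). The sketch only shows why generic code should check, rather than assume, that TileDType is a pointer.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the vector-like type used in auto mode.
struct MockVec {
    float lanes[8];
};

// Hypothetical tile: TileDType is a raw pointer in manual mode and a
// vector-like value type in auto mode.
template <bool AutoMode>
struct MockTile {
    using DType = float;
    using TileDType = std::conditional_t<AutoMode, MockVec, DType *>;

    TileDType &data() { return storage_; }

private:
    TileDType storage_{};
};

// Reports whether a tile's data() result may be treated as a pointer.
template <typename Tile>
constexpr bool DataIsPointer()
{
    return std::is_pointer_v<typename Tile::TileDType>;
}

static_assert(DataIsPointer<MockTile<false>>(), "manual mode: pointer");
static_assert(!DataIsPointer<MockTile<true>>(), "auto mode: not a pointer");
```

A check of this kind turns an accidental pointer assumption into a compile-time error instead of a failure that appears only in auto mode.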

2 - Avoid default initializer for a struct/class member

It's a very common practice to default-initialize data members in a struct or class in C++, for instance:

    struct ConvTile {
    public:
        ...
        int shape[ConvTileDetail::MAX_CONVTILE_DIM] = {1};
    };

This turns out to cause problems for the SROA pass in the compiler: SROA can't eliminate the AllocaInst of the struct, nor the load and store instructions associated with it. At least in auto mode, please DON'T default-initialize the members:

    #ifdef __PTO_AUTO__
        // In auto mode, do not have default initialization in the class definition itself for its members
        int shape[ConvTileDetail::MAX_CONVTILE_DIM];
    #else
        int shape[ConvTileDetail::MAX_CONVTILE_DIM] = {1};
    #endif

Even though we are programming in C++, we encourage describing structs and classes as POD (Plain Old Data) aggregates that are compatible with the C programming language.
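
As a minimal sketch of the POD style, the two structs below differ only in the default member initializer. kMaxDim, ShapeWithInit, ShapePod, and MakeDefaultShape are hypothetical names used for illustration. The static_asserts show the language-level difference (a non-trivial versus a trivial default constructor); they do not by themselves demonstrate the SROA behavior described above.

```cpp
#include <cassert>
#include <type_traits>

constexpr int kMaxDim = 6; // hypothetical stand-in for MAX_CONVTILE_DIM

// With a default member initializer, the default constructor is non-trivial.
struct ShapeWithInit {
    int shape[kMaxDim] = {1};
};

// POD-style aggregate: no default member initializers.
struct ShapePod {
    int shape[kMaxDim];
};

static_assert(!std::is_trivially_default_constructible_v<ShapeWithInit>, "");
static_assert(std::is_trivially_default_constructible_v<ShapePod>, "");
static_assert(std::is_standard_layout_v<ShapePod>, "");

// Explicit initialization at the point of use replaces the in-class
// initializer: element 0 becomes 1, the remaining elements become 0.
inline ShapePod MakeDefaultShape()
{
    ShapePod s = {{1}};
    return s;
}
```

Moving the initial values into a helper like this keeps the struct itself free of initializers while still giving callers a well-defined starting state.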

3 - Explicit synchronization is still needed inside tile functions and their callees

TL;DR:

  • Use set_flag, wait_flag or pipe_barrier explicitly in tile functions and all of their callees.
  • Use PtoSetWaitFlag or TSYNC anywhere else.

Reason: Auto-sync will NOT traverse into tile functions. In fact, the whole auto-mode compiler works at the tile function level, meaning that everything inside a tile function is a complete black box to auto mode.

For this reason, if any synchronization is needed inside a tile function, library developers must still add it manually. PtoSetWaitFlag and TSYNC won't work there in auto mode because they are no-ops. In most cases these interfaces are used by kernel developers.
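
The following sketch shows the intended ordering inside a tile function body. Everything in namespace mock is a hypothetical stand-in that only records call order: the real set_flag and wait_flag are compiler intrinsics, and the three-argument form and the PIPE_MTE2, PIPE_V, and EVENT_ID0 names used here are assumptions for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace mock {
// Hypothetical stand-ins that only record the call order. The real
// set_flag/wait_flag are compiler intrinsics; their signatures are assumed.
enum Pipe { PIPE_MTE2, PIPE_V };
enum Event { EVENT_ID0 };

inline std::vector<std::string> &Trace()
{
    static std::vector<std::string> trace;
    return trace;
}

inline void set_flag(Pipe, Pipe, Event) { Trace().push_back("set_flag"); }
inline void wait_flag(Pipe, Pipe, Event) { Trace().push_back("wait_flag"); }
inline void Load() { Trace().push_back("load"); }
inline void Compute() { Trace().push_back("compute"); }

// Body of a hypothetical tile function: the producer (load) signals the
// consumer (compute) explicitly, because auto-sync does not look inside.
inline void TileFunctionBody()
{
    Load();
    set_flag(PIPE_MTE2, PIPE_V, EVENT_ID0);
    wait_flag(PIPE_MTE2, PIPE_V, EVENT_ID0);
    Compute();
}
} // namespace mock
```

Calling mock::TileFunctionBody() records load, set_flag, wait_flag, compute in that order, which is the ordering a library developer has to guarantee by hand inside a tile function.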

4 - Avoid using TASSIGN for implementation

Currently, the implementations of some PTO instructions call TASSIGN_IMPL directly. This may be a problem in auto mode, where it is a no-op.

If you use TASSIGN just to alias two tiles, use TRESHAPE or TSUBVIEW instead, depending on your needs. Anything else won't work in auto mode.

For instance, if you call TASSIGN to allocate memory according to some algorithm, this will never work in auto mode, because the compiler can't recognize the specific algorithm and reproduce the allocation you would do in manual mode.

After all, memory allocation in auto mode is based entirely on the liveness analysis of each individual tile, without any other context. This is why the current implementation of TPUSH and TPOP won't work in auto mode.

5 - Some general rules for *_IMPL functions

Some consistency must be ensured between *_IMPL functions and the tile function interface:

  • The function signature must have the PTO_INTERNAL macro.
  • Its implementation should call tile functions directly. Don't call any non-tile functions unless they're inlined.
  • Always pass the result of .data() straight into tile functions, or bind every value returned by .data() by reference. For example:
    TExp(dstTile.data(), srcTile.data()); // correct

    auto dst = dstTile.data();  // wrong: return by value
    auto &src = srcTile.data(); // correct: return by reference
    TExp(dst, src);
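
The by-value versus by-reference distinction can be reproduced with a small mock. MockTile and MockVec are hypothetical, and the sketch assumes that data() returns a reference to the tile's storage, as the example above suggests.

```cpp
#include <cassert>

// Hypothetical vector-like payload standing in for an auto-mode TileDType.
struct MockVec {
    int lanes[4];
};

// Hypothetical tile whose data() returns a reference to its storage.
struct MockTile {
    using TileDType = MockVec;
    TileDType &data() { return storage_; }
    TileDType storage_{};
};

// Writes through a by-value copy: the tile itself is left unchanged.
inline int WriteThroughCopy(MockTile &tile)
{
    auto copy = tile.data(); // copies the whole payload
    copy.lanes[0] = 42;
    return tile.data().lanes[0];
}

// Writes through a reference: the tile's own storage is updated.
inline int WriteThroughReference(MockTile &tile)
{
    auto &ref = tile.data(); // aliases the tile's storage
    ref.lanes[0] = 42;
    return tile.data().lanes[0];
}
```

With plain auto, the write lands in a temporary copy and the tile never sees it. With auto &, the write reaches the tile.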

6 - Some general rules for tile functions

  • Use typename <...>::TileDType instead of typename <...>::DType * for tile types in tile function parameters.
  • Ensure these typename <...>::TileDType parameters are passed by value, not by pointer or reference.
  • Ensure the __in__ or __out__ attributes are properly attached to these typename <...>::TileDType parameters.
  • Always call __cce_get_tile_ptr on these typename <...>::TileDType arguments to get a tile's underlying buffer pointer.
  • The return type should always be void. Otherwise the compiler's assumptions about the tile function (TF) interface are broken and the behavior is undefined. Turn every return value into a pass-by-value argument, even a single scalar.
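
A minimal sketch of a function that follows the rules above is shown here with hypothetical placeholders so it compiles with any C++ compiler. MOCK_IN and MOCK_OUT stand in for the __in__ and __out__ attributes, mock_get_tile_ptr stands in for __cce_get_tile_ptr, and MockTileHandle stands in for typename <...>::TileDType. The position of the attributes relative to the parameter type is an assumption.

```cpp
#include <cassert>

// Hypothetical placeholders: in the real library these would be the
// __in__ / __out__ attributes and the __cce_get_tile_ptr builtin.
#define MOCK_IN
#define MOCK_OUT

// Stand-in for typename <...>::TileDType.
struct MockTileHandle {
    float *buffer;
};

inline float *mock_get_tile_ptr(MockTileHandle handle) { return handle.buffer; }

// Shape of a tile function under the rules above: void return type,
// tile parameters passed by value with direction attributes, and the
// buffer pointer obtained inside the function body.
inline void MockTAddOne(MOCK_OUT MockTileHandle dst, MOCK_IN MockTileHandle src, int count)
{
    float *dstPtr = mock_get_tile_ptr(dst);
    const float *srcPtr = mock_get_tile_ptr(src);
    for (int i = 0; i < count; ++i) {
        dstPtr[i] = srcPtr[i] + 1.0f;
    }
}
```

The result is delivered through the dst argument rather than a return value, which keeps the return type void.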

7 - Avoid having runtime control flow before tile functions

Runtime control flow makes it much harder for auto-sync to work properly. We encourage developers either to remove these runtime conditions if they're not necessary or to move them inside tile functions if possible.

Some examples include TROWEXPANDDIV_IMPL and TMULS_IMPL.
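
The restructuring can be sketched as follows. All names are hypothetical, the tile functions are modeled as ordinary C++ functions on std::vector, and the special case for a scalar of 1 is invented for illustration; it is not taken from the actual TMULS_IMPL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Buffer = std::vector<float>;

// Hypothetical tile functions, modeled as plain functions.
inline void MockTileCopy(Buffer &dst, const Buffer &src) { dst = src; }

inline void MockTileScale(Buffer &dst, const Buffer &src, float scalar)
{
    dst.resize(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = src[i] * scalar;
    }
}

// Discouraged shape: a runtime branch in front of the tile functions
// decides which one runs.
inline void MulsBranchOutside(Buffer &dst, const Buffer &src, float scalar)
{
    if (scalar == 1.0f) {
        MockTileCopy(dst, src);
    } else {
        MockTileScale(dst, src, scalar);
    }
}

// Hypothetical tile function that evaluates the runtime condition itself.
inline void MockTileMuls(Buffer &dst, const Buffer &src, float scalar)
{
    dst.resize(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = (scalar == 1.0f) ? src[i] : src[i] * scalar;
    }
}

// Preferred shape: one unconditional tile function call.
inline void MulsBranchInside(Buffer &dst, const Buffer &src, float scalar)
{
    MockTileMuls(dst, src, scalar);
}
```

Both variants compute the same result. The second presents the caller with a single straight-line sequence of tile function calls.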

Authoring statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.
