CANN / pto-isa: PTO Tile Intrinsics Programming Model

pto-isa: Parallel Tile Operation (PTO) is a virtual instruction set architecture designed by Ascend CANN, focusing on tile-level operations. This repository offers high-performance, cross-platform tile operations across Ascend platforms. Project address: https://gitcode.com/cann/pto-isa

PTO Tile Lib provides tile-granularity intrinsics that map to the PTO ISA. The model is designed for:

  • Portability across device generations: hardware may change (instruction details, storage layout, scheduling), but the programming model remains stable.
  • Near-hardware performance: the Tile and GlobalTensor abstractions are low-level enough to express efficient data movement and compute.
  • Two user profiles: a productive “compiler does the hard work” style, and an expert “I control placement and sync” style.

For the abstract execution model (core/device/host), see docs/machine/abstract-machine.md.

Core concepts

  • Tile: a fixed-capacity 2-D on-chip buffer (conceptually a tile register / SRAM block) and the unit of computation for most PTO instructions. See docs/coding/Tile.md.
  • GlobalTensor: a lightweight view of global memory (GM) as a 5-D tensor with shape/stride/layout metadata, used by memory instructions such as TLOAD and TSTORE. See docs/coding/GlobalTensor.md.
  • Scalar: immediate values and enumerations that parameterize instructions (rounding modes, comparison modes, atomic modes, etc.). See docs/coding/Scalar.md.
  • Event: explicit dependency tokens between pipeline classes, used to order operations without introducing a full barrier everywhere. See docs/coding/Event.md.
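To make the "5-D tensor with shape/stride metadata" idea concrete, here is a minimal host-side sketch of how a strided view maps a 5-D index to a linear element offset. The names GlobalView5D, MakeRowMajorView, and LinearOffset are hypothetical and exist only for illustration; they are not the PTO GlobalTensor API, whose actual layout rules are described in docs/coding/GlobalTensor.md.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 5-D global-memory view; NOT the PTO GlobalTensor type.
struct GlobalView5D {
    std::array<std::size_t, 5> shape;   // extent of each dimension
    std::array<std::size_t, 5> stride;  // element stride of each dimension
};

// Build a densely packed row-major view: the last dimension is contiguous.
inline GlobalView5D MakeRowMajorView(const std::array<std::size_t, 5>& shape) {
    GlobalView5D v{shape, {}};
    std::size_t step = 1;
    for (std::size_t d = 5; d-- > 0;) {
        v.stride[d] = step;
        step *= shape[d];
    }
    return v;
}

// Linear element offset of a 5-D index: the sum of idx[d] * stride[d].
inline std::size_t LinearOffset(const GlobalView5D& v,
                                const std::array<std::size_t, 5>& idx) {
    std::size_t off = 0;
    for (std::size_t d = 0; d < 5; ++d) {
        assert(idx[d] < v.shape[d]);  // index must lie inside the view
        off += idx[d] * v.stride[d];
    }
    return off;
}
```

Because the view stores strides separately from the shape, the same offset rule also covers non-contiguous sub-regions: only the stride values change, not the indexing code.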

Two development styles

PTO-Auto

PTO-Auto targets developers who prefer a simple, portable programming experience:

  • The compiler/runtime chooses memory placement and address binding.
  • The compiler inserts required synchronization.
  • The compiler schedules operations and applies fusions when possible.

This mode is a good starting point for correctness and portability.

PTO-Manual

PTO-Manual targets developers who need full control for performance tuning:

  • The developer controls memory placement and binding (for example via TASSIGN).
  • The developer explicitly expresses ordering (events and/or TSYNC).
  • The developer controls the operation schedule.

This mode enables expert tuning on critical kernels while still using the shared Tile/GlobalTensor abstractions.
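The difference between event-based ordering and a full barrier can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions: Op, kNoEvent, and Simulate are hypothetical names, the pipelines are modeled as in-order queues stepped round-robin, and nothing here reflects the real PTO event or TSYNC interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr int kNoEvent = -1;

// One operation in a pipeline queue (hypothetical model, not a PTO type).
struct Op {
    std::string name;
    int waitEvent;  // event that must be set before the op may issue, or kNoEvent
    int setEvent;   // event set when the op completes, or kNoEvent
};

// Step in-order pipelines round-robin; an op issues only once the event it
// waits on has been set. Returns the global issue order and stops early if
// every remaining op is blocked.
inline std::vector<std::string> Simulate(const std::vector<std::vector<Op>>& pipes,
                                         int numEvents) {
    std::vector<bool> eventSet(static_cast<std::size_t>(numEvents), false);
    std::vector<std::size_t> next(pipes.size(), 0);
    std::vector<std::string> order;
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (std::size_t p = 0; p < pipes.size(); ++p) {
            if (next[p] >= pipes[p].size()) continue;  // pipeline drained
            const Op& op = pipes[p][next[p]];
            if (op.waitEvent != kNoEvent &&
                !eventSet[static_cast<std::size_t>(op.waitEvent)]) {
                continue;  // dependency not yet satisfied
            }
            order.push_back(op.name);
            if (op.setEvent != kNoEvent) {
                eventSet[static_cast<std::size_t>(op.setEvent)] = true;
            }
            ++next[p];
            progressed = true;
        }
    }
    return order;
}
```

With a compute queue holding one op that waits on event 0, and a memory queue holding a load that sets event 0 followed by an independent second load, the compute op is held back only until the first load completes; the second load is never stalled by it, which is the behavior a single global barrier would not give.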

Execution models: SPMD and MPMD

PTO supports both SPMD and MPMD execution models.

These execution models describe how work is mapped onto cores. They are orthogonal to the Auto vs Manual development styles (you can write SPMD-Auto, SPMD-Manual, MPMD-Auto, or MPMD-Manual code).

SPMD (Single Program, Multiple Data)

In SPMD, all participating cores run the same entry function, and each core selects its own data region using its runtime identity (for example block_idx).

When sub-block decomposition exists, a stable “virtual id” can be constructed:

    auto cid = get_block_idx();                                         // id of this core (block)
    auto vid = get_block_idx() * get_subblockdim() + get_subblockid();  // flattened virtual id

SPMD is a good fit for regular tensor tiling (GEMM, softmax-by-rows, elementwise ops).
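As a sketch of how a virtual id can select a data region, the host-side helpers below split a row range across all virtual cores. The names VirtualId, RowsForCore, and RowRange are hypothetical, and the block and sub-block values are passed in as plain parameters rather than read from device intrinsics; the even-split policy is one assumption among many possible tilings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Half-open row range [begin, end) owned by one virtual core.
struct RowRange {
    std::uint32_t begin;
    std::uint32_t end;
};

// Flatten (block, sub-block) into one virtual id, mirroring
// get_block_idx() * get_subblockdim() + get_subblockid().
inline std::uint32_t VirtualId(std::uint32_t blockIdx, std::uint32_t subBlockDim,
                               std::uint32_t subBlockId) {
    assert(subBlockId < subBlockDim);
    return blockIdx * subBlockDim + subBlockId;
}

// Split totalRows as evenly as possible across numVirtualCores; the first
// (totalRows % numVirtualCores) cores each take one extra row.
inline RowRange RowsForCore(std::uint32_t vid, std::uint32_t numVirtualCores,
                            std::uint32_t totalRows) {
    assert(numVirtualCores > 0 && vid < numVirtualCores);
    const std::uint32_t base = totalRows / numVirtualCores;
    const std::uint32_t extra = totalRows % numVirtualCores;
    const std::uint32_t begin = vid * base + std::min(vid, extra);
    const std::uint32_t end = begin + base + (vid < extra ? 1u : 0u);
    return RowRange{begin, end};
}
```

For example, with 4 blocks of 2 sub-blocks each (8 virtual cores) and 10 rows, the first two virtual cores own two rows each and the remaining six own one row each, so every row is covered exactly once.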

MPMD (Multiple Program, Multiple Data)

In MPMD, different cores (or groups of cores) may execute different tile programs as part of the same overall tile graph. Conceptually, the Device Machine scheduler chooses which “program” a core runs.

One portable way to express this is to pass a scheduler-provided task id into the kernel entry function and dispatch based on it:

    __global__ __aicore__ void KernelMPMD(__gm__ float* out, __gm__ const float* in, uint32_t task_id) {
        switch (task_id) {
            case 0: return ProducerStage(out, in);
            case 1: return ConsumerStage(out, in);
            default: return;
        }
    }

Notes:

  • The exact mechanism that delivers task_id is platform/runtime dependent; the abstract model only requires that the Device Machine can schedule different tile blocks onto available cores.
  • If you prefer, MPMD can also be expressed as multiple entry points (multiple kernels) rather than a single kernel with a switch.
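The task-id dispatch can also be sketched on the host with a table of stage functions in place of the switch, which comes close to the "multiple entry points" variant. Everything here is an illustrative assumption: StageFn and Dispatch are hypothetical names, the stages operate on std::vector<float> instead of __gm__ pointers, and the arithmetic inside each stage is placeholder work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stage signature; real PTO stages would take __gm__ pointers.
using StageFn = void (*)(std::vector<float>& out, const std::vector<float>& in);

// Placeholder producer: writes 2 * in[i] to out[i].
inline void ProducerStage(std::vector<float>& out, const std::vector<float>& in) {
    out.assign(in.size(), 0.0f);
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] * 2.0f;
}

// Placeholder consumer: writes in[i] + 1 to out[i].
inline void ConsumerStage(std::vector<float>& out, const std::vector<float>& in) {
    out.assign(in.size(), 0.0f);
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] + 1.0f;
}

// Table-driven dispatch: task ids index into the stage table. Unknown ids are
// a no-op, matching the `default: return;` branch of the switch version.
inline bool Dispatch(std::uint32_t taskId, std::vector<float>& out,
                     const std::vector<float>& in) {
    static const StageFn kStages[] = {ProducerStage, ConsumerStage};
    const std::uint32_t count = sizeof(kStages) / sizeof(kStages[0]);
    if (taskId >= count) return false;  // no program registered for this id
    kStages[taskId](out, in);
    return true;
}
```

Adding a new tile program then means adding one table entry rather than editing the dispatch logic, at the cost of requiring every stage to share one signature.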

Authoring statement: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.