Ascend CANN PTO ISA Overview

PTO ISA Overview

pto-isa: Parallel Tile Operation (PTO) is a virtual instruction set architecture designed by Ascend CANN, focusing on tile-level operations. This repository offers high-performance, cross-platform tile operations across Ascend platforms. Project repository: https://gitcode.com/cann/pto-isa

This page is the source-synchronized ISA index generated from docs/isa/manifest.yaml.

Docs Contents

| Area | Page | Description |
| --- | --- | --- |
| Overview | docs/README.md | PTO ISA guide entry point and navigation. |
| Overview | docs/PTOISA.md | This page (overview + full instruction index). |
| ISA reference | docs/isa/README.md | Per-instruction reference directory index. |
| ISA reference | docs/isa/conventions.md | Shared notation, operands, events, and modifiers. |
| Assembly (PTO-AS) | docs/assembly/PTO-AS.md | PTO-AS syntax reference. |
| Source of truth | include/pto/common/pto_instr.hpp | C++ intrinsic API (authoritative). |
| PTO Auto Mode | docs/auto_mode/README.md | PTO auto mode guide entry point. |

Instruction Index (All PTO Instructions)

| Category | Instruction | Description |
| --- | --- | --- |
| Synchronization | TSYNC | Synchronize PTO execution (wait on events or insert a per-op pipeline barrier). |
| Manual / Resource Binding | TASSIGN | Bind a Tile object to an implementation-defined on-chip address (manual placement). |
| Manual / Resource Binding | SETFMATRIX | Set FMATRIX register(s) for IMG2COL-like ops. |
| Manual / Resource Binding | SET_IMG2COL_RPT | Set IMG2COL repeat metadata from an IMG2COL configuration tile. |
| Manual / Resource Binding | SET_IMG2COL_PADDING | Set IMG2COL padding metadata from an IMG2COL configuration tile. |
| Elementwise (Tile-Tile) | TADD | Elementwise add of two tiles. |
| Elementwise (Tile-Tile) | TABS | Elementwise absolute value of a tile. |
| Elementwise (Tile-Tile) | TAND | Elementwise bitwise AND of two tiles. |
| Elementwise (Tile-Tile) | TOR | Elementwise bitwise OR of two tiles. |
| Elementwise (Tile-Tile) | TSUB | Elementwise subtract of two tiles. |
| Elementwise (Tile-Tile) | TMUL | Elementwise multiply of two tiles. |
| Elementwise (Tile-Tile) | TMIN | Elementwise minimum of two tiles. |
| Elementwise (Tile-Tile) | TMAX | Elementwise maximum of two tiles. |
| Elementwise (Tile-Tile) | TCMP | Compare two tiles and write a packed predicate mask. |
| Elementwise (Tile-Tile) | TDIV | Elementwise division of two tiles. |
| Elementwise (Tile-Tile) | TSHL | Elementwise shift-left of two tiles. |
| Elementwise (Tile-Tile) | TSHR | Elementwise shift-right of two tiles. |
| Elementwise (Tile-Tile) | TXOR | Elementwise bitwise XOR of two tiles. |
| Elementwise (Tile-Tile) | TLOG | Elementwise natural logarithm of a tile. |
| Elementwise (Tile-Tile) | TRECIP | Elementwise reciprocal of a tile. |
| Elementwise (Tile-Tile) | TPRELU | Elementwise PReLU (parametric ReLU) with a per-element slope tile. |
| Elementwise (Tile-Tile) | TADDC | Elementwise ternary add: src0 + src1 + src2. |
| Elementwise (Tile-Tile) | TSUBC | Elementwise ternary op: src0 - src1 + src2. |
| Elementwise (Tile-Tile) | TCVT | Elementwise type conversion with a specified rounding mode. |
| Elementwise (Tile-Tile) | TSEL | Select between two tiles using a mask tile (per-element selection). |
| Elementwise (Tile-Tile) | TRSQRT | Elementwise reciprocal square root. |
| Elementwise (Tile-Tile) | TSQRT | Elementwise square root. |
| Elementwise (Tile-Tile) | TEXP | Elementwise exponential. |
| Elementwise (Tile-Tile) | TNOT | Elementwise bitwise NOT of a tile. |
| Elementwise (Tile-Tile) | TRELU | Elementwise ReLU of a tile. |
| Elementwise (Tile-Tile) | TNEG | Elementwise negation of a tile. |
| Elementwise (Tile-Tile) | TREM | Elementwise remainder of two tiles. |
| Elementwise (Tile-Tile) | TFMOD | Elementwise fmod of two tiles. |
| Elementwise (Tile-Tile) | TPOW | Elementwise power of two tiles. |
| Tile-Scalar / Tile-Immediate | TEXPANDS | Broadcast a scalar into a destination tile. |
| Tile-Scalar / Tile-Immediate | TCMPS | Compare a tile against a scalar and write per-element comparison results. |
| Tile-Scalar / Tile-Immediate | TSELS | Select between source tile and scalar using a mask tile (per-element selection for source tile). |
| Tile-Scalar / Tile-Immediate | TMINS | Elementwise minimum of a tile and a scalar. |
| Tile-Scalar / Tile-Immediate | TADDS | Elementwise add a scalar to a tile. |
| Tile-Scalar / Tile-Immediate | TSUBS | Elementwise subtract a scalar from a tile. |
| Tile-Scalar / Tile-Immediate | TDIVS | Elementwise division with a scalar (tile/scalar or scalar/tile). |
| Tile-Scalar / Tile-Immediate | TMULS | Elementwise multiply a tile by a scalar. |
| Tile-Scalar / Tile-Immediate | TFMODS | Elementwise remainder with a scalar: fmod(src, scalar). |
| Tile-Scalar / Tile-Immediate | TREMS | Elementwise remainder with a scalar: remainder(src, scalar). |
| Tile-Scalar / Tile-Immediate | TMAXS | Elementwise max of a tile and a scalar: max(src, scalar). |
| Tile-Scalar / Tile-Immediate | TANDS | Elementwise bitwise AND of a tile and a scalar. |
| Tile-Scalar / Tile-Immediate | TORS | Elementwise bitwise OR of a tile and a scalar. |
| Tile-Scalar / Tile-Immediate | TSHLS | Elementwise shift-left a tile by a scalar. |
| Tile-Scalar / Tile-Immediate | TSHRS | Elementwise shift-right a tile by a scalar. |
| Tile-Scalar / Tile-Immediate | TXORS | Elementwise bitwise XOR of a tile and a scalar. |
| Tile-Scalar / Tile-Immediate | TLRELU | Leaky ReLU with a scalar slope. |
| Tile-Scalar / Tile-Immediate | TADDSC | Elementwise fused add with scalar and a second tile: src0 + scalar + src1. |
| Tile-Scalar / Tile-Immediate | TSUBSC | Elementwise fused op: src0 - scalar + src1. |
| Tile-Scalar / Tile-Immediate | TPOWS | Elementwise power of a tile by a scalar. |
| Axis Reduce / Expand | TROWSUM | Reduce each row by summing across columns. |
| Axis Reduce / Expand | TROWPROD | Reduce each row by multiplying across columns. |
| Axis Reduce / Expand | TCOLSUM | Reduce each column by summing across rows. |
| Axis Reduce / Expand | TCOLPROD | Reduce each column by multiplying across rows. |
| Axis Reduce / Expand | TCOLMAX | Reduce each column by taking the maximum across rows. |
| Axis Reduce / Expand | TROWMAX | Reduce each row by taking the maximum across columns. |
| Axis Reduce / Expand | TROWMIN | Reduce each row by taking the minimum across columns. |
| Axis Reduce / Expand | TROWARGMAX | Get the column index of the maximum element for each row. |
| Axis Reduce / Expand | TROWARGMIN | Get the column index of the minimum element for each row. |
| Axis Reduce / Expand | TCOLARGMAX | Get the row index of the maximum element for each column. |
| Axis Reduce / Expand | TCOLARGMIN | Get the row index of the minimum element for each column. |
| Axis Reduce / Expand | TROWEXPAND | Broadcast the first element of each source row across the destination row. |
| Axis Reduce / Expand | TROWEXPANDDIV | Row-wise broadcast divide: divide each row of src0 by a per-row scalar vector src1. |
| Axis Reduce / Expand | TROWEXPANDMUL | Row-wise broadcast multiply: multiply each row of src0 by a per-row scalar vector src1. |
| Axis Reduce / Expand | TROWEXPANDSUB | Row-wise broadcast subtract: subtract a per-row scalar vector src1 from each row of src0. |
| Axis Reduce / Expand | TROWEXPANDADD | Row-wise broadcast add: add a per-row scalar vector. |
| Axis Reduce / Expand | TROWEXPANDMAX | Row-wise broadcast max with a per-row scalar vector. |
| Axis Reduce / Expand | TROWEXPANDMIN | Row-wise broadcast min with a per-row scalar vector. |
| Axis Reduce / Expand | TROWEXPANDEXPDIF | Row-wise exp-diff: compute exp(src0 - src1) with per-row scalars. |
| Axis Reduce / Expand | TCOLMIN | Reduce each column by taking the minimum across rows. |
| Axis Reduce / Expand | TCOLEXPAND | Broadcast the first element of each source column across the destination column. |
| Axis Reduce / Expand | TCOLEXPANDDIV | Column-wise broadcast divide: divide each column by a per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDMUL | Column-wise broadcast multiply: multiply each column by a per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDADD | Column-wise broadcast add with per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDMAX | Column-wise broadcast max with per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDMIN | Column-wise broadcast min with per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDSUB | Column-wise broadcast subtract: subtract a per-column scalar vector from each column. |
| Axis Reduce / Expand | TCOLEXPANDEXPDIF | Column-wise exp-diff: compute exp(src0 - src1) with per-column scalars. |
| Memory (GM <-> Tile) | TLOAD | Load data from a GlobalTensor (GM) into a Tile. |
| Memory (GM <-> Tile) | TPREFETCH | Prefetch data from global memory into a tile-local cache/buffer (hint). |
| Memory (GM <-> Tile) | TSTORE | Store data from a Tile into a GlobalTensor (GM), optionally using atomic write or quantization parameters. |
| Memory (GM <-> Tile) | TSTORE_FP | Store an accumulator tile into global memory using a scaling (fp) tile for vector quantization parameters. |
| Memory (GM <-> Tile) | MGATHER | Gather-load elements from global memory into a tile using per-element indices. |
| Memory (GM <-> Tile) | MSCATTER | Scatter-store elements from a tile into global memory using per-element indices. |
| Matrix Multiply | TGEMV_MX | GEMV with additional scaling tiles for mixed-precision / quantized matrix-vector compute. |
| Matrix Multiply | TMATMUL_MX | Matrix multiply (GEMM) with additional scaling tiles for mixed-precision / quantized matmul on supported targets. |
| Matrix Multiply | TMATMUL | Matrix multiply (GEMM) producing an accumulator/output tile. |
| Matrix Multiply | TMATMUL_ACC | Matrix multiply with accumulator input (fused accumulate). |
| Matrix Multiply | TMATMUL_BIAS | Matrix multiply with bias add. |
| Matrix Multiply | TGEMV | General Matrix-Vector multiplication producing an accumulator/output tile. |
| Matrix Multiply | TGEMV_ACC | GEMV with explicit accumulator input/output tiles. |
| Matrix Multiply | TGEMV_BIAS | GEMV with bias add. |
| Data Movement / Layout | TEXTRACT | Extract a sub-tile from a source tile. |
| Data Movement / Layout | TEXTRACT_FP | Extract with fp/scaling tile (vector-quantization parameters). |
| Data Movement / Layout | TIMG2COL | Image-to-column transform for convolution-like workloads. |
| Data Movement / Layout | TINSERT | Insert a sub-tile into a destination tile at an (indexRow, indexCol) offset. |
| Data Movement / Layout | TINSERT_FP | Insert with fp/scaling tile (vector-quantization parameters). |
| Data Movement / Layout | TFILLPAD | Copy+pad a tile outside the valid region with a compile-time pad value. |
| Data Movement / Layout | TFILLPAD_INPLACE | In-place fill/pad variant. |
| Data Movement / Layout | TFILLPAD_EXPAND | Fill/pad while allowing dst to be larger than src. |
| Data Movement / Layout | TMOV | Move/copy between tiles, optionally applying implementation-defined conversion modes. |
| Data Movement / Layout | TMOV_FP | Move/convert from an accumulator tile into a destination tile, using a scaling (fp) tile for vector quantization parameters. |
| Data Movement / Layout | TRESHAPE | Reinterpret a tile as another tile type/shape while preserving the underlying bytes. |
| Data Movement / Layout | TTRANS | Transpose with an implementation-defined temporary tile. |
| Data Movement / Layout | TSUBVIEW | Reinterpret a tile as a subtile of another tile. |
| Data Movement / Layout | TGET_SCALE_ADDR | Bind the on-chip address of output tile to a scaled factor of that of input tile. |
| Complex | TPRINT | Debug/print elements from a tile (implementation-defined). |
| Complex | TMRGSORT | Merge sort for multiple sorted lists (implementation-defined element format and layout). |
| Complex | TSORT32 | Sort 32-element blocks of src with accompanying idx entries and output sorted value-index pairs. |
| Complex | TGATHER | Gather/select elements using either an index tile or a compile-time mask pattern. |
| Complex | TCI | Generate a contiguous integer sequence into a destination tile. |
| Complex | TTRI | Generate a triangular (lower/upper) mask tile. |
| Complex | TRANDOM | Generates random numbers in the destination tile using a counter-based cipher algorithm. |
| Complex | TPARTADD | Partial elementwise add with implementation-defined handling of mismatched valid regions. |
| Complex | TPARTMUL | Partial elementwise multiply with implementation-defined handling of mismatched valid regions. |
| Complex | TPARTMAX | Partial elementwise max with implementation-defined handling of mismatched valid regions. |
| Complex | TPARTMIN | Partial elementwise min with implementation-defined handling of mismatched valid regions. |
| Complex | TPARTARGMAX | Partial elementwise max selection returning corresponding index (argmax), with implementation-defined handling of mismatched valid regions. |
| Complex | TPARTARGMIN | Partial elementwise min selection returning corresponding index (argmin), with implementation-defined handling of mismatched valid regions. |
| Complex | TGATHERB | Gather elements using byte offsets. |
| Complex | TSCATTER | Scatter rows of a source tile into a destination tile using per-element row indices. |
| Complex | TQUANT | Quantize a tile (e.g. FP32 to FP8) producing exponent/scaling/max outputs. |
| Communication | TPUT | Remote write: transfer local data to remote NPU memory (GM → UB → GM). |
| Communication | TGET | Remote read: read remote NPU data to local memory (GM → UB → GM). |
| Communication | TPUT_ASYNC | Asynchronous remote write (local GM → DMA engine → remote GM). |
| Communication | TGET_ASYNC | Asynchronous remote read (remote GM → DMA engine → local GM). |
| Communication | TNOTIFY | Send flag notification to remote NPU. |
| Communication | TWAIT | Blocking wait until signal(s) meet comparison condition. |
| Communication | TTEST | Non-blocking test if signal(s) meet comparison condition. |
| Communication | TGATHER | Gather data from all ranks and concatenate along DIM_3. |
| Communication | TSCATTER | Scatter data to all ranks by splitting along DIM_3. |
| Communication | TREDUCE | Gather and reduce data from all ranks element-wise to local. |
| Communication | TBROADCAST | Broadcast data from current NPU to all ranks. |
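
To make the Axis Reduce / Expand descriptions more concrete, here is a minimal scalar reference sketch in plain Python of how four of the indexed operations (TROWMAX, TROWEXPANDEXPDIF, TROWSUM, TROWEXPANDDIV) could compose into a numerically stable row-wise softmax. It is only an interpretation of the one-line descriptions in the index, not the PTO API: the function names are hypothetical, tiles are modelled as lists of rows, and the per-row results are modelled as flat lists. Valid regions, data types, tile layouts, and synchronization are all ignored. The authoritative definitions are in include/pto/common/pto_instr.hpp.

```python
import math

# Hypothetical reference models; names mirror the mnemonics but are NOT the PTO API.

def trowmax(src):
    """Reduce each row by taking the maximum across columns."""
    return [max(row) for row in src]

def trowsum(src):
    """Reduce each row by summing across columns."""
    return [sum(row) for row in src]

def trowexpandexpdif(src0, src1):
    """Row-wise exp-diff: exp(src0 - src1), with src1 holding one scalar per row."""
    return [[math.exp(x - s) for x in row] for row, s in zip(src0, src1)]

def trowexpanddiv(src0, src1):
    """Row-wise broadcast divide: each row of src0 divided by its per-row scalar."""
    return [[x / s for x in row] for row, s in zip(src0, src1)]

def softmax_rows(tile):
    """Row-wise softmax built only from the four reference models above."""
    row_max = trowmax(tile)                       # per-row maximum, for stability
    shifted = trowexpandexpdif(tile, row_max)     # exp(x - row_max)
    row_sum = trowsum(shifted)                    # per-row normalizer
    return trowexpanddiv(shifted, row_sum)        # normalize each row

tile = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
probs = softmax_rows(tile)
for row in probs:
    print(row)
```

Subtracting the row maximum before exponentiating keeps the exponent non-positive, which is presumably why a fused exp-diff form appears alongside the plain broadcast subtract in the index; that motivation is an inference, not something the index states.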

Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.
