Ascend CANN PTO ISA Overview
PTO ISA Overview
pto-isa: Parallel Tile Operation (PTO) is a virtual instruction set architecture designed by Ascend CANN, focusing on tile-level operations. This repository offers high-performance, cross-platform tile operations across Ascend platforms. Project address: https://gitcode.com/cann/pto-isa
This page is the source-synchronized ISA index generated from `docs/isa/manifest.yaml`.
Docs Contents
| Area | Page | Description |
|---|---|---|
| Overview | docs/README.md | PTO ISA guide entry point and navigation. |
| Overview | docs/PTOISA.md | This page (overview + full instruction index). |
| ISA reference | docs/isa/README.md | Per-instruction reference directory index. |
| ISA reference | docs/isa/conventions.md | Shared notation, operands, events, and modifiers. |
| Assembly (PTO-AS) | docs/assembly/PTO-AS.md | PTO-AS syntax reference. |
| Source of truth | include/pto/common/pto_instr.hpp | C++ intrinsic API (authoritative). |
| PTO Auto Mode | docs/auto_mode/README.md | PTO auto mode guide entry point. |
Instruction Index (All PTO Instructions)
| Category | Instruction | Description |
|---|---|---|
| Synchronization | TSYNC | Synchronize PTO execution (wait on events or insert a per-op pipeline barrier). |
| Manual / Resource Binding | TASSIGN | Bind a Tile object to an implementation-defined on-chip address (manual placement). |
| Manual / Resource Binding | SETFMATRIX | Set FMATRIX register(s) for IMG2COL-like ops. |
| Manual / Resource Binding | SET_IMG2COL_RPT | Set IMG2COL repeat metadata from an IMG2COL configuration tile. |
| Manual / Resource Binding | SET_IMG2COL_PADDING | Set IMG2COL padding metadata from an IMG2COL configuration tile. |
| Elementwise (Tile-Tile) | TADD | Elementwise add of two tiles. |
| Elementwise (Tile-Tile) | TABS | Elementwise absolute value of a tile. |
| Elementwise (Tile-Tile) | TAND | Elementwise bitwise AND of two tiles. |
| Elementwise (Tile-Tile) | TOR | Elementwise bitwise OR of two tiles. |
| Elementwise (Tile-Tile) | TSUB | Elementwise subtract of two tiles. |
| Elementwise (Tile-Tile) | TMUL | Elementwise multiply of two tiles. |
| Elementwise (Tile-Tile) | TMIN | Elementwise minimum of two tiles. |
| Elementwise (Tile-Tile) | TMAX | Elementwise maximum of two tiles. |
| Elementwise (Tile-Tile) | TCMP | Compare two tiles and write a packed predicate mask. |
| Elementwise (Tile-Tile) | TDIV | Elementwise division of two tiles. |
| Elementwise (Tile-Tile) | TSHL | Elementwise shift-left of two tiles. |
| Elementwise (Tile-Tile) | TSHR | Elementwise shift-right of two tiles. |
| Elementwise (Tile-Tile) | TXOR | Elementwise bitwise XOR of two tiles. |
| Elementwise (Tile-Tile) | TLOG | Elementwise natural logarithm of a tile. |
| Elementwise (Tile-Tile) | TRECIP | Elementwise reciprocal of a tile. |
| Elementwise (Tile-Tile) | TPRELU | Elementwise PReLU (parametric ReLU) with a per-element slope tile. |
| Elementwise (Tile-Tile) | TADDC | Elementwise ternary add: `src0 + src1 + src2`. |
| Elementwise (Tile-Tile) | TSUBC | Elementwise ternary op: `src0 - src1 + src2`. |
| Elementwise (Tile-Tile) | TCVT | Elementwise type conversion with a specified rounding mode. |
| Elementwise (Tile-Tile) | TSEL | Select between two tiles using a mask tile (per-element selection). |
| Elementwise (Tile-Tile) | TRSQRT | Elementwise reciprocal square root. |
| Elementwise (Tile-Tile) | TSQRT | Elementwise square root. |
| Elementwise (Tile-Tile) | TEXP | Elementwise exponential. |
| Elementwise (Tile-Tile) | TNOT | Elementwise bitwise NOT of a tile. |
| Elementwise (Tile-Tile) | TRELU | Elementwise ReLU of a tile. |
| Elementwise (Tile-Tile) | TNEG | Elementwise negation of a tile. |
| Elementwise (Tile-Tile) | TREM | Elementwise remainder of two tiles. |
| Elementwise (Tile-Tile) | TFMOD | Elementwise fmod of two tiles. |
| Elementwise (Tile-Tile) | TPOW | Elementwise power of two tiles. |
| Tile-Scalar / Tile-Immediate | TEXPANDS | Broadcast a scalar into a destination tile. |
| Tile-Scalar / Tile-Immediate | TCMPS | Compare a tile against a scalar and write per-element comparison results. |
| Tile-Scalar / Tile-Immediate | TSELS | Select between source tile and scalar using a mask tile (per-element selection for source tile). |
| Tile-Scalar / Tile-Immediate | TMINS | Elementwise minimum of a tile and a scalar. |
| Tile-Scalar / Tile-Immediate | TADDS | Elementwise add a scalar to a tile. |
| Tile-Scalar / Tile-Immediate | TSUBS | Elementwise subtract a scalar from a tile. |
| Tile-Scalar / Tile-Immediate | TDIVS | Elementwise division with a scalar (tile/scalar or scalar/tile). |
| Tile-Scalar / Tile-Immediate | TMULS | Elementwise multiply a tile by a scalar. |
| Tile-Scalar / Tile-Immediate | TFMODS | Elementwise remainder with a scalar: `fmod(src, scalar)`. |
| Tile-Scalar / Tile-Immediate | TREMS | Elementwise remainder with a scalar: `remainder(src, scalar)`. |
| Tile-Scalar / Tile-Immediate | TMAXS | Elementwise max of a tile and a scalar: `max(src, scalar)`. |
| Tile-Scalar / Tile-Immediate | TANDS | Elementwise bitwise AND of a tile and a scalar. |
| Tile-Scalar / Tile-Immediate | TORS | Elementwise bitwise OR of a tile and a scalar. |
| Tile-Scalar / Tile-Immediate | TSHLS | Elementwise shift-left a tile by a scalar. |
| Tile-Scalar / Tile-Immediate | TSHRS | Elementwise shift-right a tile by a scalar. |
| Tile-Scalar / Tile-Immediate | TXORS | Elementwise bitwise XOR of a tile and a scalar. |
| Tile-Scalar / Tile-Immediate | TLRELU | Leaky ReLU with a scalar slope. |
| Tile-Scalar / Tile-Immediate | TADDSC | Elementwise fused add with scalar and a second tile: `src0 + scalar + src1`. |
| Tile-Scalar / Tile-Immediate | TSUBSC | Elementwise fused op: `src0 - scalar + src1`. |
| Tile-Scalar / Tile-Immediate | TPOWS | Elementwise power of a tile by a scalar. |
| Axis Reduce / Expand | TROWSUM | Reduce each row by summing across columns. |
| Axis Reduce / Expand | TROWPROD | Reduce each row by multiplying across columns. |
| Axis Reduce / Expand | TCOLSUM | Reduce each column by summing across rows. |
| Axis Reduce / Expand | TCOLPROD | Reduce each column by multiplying across rows. |
| Axis Reduce / Expand | TCOLMAX | Reduce each column by taking the maximum across rows. |
| Axis Reduce / Expand | TROWMAX | Reduce each row by taking the maximum across columns. |
| Axis Reduce / Expand | TROWMIN | Reduce each row by taking the minimum across columns. |
| Axis Reduce / Expand | TROWARGMAX | Get the column index of the maximum element for each row. |
| Axis Reduce / Expand | TROWARGMIN | Get the column index of the minimum element for each row. |
| Axis Reduce / Expand | TCOLARGMAX | Get the row index of the maximum element for each column. |
| Axis Reduce / Expand | TCOLARGMIN | Get the row index of the minimum element for each column. |
| Axis Reduce / Expand | TROWEXPAND | Broadcast the first element of each source row across the destination row. |
| Axis Reduce / Expand | TROWEXPANDDIV | Row-wise broadcast divide: divide each row of `src0` by a per-row scalar vector `src1`. |
| Axis Reduce / Expand | TROWEXPANDMUL | Row-wise broadcast multiply: multiply each row of `src0` by a per-row scalar vector `src1`. |
| Axis Reduce / Expand | TROWEXPANDSUB | Row-wise broadcast subtract: subtract a per-row scalar vector `src1` from each row of `src0`. |
| Axis Reduce / Expand | TROWEXPANDADD | Row-wise broadcast add: add a per-row scalar vector. |
| Axis Reduce / Expand | TROWEXPANDMAX | Row-wise broadcast max with a per-row scalar vector. |
| Axis Reduce / Expand | TROWEXPANDMIN | Row-wise broadcast min with a per-row scalar vector. |
| Axis Reduce / Expand | TROWEXPANDEXPDIF | Row-wise exp-diff: compute exp(src0 - src1) with per-row scalars. |
| Axis Reduce / Expand | TCOLMIN | Reduce each column by taking the minimum across rows. |
| Axis Reduce / Expand | TCOLEXPAND | Broadcast the first element of each source column across the destination column. |
| Axis Reduce / Expand | TCOLEXPANDDIV | Column-wise broadcast divide: divide each column by a per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDMUL | Column-wise broadcast multiply: multiply each column by a per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDADD | Column-wise broadcast add with per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDMAX | Column-wise broadcast max with per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDMIN | Column-wise broadcast min with per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDSUB | Column-wise broadcast subtract: subtract a per-column scalar vector from each column. |
| Axis Reduce / Expand | TCOLEXPANDEXPDIF | Column-wise exp-diff: compute exp(src0 - src1) with per-column scalars. |
| Memory (GM <-> Tile) | TLOAD | Load data from a GlobalTensor (GM) into a Tile. |
| Memory (GM <-> Tile) | TPREFETCH | Prefetch data from global memory into a tile-local cache/buffer (hint). |
| Memory (GM <-> Tile) | TSTORE | Store data from a Tile into a GlobalTensor (GM), optionally using atomic write or quantization parameters. |
| Memory (GM <-> Tile) | TSTORE_FP | Store an accumulator tile into global memory using a scaling (fp) tile for vector quantization parameters. |
| Memory (GM <-> Tile) | MGATHER | Gather-load elements from global memory into a tile using per-element indices. |
| Memory (GM <-> Tile) | MSCATTER | Scatter-store elements from a tile into global memory using per-element indices. |
| Matrix Multiply | TGEMV_MX | GEMV with additional scaling tiles for mixed-precision / quantized matrix-vector compute. |
| Matrix Multiply | TMATMUL_MX | Matrix multiply (GEMM) with additional scaling tiles for mixed-precision / quantized matmul on supported targets. |
| Matrix Multiply | TMATMUL | Matrix multiply (GEMM) producing an accumulator/output tile. |
| Matrix Multiply | TMATMUL_ACC | Matrix multiply with accumulator input (fused accumulate). |
| Matrix Multiply | TMATMUL_BIAS | Matrix multiply with bias add. |
| Matrix Multiply | TGEMV | General Matrix-Vector multiplication producing an accumulator/output tile. |
| Matrix Multiply | TGEMV_ACC | GEMV with explicit accumulator input/output tiles. |
| Matrix Multiply | TGEMV_BIAS | GEMV with bias add. |
| Data Movement / Layout | TEXTRACT | Extract a sub-tile from a source tile. |
| Data Movement / Layout | TEXTRACT_FP | Extract with fp/scaling tile (vector-quantization parameters). |
| Data Movement / Layout | TIMG2COL | Image-to-column transform for convolution-like workloads. |
| Data Movement / Layout | TINSERT | Insert a sub-tile into a destination tile at an (indexRow, indexCol) offset. |
| Data Movement / Layout | TINSERT_FP | Insert with fp/scaling tile (vector-quantization parameters). |
| Data Movement / Layout | TFILLPAD | Copy+pad a tile outside the valid region with a compile-time pad value. |
| Data Movement / Layout | TFILLPAD_INPLACE | In-place fill/pad variant. |
| Data Movement / Layout | TFILLPAD_EXPAND | Fill/pad while allowing dst to be larger than src. |
| Data Movement / Layout | TMOV | Move/copy between tiles, optionally applying implementation-defined conversion modes. |
| Data Movement / Layout | TMOV_FP | Move/convert from an accumulator tile into a destination tile, using a scaling (fp) tile for vector quantization parameters. |
| Data Movement / Layout | TRESHAPE | Reinterpret a tile as another tile type/shape while preserving the underlying bytes. |
| Data Movement / Layout | TTRANS | Transpose with an implementation-defined temporary tile. |
| Data Movement / Layout | TSUBVIEW | Reinterpret a tile as a subtile of another tile. |
| Data Movement / Layout | TGET_SCALE_ADDR | Bind the on-chip address of the output tile to a scaled factor of the input tile's address. |
| Complex | TPRINT | Debug/print elements from a tile (implementation-defined). |
| Complex | TMRGSORT | Merge sort for multiple sorted lists (implementation-defined element format and layout). |
| Complex | TSORT32 | Sort 32-element blocks of `src` with accompanying `idx` entries and output sorted value-index pairs. |
| Complex | TGATHER | Gather/select elements using either an index tile or a compile-time mask pattern. |
| Complex | TCI | Generate a contiguous integer sequence into a destination tile. |
| Complex | TTRI | Generate a triangular (lower/upper) mask tile. |
| Complex | TRANDOM | Generate random numbers in the destination tile using a counter-based cipher algorithm. |
| Complex | TPARTADD | Partial elementwise add with implementation-defined handling of mismatched valid regions. |
| Complex | TPARTMUL | Partial elementwise multiply with implementation-defined handling of mismatched valid regions. |
| Complex | TPARTMAX | Partial elementwise max with implementation-defined handling of mismatched valid regions. |
| Complex | TPARTMIN | Partial elementwise min with implementation-defined handling of mismatched valid regions. |
| Complex | TPARTARGMAX | Partial elementwise max selection returning corresponding index (argmax), with implementation-defined handling of mismatched valid regions. |
| Complex | TPARTARGMIN | Partial elementwise min selection returning corresponding index (argmin), with implementation-defined handling of mismatched valid regions. |
| Complex | TGATHERB | Gather elements using byte offsets. |
| Complex | TSCATTER | Scatter rows of a source tile into a destination tile using per-element row indices. |
| Complex | TQUANT | Quantize a tile (e.g. FP32 to FP8) producing exponent/scaling/max outputs. |
| Communication | TPUT | Remote write: transfer local data to remote NPU memory (GM → UB → GM). |
| Communication | TGET | Remote read: read remote NPU data to local memory (GM → UB → GM). |
| Communication | TPUT_ASYNC | Asynchronous remote write (local GM → DMA engine → remote GM). |
| Communication | TGET_ASYNC | Asynchronous remote read (remote GM → DMA engine → local GM). |
| Communication | TNOTIFY | Send flag notification to remote NPU. |
| Communication | TWAIT | Blocking wait until signal(s) meet comparison condition. |
| Communication | TTEST | Non-blocking test if signal(s) meet comparison condition. |
| Communication | TGATHER | Gather data from all ranks and concatenate along DIM_3. |
| Communication | TSCATTER | Scatter data to all ranks by splitting along DIM_3. |
| Communication | TREDUCE | Gather and reduce data from all ranks element-wise to local. |
| Communication | TBROADCAST | Broadcast data from current NPU to all ranks. |
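
As an illustration of how the Axis Reduce / Expand entries above compose, the following sketch models a numerically stable row-wise softmax using plain Python stand-ins for TROWMAX, TROWEXPANDEXPDIF, TROWSUM, and TROWEXPANDDIV. Everything here is an assumption made for illustration: the function names, the list-of-rows tile representation, and the argument order are hypothetical and are not the PTO C++ intrinsic API, whose authoritative definitions live in `include/pto/common/pto_instr.hpp`. The sketch only mirrors the one-line descriptions in the index.

```python
import math

# Hypothetical pure-Python models of four index entries.
# A "tile" is modeled as a list of rows; a per-row scalar vector is a flat list.
# These are NOT the PTO intrinsics, only stand-ins for their described semantics.

def trowmax(src):
    """Model of TROWMAX: reduce each row by taking the maximum across columns."""
    return [max(row) for row in src]

def trowsum(src):
    """Model of TROWSUM: reduce each row by summing across columns."""
    return [sum(row) for row in src]

def trowexpandexpdif(src0, src1):
    """Model of TROWEXPANDEXPDIF: exp(src0 - src1) with one scalar per row."""
    return [[math.exp(v - s) for v in row] for row, s in zip(src0, src1)]

def trowexpanddiv(src0, src1):
    """Model of TROWEXPANDDIV: divide each row of src0 by a per-row scalar."""
    return [[v / s for v in row] for row, s in zip(src0, src1)]

def row_softmax(tile):
    """Compose the four modeled ops into a row-wise softmax."""
    row_max = trowmax(tile)                 # per-row maximum for stability
    exps = trowexpandexpdif(tile, row_max)  # exp(x - max) per element
    row_sum = trowsum(exps)                 # per-row normalizer
    return trowexpanddiv(exps, row_sum)     # normalize each row

tile = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
result = row_softmax(tile)
```

Subtracting the row maximum before exponentiating keeps every exponent at or below zero, which is presumably why a fused exp-diff entry sits alongside the row-wise max and divide entries. Synchronization (TSYNC) and data movement (TLOAD, TSTORE) are deliberately left out of this model.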
Disclosure: parts of this article were generated with AI assistance (AIGC) and are for reference only.
