PTO-ISA Library Developer Rules
This file lists some rules and limitations on the implementation of this library for pto-isa developers.
pto-isa: Parallel Tile Operation (PTO) is a virtual instruction set architecture designed by Ascend CANN, focusing on tile-level operations. This repository offers high-performance, cross-platform tile operations across Ascend platforms. Project page: https://gitcode.com/cann/pto-isa
Not following the rules can result in any of the following:
- Compilation failures (both source-level compile errors and compiler crashes)
- Functionally incorrect results (e.g., precision issues)
- Poor performance
1 - Remember that `pto::(Conv)Tile::data()` returns a vector type instead of a pointer type in auto mode
The return type of the `.data()` member function is `TileDType`, which is defined differently in manual and auto mode. In manual mode it is simply a pointer, while in auto mode it is a vector type. See `include/pto/common/memory.hpp` for details.
Keep this in mind so that you never use the value returned by `.data()` directly as a pointer outside tile functions.
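The difference can be modeled on the host with a small sketch. Everything here (`MockTile`, `MockTileVec`, `DataIsPointer`, `FirstElementManual`) is a hypothetical stand-in, not the library's actual definitions in `include/pto/common/memory.hpp`. It only shows why code that indexes the result of `.data()` compiles against the manual-mode type but cannot be written against the auto-mode type.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical host-side model. These are NOT the definitions from
// include/pto/common/memory.hpp.
struct MockTileVec {
    float lanes[8];  // stands in for an opaque vector-typed tile value
};

template <bool AutoMode>
struct MockTile {
    // Manual mode: a plain pointer. Auto mode: a non-pointer vector type.
    using TileDType =
        typename std::conditional<AutoMode, MockTileVec, float *>::type;

    TileDType &data() { return value; }

    TileDType value;
};

// True only when the tile's data() value may be treated as a raw pointer.
template <typename TileT>
constexpr bool DataIsPointer()
{
    return std::is_pointer<typename TileT::TileDType>::value;
}

static_assert(DataIsPointer<MockTile<false>>(), "manual mode models a pointer");
static_assert(!DataIsPointer<MockTile<true>>(), "auto mode models a vector type");

// Indexing data() is only meaningful for the pointer (manual) model.
// The same body would not compile for MockTile<true>.
inline float FirstElementManual(MockTile<false> &tile)
{
    assert(tile.data() != nullptr);
    return tile.data()[0];
}
```

A guard such as `DataIsPointer` is one possible way to catch accidental pointer use at compile time. Whether the real library offers a comparable check is not stated here.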
2 - Avoid default initializer for a struct/class member
It's a very common practice to default-initialize data members in a struct or class in C++, for instance:

    struct ConvTile {
    public:
        ...
        int shape[ConvTileDetail::MAX_CONVTILE_DIM] = {1};
    };

This turns out to cause problems for the SROA pass in the compiler: SROA can't eliminate the `AllocaInst` of the struct, nor the load and store instructions associated with it. At least in auto mode, please DON'T default-initialize the members:

    #ifdef __PTO_AUTO__
    // In auto mode, do not have default initialization in the class definition itself for its members
    int shape[ConvTileDetail::MAX_CONVTILE_DIM];
    #else
    int shape[ConvTileDetail::MAX_CONVTILE_DIM] = {1};
    #endif

Even though we are programming in C++, we encourage describing structs and classes as POD (Plain Old Data) aggregates that are compatible with the C programming language.
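As a rough illustration of the POD-aggregate style, the sketch below contrasts a member with a default initializer against a plain aggregate that is initialized at the point of use. The names `kMaxDim`, `ShapeWithInit`, `ShapePod`, `MakeDefaultShape` and `Dim` are hypothetical. The type-trait checks only show the language-level difference; they do not prove anything about what SROA will do in a particular compiler.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for ConvTileDetail::MAX_CONVTILE_DIM.
constexpr int kMaxDim = 4;

// Default member initializer: the default constructor is no longer trivial.
struct ShapeWithInit {
    int shape[kMaxDim] = {1};
};

// POD-style aggregate: no initializers inside the class definition.
struct ShapePod {
    int shape[kMaxDim];
};

static_assert(!std::is_trivially_default_constructible<ShapeWithInit>::value,
              "initializer in the class makes construction non-trivial");
static_assert(std::is_trivially_default_constructible<ShapePod>::value,
              "plain aggregate keeps construction trivial");
static_assert(std::is_standard_layout<ShapePod>::value,
              "plain aggregate stays C-compatible in layout");

// Initialize at the point of use instead of in the class definition.
inline ShapePod MakeDefaultShape()
{
    ShapePod s = {{1}};  // shape[0] = 1, remaining elements are zero
    return s;
}

// Bounds-checked read of one dimension.
inline int Dim(const ShapePod &s, int i)
{
    assert(i >= 0 && i < kMaxDim);
    return s.shape[i];
}
```

Moving the initial values into a helper such as `MakeDefaultShape` keeps the same defaults available without putting an initializer in the class body.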
3 - Explicit synchronization is still needed inside tile functions and their callees
TL;DR:
- Use `set_flag`, `wait_flag` or `pipe_barrier` explicitly in tile functions and all of their callees.
- Use `PtoSetWaitFlag` or `TSYNC` anywhere else.
Reason: auto-sync does NOT look inside tile functions. In fact, the whole auto-mode compiler works at the tile-function level, so everything inside a tile function is a complete black box to it.
For this reason, if any synchronization is needed inside a tile function, library developers must still add it manually. `PtoSetWaitFlag` and `TSYNC` won't work there because they are no-ops in auto mode. In most cases these interfaces are used by kernel developers.
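The placement of the synchronization can be sketched with host-side stand-ins that only record the call order. The `mock` namespace, its `Pipe` enum, the three-argument form of `set_flag` and `wait_flag`, and `TCopyAddMock` are all assumptions made for illustration. The real intrinsics and their argument lists should be taken from the toolchain headers.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace mock {

enum Pipe { PIPE_LOAD, PIPE_COMPUTE };  // hypothetical pipe identifiers

// Records the order of operations so the placement can be inspected.
inline std::vector<std::string> &Trace()
{
    static std::vector<std::string> trace;
    return trace;
}

// Host-side stand-ins: they do not synchronize anything, they only log.
inline void set_flag(Pipe, Pipe, int) { Trace().push_back("set_flag"); }
inline void wait_flag(Pipe, Pipe, int) { Trace().push_back("wait_flag"); }

// Hypothetical tile function with two stages on different pipes.
// The synchronization is written explicitly inside the tile function,
// because auto-sync treats the function body as a black box.
inline void TCopyAddMock(float *dst, const float *src, int n)
{
    float staged[16];
    assert(n >= 0 && n <= 16);

    Trace().push_back("load");
    for (int i = 0; i < n; ++i) {
        staged[i] = src[i];
    }

    set_flag(PIPE_LOAD, PIPE_COMPUTE, 0);   // producer signals completion
    wait_flag(PIPE_LOAD, PIPE_COMPUTE, 0);  // consumer waits before reading

    Trace().push_back("compute");
    for (int i = 0; i < n; ++i) {
        dst[i] = staged[i] + 1.0f;
    }
}

}  // namespace mock
```

In this model the trace shows the flag pair sitting between the two stages, which is the ordering a library developer has to guarantee by hand inside a tile function.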
4 - Avoid using `TASSIGN` for implementation
Currently, the implementations of some PTO instructions call `TASSIGN_IMPL` directly. This may be a problem in auto mode, where it is a no-op.
If you use `TASSIGN` just to alias two tiles, use `TRESHAPE` or `TSUBVIEW` instead, depending on your needs. Any other use won't work in auto mode.
For instance, if you call `TASSIGN` to allocate memory according to some algorithm, this will never work in auto mode, because the compiler cannot recognize that algorithm's logic and reproduce the allocation you would do in manual mode.
After all, memory allocation in auto mode is based entirely on each individual tile's liveness analysis, without any other context. This is why the current implementations of `TPUSH` and `TPOP` won't work in auto mode.
5 - Some general rules for `*_IMPL` functions
`*_IMPL` functions must stay consistent with the tile function interface:
- The function signature must carry the `PTO_INTERNAL` macro.
- The implementation should call tile functions directly; don't call any non-tile functions unless they are inlined.
- Always call `.data()` directly in the argument list of a tile function, or bind every value returned by `.data()` by reference. For example:

    TExp(dstTile.data(), srcTile.data()); // correct
    auto dst = dstTile.data();            // wrong: return by value
    auto &src = srcTile.data();           // correct: return by reference
    TExp(dst, src);

6 - Some general rules for tile functions
- Use `typename <...>::TileDType` instead of `typename <...>::DType *` for tile types in tile function parameters.
- Pass these `typename <...>::TileDType` parameters by value, not by pointer or reference.
- Attach the `__in__` or `__out__` attribute properly to these `typename <...>::TileDType` parameters.
- Always call `__cce_get_tile_ptr` on these `typename <...>::TileDType` arguments to get a tile's underlying buffer pointer.
- The return type should always be `void`. Otherwise the compiler's assumptions about the tile function (TF) interface are broken and the behavior is undefined. Please turn all return values into pass-by-value arguments, even for a single scalar.
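The rules above can be put together in a host-side sketch. All names are hypothetical: `MOCK_IN` and `MOCK_OUT` stand in for `__in__` and `__out__`, `MockGetTilePtr` stands in for `__cce_get_tile_ptr`, and `MockHandle`, `MockTile`, `TRowSumMock` and `RowSumImplMock` are invented for illustration. Returning the scalar through a one-element output tile is one reading of the last rule, stated here as an assumption.

```cpp
#include <cassert>

// Empty host-side stand-ins for the compiler attributes (assumptions).
#define MOCK_IN
#define MOCK_OUT

// Models an opaque tile value that is passed by value.
template <typename T>
struct MockHandle {
    T *base;
};

// Stand-in for the intrinsic that yields the underlying buffer pointer.
template <typename T>
T *MockGetTilePtr(MockHandle<T> h)
{
    return h.base;
}

template <typename T, int N>
struct MockTile {
    using DType = T;
    using TileDType = MockHandle<T>;
    static constexpr int kSize = N;

    T buf[N];
    TileDType handle;

    TileDType &data() { return handle; }  // returned by reference
};

// Tile function: TileDType parameters by value, void return type,
// result delivered through the output tile instead of `return`.
template <typename DstTile, typename SrcTile>
void TRowSumMock(MOCK_OUT typename DstTile::TileDType dst,
                 MOCK_IN typename SrcTile::TileDType src, int n)
{
    typename DstTile::DType *out = MockGetTilePtr(dst);
    const typename SrcTile::DType *in = MockGetTilePtr(src);

    typename DstTile::DType acc = 0;
    for (int i = 0; i < n; ++i) {
        acc += in[i];
    }
    out[0] = acc;
}

// *_IMPL-style wrapper: calls the tile function directly and passes
// .data() straight into the argument list.
template <typename DstTile, typename SrcTile>
inline void RowSumImplMock(DstTile &dst, SrcTile &src)
{
    assert(dst.data().base != nullptr && src.data().base != nullptr);
    TRowSumMock<DstTile, SrcTile>(dst.data(), src.data(), SrcTile::kSize);
}
```

Because the tile types appear only in nested `typename` positions, the template arguments cannot be deduced and are passed explicitly by the wrapper.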
7 - Avoid having runtime control flow before tile functions
Runtime control flow makes it very hard for auto-sync to work properly. We encourage developers to either remove such runtime conditions if they are unnecessary or move them inside tile functions where possible.
Some examples include `TROWEXPANDDIV_IMPL` and `TMULS_IMPL`.
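Moving a runtime condition inside a tile function might look like the sketch below. It is not the code of `TROWEXPANDDIV_IMPL` or `TMULS_IMPL`. The functions `TCopyMock`, `TScaleMock`, `TScaleOrCopyMock`, `ScaleImplBranchyMock` and `ScaleImplFlatMock` are hypothetical, and plain pointers stand in for tile buffers.

```cpp
#include <cassert>

// Hypothetical tile functions; plain pointers model the tile buffers.
inline void TCopyMock(float *dst, const float *src, int n)
{
    for (int i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

inline void TScaleMock(float *dst, const float *src, float s, int n)
{
    for (int i = 0; i < n; ++i) {
        dst[i] = src[i] * s;
    }
}

// Discouraged shape: the runtime branch sits in front of the tile
// functions, at the level that auto-sync has to analyze.
inline void ScaleImplBranchyMock(float *dst, const float *src, float s, int n)
{
    assert(n >= 0);
    if (s == 1.0f) {
        TCopyMock(dst, src, n);
    } else {
        TScaleMock(dst, src, s, n);
    }
}

// Preferred shape: one tile function, with the condition inside it.
inline void TScaleOrCopyMock(float *dst, const float *src, float s, int n)
{
    for (int i = 0; i < n; ++i) {
        dst[i] = (s == 1.0f) ? src[i] : src[i] * s;
    }
}

// The wrapper now calls a single tile function unconditionally.
inline void ScaleImplFlatMock(float *dst, const float *src, float s, int n)
{
    assert(n >= 0);
    TScaleOrCopyMock(dst, src, s, n);
}
```

Both wrappers compute the same result. The second one leaves a single unconditional tile function call for the auto-mode compiler to reason about.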
Authoring statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.
