
CANN/asc-devkit Data Movement API Samples

Data Movement API Sample Introduction

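asc-devkit is the operator development language that CANN provides for Ascend AI processors. It natively supports the C and C++ standards, consists mainly of a class library and a language extension layer, and offers multi-level APIs to meet operator development needs across many scenarios. Project address: https://gitcode.com/cann/asc-devkit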

Overview

This directory contains samples for several data movement APIs. Each sample uses Ascend C's <<<>>> direct kernel call method, which allows the main function and the kernel function to be implemented in the same cpp file.

Sample List

| Directory Name | Function Description |
| --- | --- |
| broadcast_ub2l0c | Implements broadcast data movement based on BroadCastVecToMM: data located on UB (Unified Buffer) is broadcast and moved to CO1 (L0C Buffer). |
| copy_ub2ub | Implements data movement based on Copy. Applies to data movement between VECIN and VECOUT and supports both mask continuous mode and counter mode. |
| data_copy_gm2ub_slice | Implements sliced data movement based on DataCopy: a subset of a multi-dimensional Tensor is extracted and moved along the path between GM (Global Memory) and UB (Unified Buffer). |
| data_copy_gm2ub_nddma | Shows how to use the multi-dimensional data movement interface to move data from GM (Global Memory) to UB (Unified Buffer). Because the dimension information and the corresponding Stride can be configured freely, the interface can be used for Padding, Transpose, BroadCast, Slice, and other data transformations. |
| data_copy_l0c2gm | Implements data movement with inline quantization and activation based on DataCopy in convolution scenarios. |
| data_copy_pad_gm2ub_ub2gm | Implements movement of non-32-byte-aligned data based on DataCopyPad, including data padding. |
| data_copy_ub2l1 | Implements data movement from UB (Unified Buffer) to L1 (L1 Buffer) based on DataCopy in Mmad matrix multiplication scenarios. |
| ld_st_reg_mask | Implements load/store operations between UB (Unified Buffer) and MaskReg (mask register) using the Reg programming interface, as well as mask-controlled store operations. |
| ld_st_reg_align | Implements aligned data movement (continuous and non-continuous) from UB (Unified Buffer) to RegTensor (the basic unit of Reg vector computation) using the Reg programming interface. |
| ld_st_reg_unalign | Implements unaligned data movement from UB (Unified Buffer) to RegTensor (the basic unit of Reg vector computation) using the Reg programming interface. |
| gather_ld_reg | Demonstrates discrete data loads with the Gather interface, covering the high-dimensional Gather (source is a LocalTensor) and Reg::GatherB (collection by DataBlock) scenarios. |
| scatter_st_reg | Demonstrates discrete data stores with the Reg::Scatter interface (scattering elements to UB). |
| auxscalar_reg | Demonstrates using the AuxScalar method to read multiple scalar values from UB for computation. |
| move_reg | Implements data load/store operations from UB (Unified Buffer) to RegTensor using the Reg programming interface. |
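
The data_copy_gm2ub_nddma entry notes that one copy driven by per-dimension sizes and strides can express Padding, Transpose, BroadCast, and Slice. The sketch below is a minimal host-side model of that idea in plain C++. It is not the Ascend C API: `DimDesc` and `StridedCopy` are hypothetical names introduced only for illustration, strides are counted in elements, and the real interface's parameters and hardware constraints are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model, not the Ascend C API: one dimension of a
// multi-dimensional copy, given by its extent and the element stride used
// on the source and destination side.
struct DimDesc {
    std::size_t size;       // number of elements along this dimension (> 0)
    std::size_t srcStride;  // step in the source, in elements (0 re-reads the same element)
    std::size_t dstStride;  // step in the destination, in elements
};

// Copies one element for every index tuple described by `dims`.
// dims[0] is the outermost dimension; the last entry varies fastest.
template <typename T>
void StridedCopy(const std::vector<T>& src, std::vector<T>& dst,
                 const std::vector<DimDesc>& dims,
                 std::size_t srcOffset = 0, std::size_t dstOffset = 0)
{
    if (dims.empty()) {
        return;  // nothing to copy
    }
    for (const DimDesc& dim : dims) {
        assert(dim.size > 0);
    }
    std::vector<std::size_t> idx(dims.size(), 0);
    while (true) {
        // Turn the current index tuple into linear source/destination offsets.
        std::size_t s = srcOffset;
        std::size_t d = dstOffset;
        for (std::size_t k = 0; k < dims.size(); ++k) {
            s += idx[k] * dims[k].srcStride;
            d += idx[k] * dims[k].dstStride;
        }
        assert(s < src.size() && d < dst.size());
        dst[d] = src[s];

        // Advance the index tuple like an odometer, innermost dimension first.
        std::size_t k = dims.size();
        while (k > 0) {
            --k;
            if (++idx[k] < dims[k].size) {
                break;      // no carry needed, continue copying
            }
            idx[k] = 0;     // carry into the next outer dimension
            if (k == 0) {
                return;     // outermost dimension wrapped: copy is complete
            }
        }
    }
}
```

Under these assumptions, the four transformations differ only in the descriptors. For a row-major 2x3 source holding 0 to 5, `{{2, 3, 1}, {3, 1, 2}}` writes the transposed 3x2 layout; a source stride of 0, as in `{{2, 0, 3}, {3, 1, 1}}`, repeats a 3-element row twice (broadcast); a non-zero `srcOffset` with the source's row stride selects a sub-block (slice); and a destination row stride larger than the row length leaves gaps in a pre-filled destination (padding).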


