
CANN/asnumpy Benchmarks

news 2026/7/10 9:19:06

Benchmarks

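asnumpy is a NumPy library native to Huawei Ascend NPUs, developed by the team of Su Tonghua and Wang Tiantian at the Faculty of Computing, Harbin Institute of Technology, together with the Huawei CANN team. Project repository: https://gitcode.com/cann/asnumpy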

Back to README | Reproduce: examples/03_multiply.py

This document contains the full performance benchmark comparing AsNumpy (NPU) against NumPy (CPU) on the multiply() operation.

Test Environment

Item	AsNumpy (NPU)	NumPy (CPU)
Processor	Ascend 910B NPU	Server CPU (AArch64) on the same machine
NPU Runtime	CANN 8.2.RC1.alpha003	—
Python	Python 3.9+
Library version	AsNumpy 0.2.0	NumPy 1.26+
Data type	float32
Operation	multiply() — element-wise multiplication
Timer	time.perf_counter() (high-resolution)

Controlled Variables

Both sides use identical input data: arrays are generated by NumPy and transferred to the NPU via from_numpy() before timing starts.
Data transfer time is excluded: only the multiply() computation is timed.
Results are single-run wall-clock times (no warmup, no averaging).
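The timing discipline described above can be sketched as follows. This is a minimal illustration and not the project's benchmark script: `time_single_run` and `multiply_stand_in` are hypothetical names, and the pure-Python stand-in replaces the real multiply() call so the sketch runs without NumPy or an NPU. The point it shows is that setup (data generation and any transfer) happens before the timer starts, and only the operation sits between the two time.perf_counter() calls.

```python
import time


def time_single_run(fn, *args):
    """Time one call of fn(*args); returns (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def multiply_stand_in(a, b):
    # Hypothetical stand-in for multiply(): element-wise product of two lists.
    return [x * y for x, y in zip(a, b)]


# Setup stays outside the timed region, mirroring "transfer before timing starts".
a = [float(i) for i in range(1000)]
b = [2.0] * 1000

product, seconds = time_single_run(multiply_stand_in, a, b)
print(f"multiply took {seconds:.6f} s")
```

A single run without warmup, as used for the table, is sensitive to one-time costs, so the measured value can vary noticeably between runs.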

Results

Shape	AsNumpy (NPU)	NumPy (CPU)	Speedup
(500, 500)	1.9355 s	0.1708 s	0.09×
(1000, 1000)	0.0692 s	0.7029 s	10.16×
(2000, 2000)	0.1033 s	3.8387 s	37.17×
(3000, 3000)	0.1115 s	14.3567 s	128.70×

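Key observation: For small tensors (500×500), NPU launch overhead dominates and the CPU is faster. Because the results are single-run with no warmup, the 500×500 figure may also include one-time initialization cost. As tensor size grows, the NPU's massive parallelism takes over, reaching a 128.70× speedup at 3000×3000.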
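The Speedup column appears to be the CPU time divided by the NPU time; that reading is an assumption, but it is consistent with every row of the table. The sketch below recomputes the ratios from the published times. Since those times are rounded to four decimals, the recomputed values can differ from the table in the last digits.

```python
# shape -> (NPU seconds, CPU seconds), copied from the results table
results = {
    (500, 500): (1.9355, 0.1708),
    (1000, 1000): (0.0692, 0.7029),
    (2000, 2000): (0.1033, 3.8387),
    (3000, 3000): (0.1115, 14.3567),
}


def speedup(npu_seconds, cpu_seconds):
    """Speedup of the NPU over the CPU; values below 1 mean the CPU was faster."""
    return cpu_seconds / npu_seconds


speedups = {shape: speedup(*times) for shape, times in results.items()}

for shape, value in speedups.items():
    print(f"{shape}: {value:.2f}x")
```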

Reproducing the Results

Run the benchmark script from the project root:

python examples/03_multiply.py

The script tests all four shapes with 50 iterations each, reports average and minimum times, and verifies numerical correctness against NumPy (relative diff < 1e-4).
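The two things the script is described as doing, timing over repeated iterations and checking results against a reference, can be sketched as below. This is an illustration under stated assumptions, not the contents of examples/03_multiply.py: `benchmark` and `max_relative_diff` are hypothetical helpers, the relative difference is assumed to be |actual - expected| / |expected|, and plain Python lists stand in for the arrays so the sketch runs without NumPy or an NPU.

```python
import time


def benchmark(fn, *args, iterations=50):
    """Run fn(*args) repeatedly; returns (average_seconds, minimum_seconds)."""
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return sum(times) / len(times), min(times)


def max_relative_diff(actual, expected, eps=1e-12):
    """Largest element-wise relative difference; eps guards against division by zero."""
    return max(abs(x - y) / max(abs(y), eps) for x, y in zip(actual, expected))


def multiply_stand_in(a, b):
    # Hypothetical stand-in for multiply(): element-wise product of two lists.
    return [x * y for x, y in zip(a, b)]


a = [float(i) for i in range(1000)]
b = [2.0] * 1000

avg_s, min_s = benchmark(multiply_stand_in, a, b, iterations=50)

reference = multiply_stand_in(a, b)
# Simulate a result that differs slightly from the reference, as float32 hardware might.
candidate = [r * (1.0 + 1e-6) for r in reference]

print(f"avg {avg_s:.6f} s, min {min_s:.6f} s")
print("within tolerance:", max_relative_diff(candidate, reference) < 1e-4)
```

Reporting the minimum alongside the average helps separate steady-state cost from outliers such as the first call.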

