tensorflow for beginners: what exactly is the Ascend TensorFlow adapter?

news 2026/7/18 9:13:04

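A while back I was migrating TensorFlow models, and a teammate asked me: "Hey, can our TensorFlow models run on Ascend, or do we have to rewrite everything?"

I said they can, with the tensorflow adapter.

Good question. Let me cover it all in one go today.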

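What is tensorflow?

tensorflow is the official Ascend adapter for TensorFlow. It lets you run on Ascend through native TensorFlow interfaces, with no code changes. In this post, lowercase tensorflow means the adapter repository and TensorFlow means the framework itself.

In one sentence: tensorflow is the official adapter for Ascend NPUs, so you call the usual tf.xxx interfaces directly on Ascend instead of patching the framework.

Tell me that isn't maddening: you used to have to modify TensorFlow's source to use Ascend, and now you don't change a single line.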

Why use tensorflow?

Two words: native support.

Without tensorflow (the patched fork)

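    # Before: you had to use a patched fork of TensorFlow (illustrative pseudocode)
    import tensorflow_npu  # the patched fork

    # Some interfaces were incompatible
    model = tensorflow_npu.NPUModel(...)  # fork-specific interface

    # Some features were unavailable
    # tf.distribute.MirroredStrategy()  # not supported
    # tf.saved_model.save()             # needed extra configuration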

With tensorflow (the official version)

    # Now: use stock TensorFlow
    import tensorflow as tf  # the official build

    # Fully native interfaces
    model = tf.keras.Sequential([...])  # standard TensorFlow, layers elided

    # All features supported
    strategy = tf.distribute.MirroredStrategy()
    tf.saved_model.save(model, "model")

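Tell me that isn't maddening: the only difference between Ascend and a GPU is now the backend.

Just three core concepts

1. NPU backend

tensorflow registers an npu backend: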

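    import tensorflow as tf

    # Check whether an NPU backend is registered
    print("NPU available:", tf.config.list_physical_devices('NPU'))

    # Create a tensor (2-D, because tf.matmul needs rank >= 2)
    x = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)
    x = tf.identity(x)  # moved to the NPU (automatic)

    # Pin the computation to a device
    with tf.device('/NPU:0'):
        y = tf.matmul(x, x)
        print(y)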

2. Device mapping

tensorflow maps devices and memory automatically:

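    import tensorflow as tf

    # Device mapping
    # "/NPU:0" -> Ascend NPU, device 0
    # "/GPU:0" -> NVIDIA GPU, device 0
    # "/CPU:0" -> CPU

    # List the devices TensorFlow can see
    devices = tf.config.list_physical_devices()
    print("Available devices:", devices)

    # Tensor placement
    x = tf.constant([1, 2, 3])
    print(x.device)  # full device string, ending in device:CPU:0 on a CPU-only host

    with tf.device('/NPU:0'):
        y = tf.constant([1, 2, 3])
        print(y.device)  # expected to end in device:NPU:0

    # Model placement
    model = tf.keras.Sequential([...])  # layers elided
    model = tf.keras.models.load_model("model")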
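The device strings in the mapping follow a `/TYPE:index` pattern. As a minimal sketch of how such a string breaks down, the helper below pulls the device type and index out of either the short form or the long form that a tensor's `.device` attribute reports. `parse_device` is a hypothetical name used only for illustration; it is not part of TensorFlow or the adapter, and it is plain Python so it runs without TensorFlow installed.

```python
import re

# Hypothetical helper for illustration; not part of TensorFlow or the adapter.
# Matches the last "/TYPE:index" or "/device:TYPE:index" component of a name.
_DEVICE_RE = re.compile(r"/(?:device:)?([A-Za-z_]+):(\d+)$")


def parse_device(name):
    """Return (device_type, index) for strings such as '/NPU:0'."""
    match = _DEVICE_RE.search(name)
    if match is None:
        raise ValueError(f"unrecognized device string: {name!r}")
    return match.group(1).upper(), int(match.group(2))


print(parse_device("/NPU:0"))
print(parse_device("/job:localhost/replica:0/task:0/device:CPU:0"))
```

In real code, TensorFlow's own `tf.DeviceSpec.from_string` is the better tool; the sketch only shows what the string encodes.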

3. Memory management

tensorflow manages NPU memory automatically:

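    import tensorflow as tf

    # Automatic memory reuse. tensorflow automatically handles:
    # 1. allocation and release
    # 2. defragmentation
    # 3. device-memory caching

    # Manual control of the device-memory budget
    npus = tf.config.list_physical_devices('NPU')
    if npus:
        try:
            # Cap device memory usage (memory_limit is in MB, so 8192 = 8 GB)
            tf.config.set_logical_device_configuration(
                npus[0],
                [tf.config.LogicalDeviceConfiguration(memory_limit=8192)])
        except RuntimeError as e:
            # Raised if the device has already been initialized
            print(e)

    # To inspect device memory, use npu-smi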
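`memory_limit` is expressed in megabytes, which is why 8192 corresponds to 8 GB. If you would rather think in terms of "a share of the card", a small conversion helps. The sketch below is an assumption-laden illustration: `memory_limit_mb` is a hypothetical helper, and the 32 GB total is a made-up example value, not a statement about any particular Ascend device.

```python
def memory_limit_mb(total_mb, fraction):
    """Return the MB budget for a given fraction of device memory."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_mb * fraction)


# Example: a quarter of a device assumed to have 32 GB (32 * 1024 MB)
print(memory_limit_mb(32 * 1024, 0.25))  # 8192
```

The result can be passed as `memory_limit=` in the configuration call shown earlier.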

Why use tensorflow?

Three reasons:

1. No code changes

Code written for a GPU moves to Ascend by changing a single string:

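    import tensorflow as tf

    # GPU code
    with tf.device('/GPU:0'):
        model = tf.keras.Sequential([...])  # layers elided

    # Ascend code (only the device string changes)
    with tf.device('/NPU:0'):
        model = tf.keras.Sequential([...])  # layers elided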
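If the same script has to run on both kinds of machine, even that one string can be chosen at runtime. The following is a minimal sketch under stated assumptions: `pick_device` and the NPU-first preference order are hypothetical choices for illustration, and the function takes a plain list of device type names so it can be tried without TensorFlow.

```python
# Hypothetical helper for illustration; the preference order is an assumption.
PREFERRED = ("NPU", "GPU", "CPU")


def pick_device(available_types, preferred=PREFERRED):
    """Return a device string for the first preferred type that is available."""
    available = {t.upper() for t in available_types}
    for device_type in preferred:
        if device_type in available:
            return f"/{device_type}:0"
    raise RuntimeError(f"none of {preferred} found in {sorted(available)}")


print(pick_device(["CPU", "NPU"]))
```

With TensorFlow available, the input list could come from `[d.device_type for d in tf.config.list_physical_devices()]`, and the returned string goes straight into `tf.device(...)`.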

2. Full feature support

tensorflow supports TensorFlow's newer features too:

    import tensorflow as tf

    # tf.distribute (distributed training)
    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        model = tf.keras.Sequential([...])  # layers elided

    # tf.saved_model (model export)
    tf.saved_model.save(model, "model")

    # tf.keras.applications (stock architectures; weights=None means random init)
    model = tf.keras.applications.ResNet50(weights=None, input_shape=(224, 224, 3))

    # tf.data (input pipeline); x_train and y_train are your training arrays
    dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    dataset = dataset.batch(32).repeat()

3. Performance holds up

tensorflow performs about as well as the patched fork. A quick way to measure it:

    import time
    import tensorflow as tf

    # Build the model
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(4096, activation='relu'),
        tf.keras.layers.Dense(4096, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

    # Generate data
    x = tf.random.normal((1024, 4096))
    y = tf.random.uniform((1024,), maxval=10, dtype=tf.int32)

    # Warm up (labels are y, not x)
    model.fit(x, y, batch_size=32, epochs=1, verbose=0)

    # Measure
    start = time.time()
    model.fit(x, y, batch_size=32, epochs=3, verbose=0)
    elapsed = time.time() - start
    print(f"Time: {elapsed:.2f}s")
    print(f"Throughput: {1024 * 3 / elapsed:.0f} samples/sec")
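The warm-up-then-measure pattern in that benchmark is worth pulling out, since the first call usually includes one-time setup cost that would skew the number. Here is a minimal sketch of a reusable timer; `measure_throughput` and `fake_epoch` are hypothetical names, and the stand-in workload is pure Python so the snippet runs anywhere.

```python
import time


def measure_throughput(step_fn, samples_per_call, warmup=1, repeats=3):
    """Time step_fn after warm-up; return (elapsed_seconds, samples_per_second)."""
    for _ in range(warmup):
        step_fn()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(repeats):
        step_fn()
    elapsed = time.perf_counter() - start
    return elapsed, samples_per_call * repeats / elapsed


def fake_epoch():
    """Stand-in workload so the sketch runs without a model."""
    sum(i * i for i in range(50_000))


elapsed, throughput = measure_throughput(fake_epoch, samples_per_call=1024)
print(f"Time: {elapsed:.4f}s, throughput: {throughput:.0f} samples/sec")
```

To time real training, `step_fn` could be a lambda that calls `model.fit(x, y, batch_size=32, epochs=1, verbose=0)`, with `samples_per_call=1024`.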

Tell me that isn't maddening: it's officially supported, and it works just like a GPU.

How to use it: code examples

Example 1: basic inference

    import tensorflow as tf

    # Check that an NPU is available
    print("NPU available:", tf.config.list_physical_devices('NPU'))

    # Build the model
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])

    # Create an input batch
    x = tf.random.normal((32, 784))

    # Run inference
    with tf.device('/NPU:0'):
        output = model(x)
    print(f"Output shape: {output.shape}")

Example 2: training

    import tensorflow as tf

    # Data
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
    x_test = x_test.reshape(-1, 784).astype('float32') / 255.0

    # Model
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # Train
    with tf.device('/NPU:0'):
        history = model.fit(x_train, y_train, batch_size=32, epochs=3,
                            validation_split=0.1)

    # Evaluate
    test_loss, test_acc = model.evaluate(x_test, y_test, batch_size=32)
    print(f"Test accuracy: {test_acc:.4f}")

Example 3: distributed training

    import os
    import tensorflow as tf

    # Environment setup. Note: TF_CONFIG is read by
    # tf.distribute.MultiWorkerMirroredStrategy; the single-host
    # MirroredStrategy used below does not need it.
    os.environ['TF_CONFIG'] = '{"cluster": {"worker": ["localhost:12345"]}, "task": {"type": "worker", "index": 0}}'

    # Distribution strategy
    strategy = tf.distribute.MirroredStrategy()

    with strategy.scope():
        # Build and compile the model inside the strategy scope
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(256, activation='relu', input_shape=(784,)),
            tf.keras.layers.Dense(10, activation='softmax'),
        ])
        model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])

    # Data
    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype('float32') / 255.0

    # Train
    model.fit(x_train, y_train,
              batch_size=32 * strategy.num_replicas_in_sync,  # scale the batch size
              epochs=3)
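Hand-writing the `TF_CONFIG` JSON string is easy to get wrong once there is more than one worker. A minimal sketch of building it from a Python structure instead is shown here; `make_tf_config` is a hypothetical helper name, and the worker address is the one from the example.

```python
import json


def make_tf_config(workers, index):
    """Build the TF_CONFIG JSON string for one worker in a cluster."""
    if not 0 <= index < len(workers):
        raise ValueError("index must point at an entry in workers")
    config = {
        "cluster": {"worker": list(workers)},
        "task": {"type": "worker", "index": index},
    }
    return json.dumps(config)


# Same single-worker layout as in the distributed training example
print(make_tf_config(["localhost:12345"], 0))
```

Each worker process would then set `os.environ["TF_CONFIG"] = make_tf_config(workers, its_own_index)` before creating its strategy.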

Example 4: saving and loading a model

    import tensorflow as tf

    # Build the model
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])

    # Compile
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # Train briefly on random data
    x = tf.random.normal((1000, 784))
    y = tf.random.uniform((1000,), maxval=10, dtype=tf.int32)
    model.fit(x, y, batch_size=32, epochs=1, verbose=0)

    # Save (a SavedModel directory in TF 2.x Keras; Keras 3 expects a
    # .keras or .h5 suffix on the path)
    model.save('/tmp/mnist_model')
    print("Model saved.")

    # Load
    loaded_model = tf.keras.models.load_model('/tmp/mnist_model')
    print("Model loaded.")

    # Run inference
    test_input = tf.random.normal((1, 784))
    output = loaded_model(test_input)
    print(f"Output shape: {output.shape}")

Performance numbers

Ascend 910 compared with an A100 GPU:

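Operation	A100 GPU	Ascend 910	Notes
Inference (ResNet50)	4.5 ms	4.8 ms	About the same
Training (batch=32)	120 ms	130 ms	Slightly slower
Distributed	NCCL	HCCL	Both supported
Model saving	Supported	Supported	Both supported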

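Tell me that isn't maddening: the gap between Ascend and GPU is already very small.

Relationship to other repositories

Within the CANN architecture, tensorflow is the official TensorFlow adapter: the bridge that connects Ascend to TensorFlow.

Dependency chain: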

TensorFlow (official framework)
    ↓ adapted by
tensorflow (Ascend adapter)
    ↓ calls
hccl / hcomm (communication)
    ↓ calls
Hardware (Ascend NPU)

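To unpack that:

TensorFlow: the official deep learning framework
tensorflow: the Ascend adapter layer
hccl / hcomm: the Ascend communication libraries
Hardware: the Ascend NPU

In short: tensorflow is the bridge between TensorFlow and Ascend.

Core capabilities of tensorflow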

1. Tensor operations

    import tensorflow as tf

    # Create a tensor (2-D, because tf.matmul needs rank >= 2)
    x = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)
    with tf.device('/NPU:0'):
        y = tf.matmul(x, x)

2. Model operations

    import tensorflow as tf

    # Place the model on the NPU
    with tf.device('/NPU:0'):
        model = tf.keras.Sequential([...])  # layers elided

    # Save and load the model
    model.save("model")
    loaded_model = tf.keras.models.load_model("model")

3. Distributed training

    import tensorflow as tf

    # Distribution strategy
    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        model = tf.keras.Sequential([...])  # layers elided

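4. Data processing

    import tensorflow as tf

    # Input pipeline; model, x_train and y_train are defined as in Example 2
    dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    dataset = dataset.batch(32).repeat()

    # Train on the NPU. The dataset repeats forever, so tell Keras
    # how many batches make up one epoch.
    model.fit(dataset, epochs=3, steps_per_epoch=len(x_train) // 32)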