
CANN Containerized Deployment: Docker and K8s in Practice


1. Why Containerize

1.1 Benefits of Containerization

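Bare-metal deployment:
- Complex environment dependencies and version conflicts
- Hard to scale, with manual operations
- Poor resource isolation, so workloads interfere with each other

Containerized deployment:
- Consistent environments that work out of the box
- Elastic scaling and automated operations
- Resource isolation between workloads
- Versioned images that enable canary (gray) releases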

1.2 CANN Container Architecture

```
┌──────────────────────────────────────┐
│          Kubernetes cluster          │
├──────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐    │
│  │   Pod 0     │  │   Pod 1     │    │
│  │ ┌─────────┐ │  │ ┌─────────┐ │    │
│  │ │Inference│ │  │ │Inference│ │    │
│  │ └─────────┘ │  │ └─────────┘ │    │
│  │/dev/davinci0│  │/dev/davinci1│    │
│  └─────────────┘  └─────────────┘    │
├──────────────────────────────────────┤
│               Host OS                │
│  ┌──────────────────────────┐        │
│  │ CANN driver + Docker NPU │        │
│  └──────────────────────────┘        │
└──────────────────────────────────────┘
```

2. Setting Up the Docker Environment

2.1 Base Image

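```dockerfile
# Dockerfile.cann-base
FROM ubuntu:20.04

# Install basic dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    cmake \
    git \
    wget \
    && rm -rf /var/lib/apt/lists/*

# Install CANN (the .run package must be executable)
COPY Ascend-cann-toolkit_8.2.RC1_linux-aarch64.run /tmp/
RUN chmod +x /tmp/Ascend-cann-toolkit_8.2.RC1_linux-aarch64.run \
    && /tmp/Ascend-cann-toolkit_8.2.RC1_linux-aarch64.run --install --quiet \
    && rm /tmp/Ascend-cann-toolkit_8.2.RC1_linux-aarch64.run

# Set environment variables (the toolkit installs under ascend-toolkit/latest by default)
ENV ASCEND_HOME=/usr/local/Ascend
ENV PATH=${ASCEND_HOME}/ascend-toolkit/latest/bin:${PATH}
ENV LD_LIBRARY_PATH=${ASCEND_HOME}/ascend-toolkit/latest/lib64:${ASCEND_HOME}/driver/lib64:${LD_LIBRARY_PATH}

# Verify the installation at run time, not build time: npu-smi ships with the
# host driver and needs NPU devices, neither of which exists during `docker build`.
#   docker run --rm <device flags> cann-base:latest npu-smi info

WORKDIR /workspace
```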

2.2 Building the Inference Image

```dockerfile
# Dockerfile.inference
FROM cann-base:latest

# Install Python
RUN apt-get update && apt-get install -y python3 python3-pip \
    && rm -rf /var/lib/apt/lists/* \
    && pip3 install --upgrade pip

# Install dependencies
COPY requirements.txt /tmp/
RUN pip3 install -r /tmp/requirements.txt

# Copy the model and code
COPY model/ /workspace/model/
COPY src/ /workspace/src/

# Expose the gRPC port
EXPOSE 50051

# Start command
CMD ["python3", "/workspace/src/server.py", "--model", "/workspace/model/model.om"]
```

2.3 Build and Run

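```bash
# Build the base image
docker build -f Dockerfile.cann-base -t cann-base:latest .

# Build the inference image
docker build -f Dockerfile.inference -t inference-server:latest .

# Run the container (requires NPU devices).
# Mount only the host driver: mounting all of /usr/local/Ascend would hide
# the CANN toolkit installed in the image.
docker run -d \
  --name inference \
  --device /dev/davinci0 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -p 50051:50051 \
  inference-server:latest
```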
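Before starting the container, it can help to confirm that the host actually exposes the device nodes the `docker run` command passes through. The following is a minimal Python sketch, assuming the node names used above (`/dev/davinci<N>`, `/dev/davinci_manager`, `/dev/devmm_svm`, `/dev/hisi_hdc`); the function name `missing_npu_devices` is hypothetical, and the `dev_dir` parameter exists only so the check can be pointed at a test directory.

```python
import os
from typing import Iterable, List

# Shared control nodes passed through alongside each /dev/davinci<N>
SHARED_NODES = ["davinci_manager", "devmm_svm", "hisi_hdc"]


def missing_npu_devices(card_ids: Iterable[int], dev_dir: str = "/dev") -> List[str]:
    """Return the device paths docker run would need but that do not exist.

    Hypothetical helper: checks davinci<N> for each requested card plus
    the shared control nodes.
    """
    names = [f"davinci{i}" for i in card_ids] + SHARED_NODES
    paths = [os.path.join(dev_dir, name) for name in names]
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    missing = missing_npu_devices([0])
    if missing:
        print("Missing device nodes:", ", ".join(missing))
    else:
        print("All required NPU device nodes are present")
```

An empty result means every `--device` path in the command exists on this host; anything listed usually points to a missing or unloaded driver.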

3. NPU Device Passthrough

3.1 Device Mount Flags

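```bash
# Required devices and mounts:
#   /dev/davinci0             NPU device node
#   /dev/davinci_manager      NPU manager
#   /dev/devmm_svm            device memory management
#   /dev/hisi_hdc             HDC communication
#   /usr/local/Ascend/driver  host NPU driver (mount only the driver so the
#                             CANN toolkit in the image is not hidden)
docker run -d \
  --device /dev/davinci0 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  inference-server:latest

# Mount all NPUs (multi-card)
docker run -d \
  --device /dev/davinci0 \
  --device /dev/davinci1 \
  --device /dev/davinci2 \
  --device /dev/davinci3 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  inference-server:latest
```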
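When the number of cards varies, writing the `--device` flags by hand is error-prone. A small Python sketch that assembles the same argument list for an arbitrary set of card IDs might look like this; `build_docker_run_args` is a hypothetical helper, and it assumes the host driver lives under `/usr/local/Ascend/driver`.

```python
import shlex
from typing import Iterable, List

SHARED_DEVICES = ["/dev/davinci_manager", "/dev/devmm_svm", "/dev/hisi_hdc"]


def build_docker_run_args(image: str, card_ids: Iterable[int],
                          driver_dir: str = "/usr/local/Ascend/driver") -> List[str]:
    """Build a docker run argv that passes through the given NPU cards.

    Hypothetical helper: one --device per /dev/davinci<N> (deduplicated,
    sorted), then the shared control nodes, then a bind mount of the driver.
    """
    args = ["docker", "run", "-d"]
    for i in sorted(set(card_ids)):
        args += ["--device", f"/dev/davinci{i}"]
    for dev in SHARED_DEVICES:
        args += ["--device", dev]
    args += ["-v", f"{driver_dir}:{driver_dir}", image]
    return args


if __name__ == "__main__":
    # Print a copy-pasteable command for a 4-card container
    print(shlex.join(build_docker_run_args("inference-server:latest", range(4))))
```

Returning a list rather than a string lets the result be passed straight to `subprocess.run` without shell quoting issues; `shlex.join` is only used for display.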

3.2 Docker Compose Configuration

```yaml
# docker-compose.yml
version: '3.8'
services:
  inference:
    build:
      context: .
      dockerfile: Dockerfile.inference
    container_name: inference-server
    restart: unless-stopped
    ports:
      - "50051:50051"
    devices:
      - /dev/davinci0:/dev/davinci0
      - /dev/davinci_manager:/dev/davinci_manager
      - /dev/devmm_svm:/dev/devmm_svm
      - /dev/hisi_hdc:/dev/hisi_hdc
    volumes:
      # Mount only the host driver so the toolkit in the image stays visible
      - /usr/local/Ascend/driver:/usr/local/Ascend/driver
      - ./model:/workspace/model
    environment:
      - ASCEND_HOME=/usr/local/Ascend
      - PYTHONUNBUFFERED=1
    deploy:
      resources:
        limits:
          memory: 16G
        reservations:
          memory: 8G

  nginx:
    image: nginx:latest
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - inference
```

4. Kubernetes Deployment

4.1 NPU Device Plugin

```yaml
# npu-device-plugin.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ascend-npu-device-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: ascend-npu-device-plugin
  template:
    metadata:
      labels:
        name: ascend-npu-device-plugin
    spec:
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
      priorityClassName: system-node-critical
      containers:
        - name: npu-device-plugin
          # The image must match the node CPU architecture (amd64 here;
          # aarch64 nodes need an arm64 build)
          image: ascend-k8sdeviceplugin/amd64-npu-plugin:latest
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
            - name: davinci
              mountPath: /dev/davinci
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
        - name: davinci
          hostPath:
            path: /dev/davinci
```
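A device plugin's job is to discover the NPUs on each node and advertise them to the kubelet as an extended resource. The discovery half can be illustrated with a short Python sketch that scans for numbered `davinci<N>` nodes while skipping `davinci_manager`. This is a simplified illustration with a hypothetical `discover_npu_ids` function, not the actual plugin code, which also tracks device health and registers with the kubelet over the gRPC device-plugin API.

```python
import os
import re
from typing import List

# Numbered device nodes only; davinci_manager does not match
_DAVINCI_RE = re.compile(r"^davinci(\d+)$")


def discover_npu_ids(dev_dir: str = "/dev") -> List[int]:
    """Return the sorted card IDs of davinci<N> nodes found in dev_dir."""
    ids = []
    for name in os.listdir(dev_dir):
        match = _DAVINCI_RE.match(name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    ids = discover_npu_ids()
    print(f"{len(ids)} NPU(s) found: {ids}")
```

Sorting numerically (not lexically) keeps `davinci10` after `davinci2`, which matters on 16-card nodes.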

4.2 Inference Service Deployment

```yaml
# inference-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-server
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inference-server
  template:
    metadata:
      labels:
        app: inference-server
    spec:
      containers:
        - name: inference
          image: inference-server:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 50051
              name: grpc
          resources:
            # Request 1 NPU. The extended resource name is whatever the device
            # plugin advertises (e.g. huawei.com/Ascend910); check `kubectl describe node`.
            limits:
              huawei.com/npu: 1
            requests:
              huawei.com/npu: 1
          volumeMounts:
            - name: model-volume
              mountPath: /workspace/model
          # gRPC probes need Kubernetes 1.24+ and a server that implements
          # the standard gRPC health checking protocol
          readinessProbe:
            grpc:
              port: 50051
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            grpc:
              port: 50051
            initialDelaySeconds: 15
            periodSeconds: 10
      volumes:
        - name: model-volume
          persistentVolumeClaim:
            claimName: model-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: inference-service
  namespace: default
spec:
  selector:
    app: inference-server
  ports:
    - name: grpc
      port: 50051
      targetPort: 50051
  type: ClusterIP
```

4.3 Autoscaling

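```yaml
# hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    # CPU utilization is measured against the container's CPU request,
    # so the Deployment must set resources.requests.cpu
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Custom per-Pod metric: requires a custom metrics API adapter
    # (e.g. Prometheus Adapter) and an exported inference_queue_size metric
    - type: Pods
      pods:
        metric:
          name: inference_queue_size
        target:
          type: AverageValue
          averageValue: "10"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120
```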
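To see how this HPA turns metric values into a replica count, the Python sketch below applies the documented HPA formula `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)` per metric, skips changes within the default 10% tolerance, takes the largest proposal across metrics, and then caps the step with the `+2` / `-1` Pods policies and the min/max bounds from `hpa.yml`. It is a simplified model under those assumptions (stabilization windows are not modeled, and the function names are hypothetical), not the controller's actual code.

```python
import math
from typing import Sequence, Tuple

TOLERANCE = 0.1  # default HPA tolerance: ratios within 10% of 1.0 are ignored


def desired_for_metric(current_replicas: int, current: float, target: float) -> int:
    """Core HPA formula: ceil(currentReplicas * currentValue / targetValue)."""
    ratio = current / target
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def hpa_recommendation(current_replicas: int,
                       metrics: Sequence[Tuple[float, float]],
                       min_replicas: int = 2,
                       max_replicas: int = 10,
                       max_scale_up: int = 2,
                       max_scale_down: int = 1) -> int:
    """Simplified replica decision for hpa.yml.

    metrics holds (current, target) pairs, e.g. (CPU %, 70) and (queue size, 10).
    The largest per-metric proposal wins; the Pods policies (+2 up, -1 down
    per period) and the min/max bounds are then applied.
    """
    proposal = max(desired_for_metric(current_replicas, c, t) for c, t in metrics)
    proposal = min(proposal, current_replicas + max_scale_up)
    proposal = max(proposal, current_replicas - max_scale_down)
    return max(min_replicas, min(max_replicas, proposal))


if __name__ == "__main__":
    # 3 replicas, CPU at 90% (target 70%), queue size 25 (target 10)
    print(hpa_recommendation(3, [(90, 70), (25, 10)]))  # → 5
```

Here the queue metric alone would ask for 8 replicas, but the scale-up policy limits a single step to 3 + 2 = 5.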

5. Resource Limits and Isolation

5.1 NPU Resource Quota

```yaml
# resource-quota.yml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: npu-quota
  namespace: inference
spec:
  hard:
    # Extended resources can only be quota'd with the requests. prefix
    requests.huawei.com/npu: "8"
    requests.cpu: "32"
    limits.cpu: "64"
    requests.memory: "128Gi"
    limits.memory: "256Gi"
```
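ResourceQuota is enforced at admission time: a new Pod is rejected if adding its requests and limits would push any tracked total past its `hard` value. The Python sketch below illustrates that arithmetic for this quota and the Pod from 5.2. `parse_quantity` handles only the suffixes used in this article (`m`, `Ki`, `Mi`, `Gi`, plain numbers), and `fits_quota` is a hypothetical helper, not the API server's implementation.

```python
from fractions import Fraction
from typing import Dict

# Suffixes used in this article; "m" means milli (e.g. "500m" CPU)
_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "m": Fraction(1, 1000)}


def parse_quantity(q: str) -> Fraction:
    """Parse a Kubernetes quantity such as "500m", "16Gi" or "8" (subset only)."""
    for suffix, factor in _SUFFIXES.items():
        if q.endswith(suffix):
            return Fraction(q[: -len(suffix)]) * factor
    return Fraction(q)


def fits_quota(hard: Dict[str, str], used: Dict[str, str], pod: Dict[str, str]) -> bool:
    """True if admitting `pod` keeps every quota item at or below `hard`.

    Simplification: an item the Pod does not set counts as zero. A real
    API server rejects a Pod that omits a resource the quota tracks,
    unless a LimitRange fills in defaults.
    """
    for key, limit in hard.items():
        total = parse_quantity(used.get(key, "0")) + parse_quantity(pod.get(key, "0"))
        if total > parse_quantity(limit):
            return False
    return True


# Values taken from resource-quota.yml and pod-with-limits.yml
HARD = {"requests.huawei.com/npu": "8", "requests.cpu": "32", "limits.cpu": "64",
        "requests.memory": "128Gi", "limits.memory": "256Gi"}
POD = {"requests.huawei.com/npu": "1", "requests.cpu": "4", "limits.cpu": "8",
       "requests.memory": "16Gi", "limits.memory": "32Gi"}

if __name__ == "__main__":
    seven_pods = {k: str(parse_quantity(v) * 7) for k, v in POD.items()}
    print(fits_quota(HARD, seven_pods, POD))  # 8th Pod fits exactly: True
    eight_pods = {k: str(parse_quantity(v) * 8) for k, v in POD.items()}
    print(fits_quota(HARD, eight_pods, POD))  # 9th Pod is rejected: False
```

With these numbers the NPU, CPU and memory budgets all run out at exactly eight Pods, so no single resource is left stranded.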

5.2 Pod Resource Limits

```yaml
# pod-with-limits.yml
apiVersion: v1
kind: Pod
metadata:
  name: inference-pod
spec:
  containers:
    - name: inference
      image: inference-server:latest
      resources:
        limits:
          huawei.com/npu: 1
          cpu: "8"
          memory: "32Gi"
        requests:
          huawei.com/npu: 1
          cpu: "4"
          memory: "16Gi"
```
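Because this Pod's CPU and memory requests are lower than its limits, Kubernetes places it in the Burstable QoS class rather than Guaranteed, which affects its eviction priority under node memory pressure. The classification rule can be sketched in Python as follows; it considers only CPU and memory (extended resources such as NPUs do not affect QoS), compares quantities as plain strings, and `qos_class` is a hypothetical helper.

```python
from typing import Dict, List

# One entry per container: {"requests": {...}, "limits": {...}}
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Classify a Pod as Guaranteed, Burstable or BestEffort (CPU/memory only).

    Quantities are compared as strings, so write them in the same form.
    """
    any_set = False
    guaranteed = True
    for res in containers:
        limits = res.get("limits", {})
        # Kubernetes defaults a missing request to the corresponding limit
        requests = {**limits, **res.get("requests", {})}
        for name in ("cpu", "memory"):
            if name in requests:
                any_set = True
            if name not in limits or requests.get(name) != limits[name]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    pod = [{"limits": {"huawei.com/npu": "1", "cpu": "8", "memory": "32Gi"},
            "requests": {"huawei.com/npu": "1", "cpu": "4", "memory": "16Gi"}}]
    print(qos_class(pod))  # → Burstable
```

Setting requests equal to limits (for example `cpu: "8"` and `memory: "32Gi"` in both) would make the Pod Guaranteed, at the cost of reserving the full amount on the node.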

6. Monitoring and Logging

6.1 Exposing Prometheus Metrics

```python
import time

from prometheus_client import Counter, Gauge, Histogram, start_http_server

# Define metrics
INFERENCE_REQUESTS = Counter(
    'inference_requests_total',
    'Total inference requests',
    ['model_name', 'status'],
)
INFERENCE_LATENCY = Histogram(
    'inference_latency_seconds',
    'Inference latency',
    ['model_name'],
    buckets=[0.01, 0.05, 0.1, 0.5, 1.0, 2.0, 5.0],
)
NPU_MEMORY_USAGE = Gauge(
    'npu_memory_usage_bytes',
    'NPU memory usage',
    ['device'],
)


# Record metrics around each inference call.
# `server` is the inference object exposing Inference(); it is passed in explicitly.
def predict_with_metrics(server, model_name, input_data):
    start = time.perf_counter()
    try:
        output = server.Inference(input_data)
    except Exception:
        INFERENCE_REQUESTS.labels(model_name=model_name, status='error').inc()
        raise
    latency = time.perf_counter() - start
    INFERENCE_REQUESTS.labels(model_name=model_name, status='success').inc()
    INFERENCE_LATENCY.labels(model_name=model_name).observe(latency)
    return output


# Start the metrics server (Prometheus scrapes :8000/metrics)
start_http_server(8000)
```

6.2 Grafana Dashboard

```json
{
  "dashboard": {
    "title": "CANN Inference Dashboard",
    "panels": [
      {
        "title": "QPS",
        "type": "graph",
        "targets": [
          {
            "expr": "sum by (model_name) (rate(inference_requests_total[5m]))",
            "legendFormat": "{{model_name}}"
          }
        ]
      },
      {
        "title": "P99 Latency",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.99, sum by (le, model_name) (rate(inference_latency_seconds_bucket[5m])))",
            "legendFormat": "{{model_name}}"
          }
        ]
      },
      {
        "title": "NPU Memory",
        "type": "graph",
        "targets": [
          {
            "expr": "npu_memory_usage_bytes",
            "legendFormat": "{{device}}"
          }
        ]
      }
    ]
  }
}
```
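`histogram_quantile` never sees individual latencies. It estimates the quantile from the cumulative `_bucket` counters by finding the bucket that contains the target rank and interpolating linearly inside it. The simplified Python re-implementation below, assuming the bucket bounds from section 6.1 and ignoring edge cases Prometheus handles (NaN, non-monotonic buckets), shows why P99 precision is bounded by the bucket layout. The function mirrors the PromQL name only for readability; it is not the Prometheus source.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound, cumulative, and end with +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first finite bucket whose cumulative count reaches the rank
    b = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank),
             len(buckets) - 1)
    if b == len(buckets) - 1:
        # Rank lies in the +Inf bucket: return the highest finite bound
        return buckets[-2][0]
    start, end = 0.0, buckets[b][0]
    count = buckets[b][1]
    if b > 0:
        start = buckets[b - 1][0]
        count -= buckets[b - 1][1]
        rank -= buckets[b - 1][1]
    # Assume observations are spread uniformly inside the bucket
    return start + (end - start) * (rank / count)


if __name__ == "__main__":
    # Cumulative counts for 100 requests using the bucket bounds from section 6.1
    buckets = [(0.01, 10), (0.05, 50), (0.1, 90), (0.5, 99), (1.0, 100),
               (2.0, 100), (5.0, 100), (math.inf, 100)]
    print(f"P99 ≈ {histogram_quantile(0.99, buckets):.3f} s")  # prints P99 ≈ 0.500 s
```

In this example the reported P99 only says that 99% of requests finished within the 0.5 s boundary; adding buckets around the expected latency range gives a sharper estimate.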

7. Common Issues

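| Problem | Cause | Solution |
|---|---|---|
| NPU unavailable inside the container | Devices not mounted | Add the `--device` flags |
| NPU driver version mismatch | CANN in the image does not match the host driver | Use a CANN version compatible with the host driver |
| Inference performance drops | Container resource limits are too tight | Raise the CPU/memory limits |
| Pod cannot be scheduled | Not enough free NPUs | Add NPU nodes or reduce the replica count |
| OOMKilled | Container exceeded its memory limit | Raise the memory limit or optimize the model |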

Related Repositories

  • ascend-docker - Ascend Docker tooling https://gitee.com/ascend/ascend-docker
  • k8s-device-plugin - Kubernetes NPU device plugin https://gitee.com/ascend/k8s-device-plugin
  • ascend-operator - Kubernetes Ascend Operator https://gitee.com/ascend/ascend-operator