Deployment Rolling Updates and Rollbacks: A Complete Guide
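Preface
In production Kubernetes environments, continuous delivery of applications is a core requirement. Deployment, the most commonly used Kubernetes workload resource, provides robust rolling update and rollback capabilities. This article covers the Deployment rolling update mechanism, rollback strategies, and best practices.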
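1. Deployment Core Concepts
1.1 What a Deployment Does
A Deployment is a declarative resource that manages Pods and ReplicaSets. It supports the following core features:
- Declaring desired state: define the desired replica count, image version, and so on
- Rolling updates: upgrade the application version smoothly
- Rollback: quickly revert to an earlier revision when something goes wrong
- Scaling: adjust the number of application instances dynamically
- Pause/resume: support staged rollouts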
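Each capability in the list corresponds to a kubectl subcommand. As a minimal sketch of the pause/resume item, the shell function below batches two Pod template changes into a single rollout. The function name staged_update and its arguments are hypothetical, and it assumes kubectl already points at the target cluster.

```bash
# staged_update is a hypothetical helper, not part of kubectl.
# Usage: staged_update <deployment> <container> <image> <limits>
staged_update() {
  local deploy="$1" container="$2" image="$3" limits="$4"

  # Pause so the template changes below do not each trigger a rollout.
  kubectl rollout pause "deployment/${deploy}" || return 1

  # Two Pod template changes made while the Deployment is paused.
  kubectl set image "deployment/${deploy}" "${container}=${image}" || return 1
  kubectl set resources "deployment/${deploy}" -c "${container}" --limits="${limits}" || return 1

  # Resume: the accumulated changes roll out together.
  kubectl rollout resume "deployment/${deploy}" || return 1

  # Block until the rollout finishes or fails.
  kubectl rollout status "deployment/${deploy}"
}
```

For example, staged_update my-app-deployment my-app myapp:v2 cpu=200m,memory=256Mi would apply the image and limit changes as one revision rather than two. Scaling is a separate command: kubectl scale deployment/my-app-deployment --replicas=5.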
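1.2 How Deployment Relates to ReplicaSet
Deployment → ReplicaSet → Pod
A Deployment versions its Pods by managing ReplicaSets. Every change to the Pod template (spec.template) creates a new ReplicaSet, whereas a change that leaves the template untouched, such as scaling, reuses the current one.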
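To see this relationship on a live cluster, the ReplicaSets behind a Deployment can be listed together with the revision each one represents. The sketch below assumes the app=my-app label used throughout this article; list_revisions is a hypothetical helper name, and the revision is read from the deployment.kubernetes.io/revision annotation that the Deployment controller sets on each ReplicaSet.

```bash
# list_revisions is a hypothetical helper, not part of kubectl.
# Prints one line per ReplicaSet: name, revision number, desired replicas.
list_revisions() {
  local selector="${1:-app=my-app}"
  # Dots inside the annotation key are escaped so JSONPath treats it as one key.
  kubectl get rs -l "${selector}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.deployment\.kubernetes\.io/revision}{"\t"}{.spec.replicas}{"\n"}{end}'
}
```

After a rolling update completes, the older ReplicaSets typically remain with zero desired replicas; keeping them around is what makes a later rollback possible.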
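2. Rolling Update Mechanism in Detail
2.1 How Rolling Updates Work
A rolling update replaces old Pods with new ones in increments. This allows a deployment without downtime, provided new Pods only receive traffic once they are ready.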
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
  labels:
    app: my-app
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 3        # max Pods above the desired replica count
      maxUnavailable: 2  # max Pods that may be unavailable
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: myapp:v2
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi
```
2.2 Rolling Update Parameters in Detail
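Both fields accept either an absolute number or a percentage of replicas; Kubernetes rounds a percentage up for maxSurge and down for maxUnavailable. The following sketch shows that arithmetic and the Pod-count window it produces. The helper names resolve_bound and rollout_bounds are hypothetical and only illustrate the calculation; they do not query a cluster.

```bash
# resolve_bound (hypothetical helper) turns an absolute value or a percentage
# of the replica count into a Pod count.
# mode "up" rounds percentages up (maxSurge); any other mode rounds down
# (maxUnavailable).
resolve_bound() {
  local replicas="$1" value="$2" mode="$3" pct
  case "${value}" in
    *%)
      pct="${value%?}"   # strip the trailing percent sign
      if [ "${mode}" = "up" ]; then
        echo $(( (replicas * pct + 99) / 100 ))
      else
        echo $(( replicas * pct / 100 ))
      fi
      ;;
    *)
      echo "${value}"
      ;;
  esac
}

# rollout_bounds (hypothetical helper) prints the window a rolling update
# stays within: the most Pods that may exist and the fewest that stay available.
# Usage: rollout_bounds <replicas> <maxSurge> <maxUnavailable>
rollout_bounds() {
  local replicas="$1" surge unavailable
  surge="$(resolve_bound "${replicas}" "$2" up)"
  unavailable="$(resolve_bound "${replicas}" "$3" down)"
  echo "max_total=$(( replicas + surge )) min_available=$(( replicas - unavailable ))"
}
```

With the manifest from section 2.1 (replicas: 10, maxSurge: 3, maxUnavailable: 2), rollout_bounds 10 3 2 prints max_total=13 min_available=8: at most 13 Pods exist at once and at least 8 stay available. Passing 25% for both fields gives the same window for 10 replicas, because 2.5 rounds up to 3 for the surge and down to 2 for the unavailable count.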
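```yaml
# Key parameters:
# maxSurge: 25%      - maximum number of extra Pods that may be created during a rolling update
# maxUnavailable: 0  - maximum number of Pods that may be unavailable during a rolling update
# Note: the two values cannot both be 0.
#
# Scenarios:
# 1. maxSurge=25%, maxUnavailable=0: create new Pods first, then delete old ones (conservative)
# 2. maxSurge=0, maxUnavailable=25%: delete old Pods first, then create new ones (aggressive)
# 3. maxSurge=1, maxUnavailable=1: mixed strategy that balances speed and resource usage
```
2.3 Rolling Update Flow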
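The complete rolling update flow is as follows:
```
# Example: upgrading from v1 to v2
# Initial state: 10 v1 Pods
# Rolling update with maxSurge=1, maxUnavailable=0:
# Step 1:
#   - Create 1 v2 Pod (total: 10 v1 + 1 v2 = 11)
#   - Wait for the new Pod to become ready
# Step 2: continue rolling
#   - Terminate 1 v1 Pod (total: 9 v1 + 1 v2 = 10)
#   - Create 1 v2 Pod (total: 9 v1 + 2 v2 = 11)
#   - Wait for the new Pod to become ready
# ...
# Final state: 10 v2 Pods
```
3. Canary Deployment Strategies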
3.1 ReplicaSet-Based Canary Deployment
```yaml
# 1. Production Deployment (about 90% of traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-production
  labels:
    app: my-app
    track: production
spec:
  replicas: 9
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: my-app
      track: production
  template:
    metadata:
      labels:
        app: my-app
        track: production
    spec:
      containers:
      - name: my-app
        image: myapp:v1
        ports:
        - containerPort: 8080
---
# 2. Canary Deployment (about 10% of traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-canary
  labels:
    app: my-app
    track: canary
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: my-app
      track: canary
  template:
    metadata:
      labels:
        app: my-app
        track: canary
    spec:
      containers:
      - name: my-app
        image: myapp:v2
        ports:
        - containerPort: 8080
```
3.2 Distributing Traffic by Weight with a Service
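A plain Service has no weight field: it balances across every ready Pod its selector matches, so the canary's share of traffic follows from the replica counts. The sketch below estimates that share for the two Deployments above; canary_share is a hypothetical helper, and the result is only an approximation because connection reuse and the proxy mode affect the real distribution.

```bash
# canary_share (hypothetical helper) estimates the percentage of requests that
# reach the canary when one Service selects both tracks and all Pods are ready.
# Integer arithmetic, rounded down.
# Usage: canary_share <stable-replicas> <canary-replicas>
canary_share() {
  local stable="$1" canary="$2" total
  total=$(( stable + canary ))
  if [ "${total}" -eq 0 ]; then
    echo "no replicas" >&2
    return 1
  fi
  echo $(( canary * 100 / total ))
}
```

With 9 production replicas and 1 canary replica, canary_share 9 1 prints 10. Raising the canary to 3 replicas (canary_share 9 3) gives 25, which shows the limitation of this approach: the split can only be adjusted in steps of whole Pods.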
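```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
spec:
  selector:
    app: my-app  # matches both tracks, so traffic splits roughly by ready Pod count (9:1)
  ports:
  - port: 80
    targetPort: 8080
```
3.3 Automated Canary Analysis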
```yaml
# Requires the Argo Rollouts controller. The success-rate and latency
# AnalysisTemplates referenced below must be defined separately.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app-rollout
spec:
  replicas: 10
  strategy:
    canary:
      steps:
      - setWeight: 5
      - pause: {duration: 10m}
      - analysis:
          templates:
          - templateName: success-rate
          - templateName: latency
      - setWeight: 20
      - pause: {duration: 20m}
      - setWeight: 50
      - pause: {}
      canaryMetadata:
        labels:
          role: canary
      stableMetadata:
        labels:
          role: stable
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: myapp:v2
        ports:
        - containerPort: 8080
```
4. Rollback Operations in Detail
4.1 Viewing Revision History
```bash
# View the Deployment's rollout history
kubectl rollout history deployment/my-app-deployment

# View the details of a specific revision
kubectl rollout history deployment/my-app-deployment --revision=3

# Example output:
# deployments "my-app-deployment"
# REVISION  CHANGE-CAUSE
# 1         kubectl apply --filename=deployment.yaml --record=true
# 2         kubectl set image deployment/my-app-deployment my-app=myapp:v1
# 3         kubectl set image deployment/my-app-deployment my-app=myapp:v2
```
4.2 Rolling Back to the Previous Revision
```bash
# Roll back to the previous revision
kubectl rollout undo deployment/my-app-deployment

# Roll back to a specific revision
kubectl rollout undo deployment/my-app-deployment --to-revision=2

# Check rollback status
kubectl rollout status deployment/my-app-deployment
```
4.3 Rollback Configuration Example
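The CHANGE-CAUSE column shown by kubectl rollout history is read from the kubernetes.io/change-cause annotation, which the manifest in this section sets declaratively. A minimal sketch of setting it from the command line instead follows; record_change_cause is a hypothetical helper name.

```bash
# record_change_cause is a hypothetical helper, not part of kubectl.
# Usage: record_change_cause <deployment> <message>
record_change_cause() {
  local deploy="$1" cause="$2"
  # --overwrite replaces the annotation left by the previous revision.
  kubectl annotate "deployment/${deploy}" \
    "kubernetes.io/change-cause=${cause}" --overwrite
}
```

Calling record_change_cause my-app-deployment "Update image to v2.0.0" right after an update labels the current revision, which keeps the history readable when choosing a --to-revision target.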
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
  annotations:
    kubernetes.io/change-cause: "Update image to v2.0.0"
spec:
  replicas: 5
  revisionHistoryLimit: 10  # number of old revisions to retain
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: myapp:v2
        ports:
        - containerPort: 8080
```
5. Best Practices
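5.1 Sensible Replica Configuration
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: production-deployment
spec:
  replicas: 3                   # at least 3 replicas in production
  minReadySeconds: 30           # seconds a new Pod must stay ready before it counts as available
  progressDeadlineSeconds: 600  # seconds without progress before the rollout is reported as failed
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  # selector and template omitted for brevity
```
5.2 Health Check Configuration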
```yaml
spec:
  template:
    spec:
      containers:
      - name: app
        image: myapp:v2
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 10
          failureThreshold: 3
```
5.3 Monitoring Rolling Updates
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
    - job_name: 'kubernetes-deployments'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: my-app
      - source_labels: [__meta_kubernetes_pod_label_deployment]
        action: replace
        target_label: deployment
```
5.4 Blue-Green Deployment Strategy
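Cutting over in a blue-green setup comes down to three steps: scale the idle color up, wait until it is fully available, then repoint the Service selector. The sketch below assumes the resource names used in this section (my-app-green, my-app-service); switch_to_green is a hypothetical helper name.

```bash
# switch_to_green is a hypothetical helper for the manifests in this section.
# Usage: switch_to_green [replicas]   (defaults to 5)
switch_to_green() {
  local replicas="${1:-5}"

  # Bring the green Deployment up to full size first.
  kubectl scale deployment/my-app-green --replicas="${replicas}" || return 1

  # Wait until the green Pods are available; abort the cutover otherwise.
  kubectl rollout status deployment/my-app-green --timeout=300s || return 1

  # Repoint the Service selector from blue to green.
  kubectl patch service my-app-service \
    -p '{"spec":{"selector":{"app":"my-app","version":"green"}}}'
}
```

Switching back is the same patch with version set to blue, which is why the blue Deployment is usually left running until green has been verified.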
```yaml
# Blue Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-blue
  labels:
    app: my-app
    version: blue
spec:
  replicas: 5
  selector:
    matchLabels:
      app: my-app
      version: blue
  template:
    metadata:
      labels:
        app: my-app
        version: blue
    spec:
      containers:
      - name: my-app
        image: myapp:v1
---
# Green Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-green
  labels:
    app: my-app
    version: green
spec:
  replicas: 0  # starts at 0; scale to 5 and wait for readiness before switching
  selector:
    matchLabels:
      app: my-app
      version: green
  template:
    metadata:
      labels:
        app: my-app
        version: green
    spec:
      containers:
      - name: my-app
        image: myapp:v2
---
# Service switch
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
spec:
  selector:
    app: my-app
    version: blue  # change to green when switching
  ports:
  - port: 80
    targetPort: 8080
```
6. Common Problems and Solutions
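6.1 Rolling Update Is Stuck
```bash
# Check the Deployment status
kubectl describe deployment my-app-deployment

# Check the ReplicaSets
kubectl get rs -l app=my-app

# Watch the Pod status
kubectl get pods -l app=my-app -w

# If the rollout cannot make progress, roll back to the previous revision
kubectl rollout undo deployment/my-app-deployment
```
6.2 Image Pull Failures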
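Pulling from a private registry fails unless the Pod references a valid pull secret. The manifest in this section expects a Secret named my-registry-secret; one way to create it is sketched below. create_registry_secret is a hypothetical helper, and the namespace, server, user name, and password are placeholders supplied by the caller.

```bash
# create_registry_secret is a hypothetical helper, not part of kubectl.
# Usage: create_registry_secret <namespace> <server> <user> <password>
create_registry_secret() {
  local namespace="$1" server="$2" user="$3" password="$4"
  # The Secret must live in the same namespace as the Pods that reference it.
  kubectl create secret docker-registry my-registry-secret \
    --namespace="${namespace}" \
    --docker-server="${server}" \
    --docker-username="${user}" \
    --docker-password="${password}"
}
```

When a pull still fails after the Secret exists, the Events section of kubectl describe pod usually states whether the cause is authentication, a missing tag, or an unreachable registry.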
```yaml
spec:
  template:
    spec:
      imagePullSecrets:
      - name: my-registry-secret
      containers:
      - name: app
        image: registry.example.com/myapp:v1
        imagePullPolicy: IfNotPresent
```
6.3 Scheduling Failures Due to Insufficient Resources
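```bash
# Check Node resources
kubectl describe node <node-name> | grep -A 5 "Allocated resources"

# Adjust resource requests (-c must match the container name, my-app in the earlier manifest)
kubectl set resources deployment my-app-deployment -c=my-app --requests=cpu=100m,memory=128Mi
```
6.4 Version Incompatibility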
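```yaml
# Use readiness probes to keep traffic away from incompatible versions
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1  # replace only one Pod at a time
```
7. Automated Deployment in Practice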
7.1 Argo CD Configuration
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: 'https://github.com/myorg/my-app.git'
    targetRevision: HEAD
    path: k8s
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true
```
7.2 Example Deployment Script
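```bash
#!/bin/bash
# deploy.sh - automated deployment script
# Abort on the first failed command so the success message is only
# printed when every step has succeeded.
set -euo pipefail

DEPLOYMENT_NAME="my-app"
NAMESPACE="production"
NEW_VERSION="${1:-latest}"

echo "Starting deployment of ${DEPLOYMENT_NAME}:${NEW_VERSION}"

# Set the image (assumes the container is named after the Deployment)
kubectl set image "deployment/${DEPLOYMENT_NAME}" \
  "${DEPLOYMENT_NAME}=myrepo/${DEPLOYMENT_NAME}:${NEW_VERSION}" \
  -n "${NAMESPACE}"

# Wait for the rollout to complete
kubectl rollout status "deployment/${DEPLOYMENT_NAME}" -n "${NAMESPACE}" --timeout=300s

# Verify the deployment
kubectl get pods -n "${NAMESPACE}" -l "app=${DEPLOYMENT_NAME}"

echo "Deployment completed successfully!"
```
Summary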
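Rolling updates and rollbacks are core Deployment capabilities for managing applications on Kubernetes. Configuring the rolling update parameters sensibly, adopting a canary deployment strategy, and maintaining thorough monitoring and alerting together make for a safe and reliable release process. GitOps tools such as Argo CD can further improve the automation and reliability of deployments.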
