
120. Draining differences between Kubernetes version/configuration updates and node pool configuration updates

Situation

During a cluster maintenance operation that involves both a Kubernetes version or configuration update and a change to the node pool configuration (e.g., updating the OS image), nodes can get stuck in a "Deleting" state.

The symptoms include:

  • Pods running on these nodes are not automatically evicted.

  • The upgradeStrategy drain options defined in the cluster's specification appear to have no effect on this behaviour.

This creates confusion as to why the drain process configured in the upgradeStrategy does not occur.

Resolution

To ensure nodes are drained correctly during all maintenance operations, you must configure the appropriate drain mechanism for both the node pool update and the Kubernetes version/configuration update. These are two separate processes:

  • Node Pool updates: In your cluster's provisioning resource (cluster.provisioning.cattle.io), ensure every machine pool has the drainBeforeDelete flag set to true. This option is exposed under the Show Advanced section of the Machine Pools configuration, in the Edit Config view for a cluster. Although this option is enabled by default when configuring a cluster via the Rancher UI, it is important to specify it explicitly when using external automated tools such as GitOps.

  • Kubernetes version/configuration updates: Configure your desired drain behaviour within the upgradeStrategy section of the cluster spec. This will be respected during an in-place Kubernetes version or configuration update. This option is exposed in the Update Strategy tab of the Cluster Configuration section, in the Edit Config view for a cluster.
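When cluster manifests are managed outside the Rancher UI, a small pre-apply check can catch a missing flag. The following is a minimal sketch, not an official tool: it assumes the cluster resource has been loaded into a Python dict and that the relevant fields live at spec.rkeConfig.machinePools and spec.rkeConfig.upgradeStrategy (with controlPlaneDrainOptions and workerDrainOptions each carrying an enabled field). Verify these paths against your own cluster resource; the function name and pool names are hypothetical.

```python
# Sketch: report drain settings that are not enabled in a cluster spec.
# ASSUMPTION: the dict mirrors a cluster.provisioning.cattle.io resource with
# spec.rkeConfig.machinePools and spec.rkeConfig.upgradeStrategy.

def find_drain_gaps(cluster):
    """Return human-readable findings for drain settings that are not enabled."""
    rke_config = cluster.get("spec", {}).get("rkeConfig", {})
    findings = []

    # Node pool updates: every machine pool needs drainBeforeDelete: true.
    for pool in rke_config.get("machinePools", []):
        if pool.get("drainBeforeDelete") is not True:
            findings.append(
                f"machine pool '{pool.get('name')}': drainBeforeDelete is not true"
            )

    # In-place updates: drain behaviour is read from upgradeStrategy.
    strategy = rke_config.get("upgradeStrategy", {})
    for key in ("controlPlaneDrainOptions", "workerDrainOptions"):
        if strategy.get(key, {}).get("enabled") is not True:
            findings.append(f"upgradeStrategy.{key}: draining is not enabled")

    return findings


example_cluster = {
    "spec": {
        "rkeConfig": {
            "machinePools": [
                {"name": "control-plane", "drainBeforeDelete": True},
                {"name": "workers"},  # flag absent, as can happen in a GitOps manifest
            ],
            "upgradeStrategy": {
                "controlPlaneDrainOptions": {"enabled": True},
                "workerDrainOptions": {"enabled": True},
            },
        }
    }
}

for finding in find_drain_gaps(example_cluster):
    print(finding)
```

Here only the "workers" pool is reported, because its drainBeforeDelete flag is absent. Whether draining should be enabled in upgradeStrategy is a choice for your environment; the check merely reports what is not enabled.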

Most importantly, you should not perform a Kubernetes version/configuration update and a node pool update at the same time. Trigger these maintenance tasks in separate steps. For example:

  • First, apply the changes to your node pool configuration and wait for all nodes to be replaced successfully.

  • Then, apply the change for the Kubernetes version/configuration update.

Applying a Kubernetes version/configuration update and a node pool template change at the same time triggers two parallel, competing processes. This is inefficient (a node's Kubernetes version might be upgraded in-place only for the node to be deleted immediately afterwards because of the node pool configuration update) and makes troubleshooting extremely difficult if an issue occurs. Always perform these operations separately.
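For automated pipelines, the separation can be enforced by splitting a combined change into two applies. The sketch below is illustrative only: it assumes the cluster spec is available as a Python dict with kubernetesVersion and rkeConfig.machinePools fields, it only detects version changes (not other configuration changes), and the version strings and machineConfigRef names are hypothetical. Waiting for node replacement between the steps is left to the caller.

```python
import copy

# Sketch: split a desired spec into separate maintenance steps.
# ASSUMPTION: "kubernetesVersion" and "rkeConfig.machinePools" are the fields
# that change; all values below are made up for illustration.

def plan_maintenance_steps(current_spec, desired_spec):
    """Return a list of (description, spec) steps, node pool change first."""
    steps = []
    working = copy.deepcopy(current_spec)

    current_pools = current_spec.get("rkeConfig", {}).get("machinePools")
    desired_pools = desired_spec.get("rkeConfig", {}).get("machinePools")
    if desired_pools != current_pools:
        # Step 1 changes only the machine pools; the version stays as it is.
        working.setdefault("rkeConfig", {})["machinePools"] = copy.deepcopy(desired_pools)
        steps.append(
            ("node pool update: apply, then wait for all nodes to be replaced",
             copy.deepcopy(working))
        )

    if desired_spec.get("kubernetesVersion") != current_spec.get("kubernetesVersion"):
        # Step 2 changes only the version, on top of the result of step 1.
        working["kubernetesVersion"] = desired_spec.get("kubernetesVersion")
        steps.append(
            ("Kubernetes version update: apply after the node pool step has finished",
             copy.deepcopy(working))
        )

    return steps


current = {
    "kubernetesVersion": "v1.30.1+rke2r1",
    "rkeConfig": {"machinePools": [
        {"name": "workers", "drainBeforeDelete": True,
         "machineConfigRef": {"name": "workers-image-a"}},
    ]},
}
desired = {
    "kubernetesVersion": "v1.31.1+rke2r1",
    "rkeConfig": {"machinePools": [
        {"name": "workers", "drainBeforeDelete": True,
         "machineConfigRef": {"name": "workers-image-b"}},
    ]},
}

steps = plan_maintenance_steps(current, desired)
for number, (description, spec) in enumerate(steps, start=1):
    print(number, description, "->", spec["kubernetesVersion"])
```

The first step carries the new machine pool configuration with the old Kubernetes version, and the second step carries both, so each apply triggers only one of the two processes.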

Cause

It is important to understand that replacing nodes within a Node Pool (because of an update to the node pool configuration) and upgrading the Kubernetes version/configuration in place are two separate processes, managed by separate controllers within Rancher.

  • Node Pool update draining: This process is controlled by the Kubernetes Cluster API (CAPI). It is triggered when the configuration template for a node pool is changed. This process replaces old nodes with new ones.

    • In this case, the draining behaviour is managed by the "drainBeforeDelete: true" flag in the machinePools specification. If this flag is false or absent, CAPI will not drain the node before deleting it, leading to the stuck pods and the "Deleting" state.

  • Kubernetes version/configuration update draining: This process is controlled by Rancher's upgrade controller. It reads the upgradeStrategy section in the cluster's specification. Its purpose is to manage the draining of nodes for an in-place update (e.g., upgrading the RKE2 version). The node itself is not replaced, so the underlying machine persists. It is simply cordoned, drained, updated, and un-cordoned.

The problem described in the Situation section occurs because the node replacement is triggered by the Node Pool update, but the machine pool configuration is missing the "drainBeforeDelete: true" flag.
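The two drain paths can be pictured as a small decision function. This is a simplified model of the behaviour described above, not Rancher or CAPI code: the process labels, the function name, and the assumption that upgradeStrategy exposes workerDrainOptions and controlPlaneDrainOptions with an enabled field are illustrative.

```python
# Sketch: which setting decides whether a node is drained, per process.
# ASSUMPTION: simplified model; field names should be checked against your
# cluster resource.

def node_will_be_drained(process, pool, upgrade_strategy, role="worker"):
    """Model the two independent drain paths.

    process: "node_pool_update" (node is replaced) or
             "in_place_update" (node is kept and updated).
    """
    if process == "node_pool_update":
        # Only the machine pool's flag matters; upgradeStrategy is not consulted.
        return pool.get("drainBeforeDelete", False) is True
    if process == "in_place_update":
        key = "controlPlaneDrainOptions" if role == "control-plane" else "workerDrainOptions"
        return upgrade_strategy.get(key, {}).get("enabled", False) is True
    raise ValueError(f"unknown process: {process}")


pool_without_flag = {"name": "workers"}
strategy_with_drain = {"workerDrainOptions": {"enabled": True}}

# Drain is configured in upgradeStrategy, yet a node pool update ignores it.
print(node_will_be_drained("node_pool_update", pool_without_flag, strategy_with_drain))  # False
print(node_will_be_drained("in_place_update", pool_without_flag, strategy_with_drain))   # True
```

The first call returns False even though upgradeStrategy enables draining, which matches the symptom in the Situation section.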

Additional Information
Environment

A Rancher-provisioned RKE2 or K3s node driver cluster

Visit the Rancher-K8S solutions blogger, an enterprise partner:
https://blog.csdn.net/lidw2009
