78. RKE2 cluster provisioning fails because localhost cannot be resolved, causing kube-apiserver health checks to fail

Environment
  • Rancher v2.6+
  • A Rancher-provisioned RKE2 cluster
Situation

Cluster component Pods in the affected downstream RKE2 cluster show a high number of restarts:

    NAMESPACE            NAME                                               READY  STATUS   RESTARTS
    cattle-fleet-system  fleet-agent-cc8c97f97-bvx78                        1/1    Running  185
    cattle-system        cattle-cluster-agent-b1460cbd-8ct5c                1/1    Running  115
    cattle-system        cattle-cluster-agent-b1460cbd-l2l8l                1/1    Running  168
    kube-system          kube-apiserver-cluster-suse-cp-f777105c-2qgvh      0/1    Running  314
    kube-system          kube-controller-manager-cluster-suse-cp-5c-2qgvh   1/1    Running  491
    kube-system          cloud-controller-manager-cluster-suse-cp-5c-2qgvh  1/1    Running  501
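
When a cluster runs many Pods, the ones with high restart counts can be filtered out of the `kubectl get pods -A` output with a short shell function. This is a minimal sketch rather than part of the original procedure. The name `high_restarts` and the default threshold of 100 are hypothetical, and the function assumes the default column layout, in which RESTARTS is the fifth field.

```shell
# Hypothetical helper: print NAMESPACE, NAME and RESTARTS for every Pod whose
# restart count exceeds a threshold. Reads `kubectl get pods -A` output on
# stdin. Assumes RESTARTS is the fifth whitespace-separated column; newer
# kubectl prints e.g. "185 (3m ago)", and the count is still field 5.
high_restarts() {
  threshold="${1:-100}"   # arbitrary example default
  awk -v t="$threshold" '
    NR == 1 && $1 == "NAMESPACE" { next }   # skip the header row, if present
    ($5 + 0) > (t + 0) { print $1, $2, $5 } # "+ 0" forces numeric comparison
  '
}
```

On an RKE2 server node, the bundled kubectl and the admin kubeconfig are usually found at /var/lib/rancher/rke2/bin/kubectl and /etc/rancher/rke2/rke2.yaml, so the function could be fed with `/var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get pods -A | high_restarts 100`.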

The kube-apiserver Pod flaps between ready and not ready:

    NAMESPACE    NAME                                           READY  STATUS   RESTARTS
    kube-system  kube-apiserver-cluster-suse-cp-f777105c-2qgvh  0/1    Running  314

The kubelet logs show failing probes for the kube-apiserver.

Resolution
  1. Enable kubelet debug logging
    1. Navigate to Cluster Management
    2. Click Edit Config for the affected downstream RKE2 cluster
    3. Click the Advanced tab in the Cluster Configuration form
    4. Under Additional Kubelet Args, click Add Global Argument
    5. In the new argument field, enter v=9
    6. Click Save
  2. Replicate the liveness probe and check the kubelet logs
    1. Open an SSH session to a master node in the affected downstream RKE2 cluster
    2. Check the kubelet log (tail -f /var/lib/rancher/rke2/agent/logs/kubelet.log | grep kube-apiserver) for failing kube-apiserver liveness probes
    3. Run the following command to simulate the liveness probe of the kube-apiserver Pod. If the node is affected by this issue, the probe fails:
      /var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock exec \
        $(/var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock ps | grep kube-apiserver | awk '{print $1}') \
        kubectl get --server=https://localhost:6443/ \
        --client-certificate=/var/lib/rancher/rke2/server/tls/client-kube-apiserver.crt \
        --client-key=/var/lib/rancher/rke2/server/tls/client-kube-apiserver.key \
        --certificate-authority=/var/lib/rancher/rke2/server/tls/server-ca.crt \
        --raw=/livez
    4. Run the simulated liveness probe again, replacing localhost with 127.0.0.1. This time it should succeed:
      /var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock exec \
        $(/var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock ps | grep kube-apiserver | awk '{print $1}') \
        kubectl get --server=https://127.0.0.1:6443/ \
        --client-certificate=/var/lib/rancher/rke2/server/tls/client-kube-apiserver.crt \
        --client-key=/var/lib/rancher/rke2/server/tls/client-kube-apiserver.key \
        --certificate-authority=/var/lib/rancher/rke2/server/tls/server-ca.crt \
        --raw=/livez
  3. Fix the host, or the template it was built from, so that a valid /etc/hosts file is present with an entry mapping localhost to 127.0.0.1.
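
On a node that is already running, the missing entry can be restored in place while the template is being corrected. The following is a minimal sketch that assumes a Linux node where root can write the hosts file. The function name `ensure_localhost` is hypothetical, and the file path is a parameter only so the logic can be tried on a copy first. A node rebuilt from a broken template would still come up without the entry, so the template fix remains necessary.

```shell
# Hypothetical helper: make sure a hosts file maps localhost to 127.0.0.1.
# Appends the entry only when no matching line exists, so running it more
# than once is safe. Defaults to /etc/hosts, which requires root to modify.
ensure_localhost() {
  hosts_file="${1:-/etc/hosts}"
  # Look for an uncommented line whose address is 127.0.0.1 and whose
  # hostname or alias fields include the exact word "localhost".
  if awk '
       $1 == "127.0.0.1" {
         for (i = 2; i <= NF; i++) {
           if ($i ~ /^#/) break          # rest of the line is a comment
           if ($i == "localhost") found = 1
         }
       }
       END { exit !found }
     ' "$hosts_file"; then
    echo "localhost already mapped to 127.0.0.1 in $hosts_file"
  else
    # Start on a fresh line if the file does not end with a newline.
    if [ -s "$hosts_file" ] && [ -n "$(tail -c 1 "$hosts_file")" ]; then
      printf '\n' >> "$hosts_file"
    fi
    printf '127.0.0.1 localhost\n' >> "$hosts_file"
    echo "added '127.0.0.1 localhost' to $hosts_file"
  fi
}
```

Run `ensure_localhost` as root on the node, or `ensure_localhost /tmp/hosts.copy` to check its effect on a copy first.
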
Cause

The /etc/hosts file on the node was empty and contained no localhost entry. As a result, the name localhost could not be resolved, and the kube-apiserver liveness probe, which connects to https://localhost:6443/, failed even though the API server was reachable at 127.0.0.1. These repeated probe failures are why the kube-apiserver Pod flaps between ready and not ready.
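
The lookup that fails here can be reproduced against a hosts file directly, which is handy for checking a template's file before it is deployed. This is a minimal sketch: the name `lookup_localhost` is hypothetical, and it covers only the /etc/hosts step of name resolution, not DNS or other sources configured in /etc/nsswitch.conf. On the node itself, `getent hosts localhost` shows what the system resolver actually returns.

```shell
# Hypothetical helper: print the address(es) a hosts file assigns to the
# name "localhost", one per line. Prints nothing when there is no entry,
# as with the empty /etc/hosts described above.
lookup_localhost() {
  awk '
    $1 ~ /^#/ { next }                   # skip comment lines
    {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break             # trailing comment
        if ($i == "localhost") { print $1; break }
      }
    }
  ' "${1:-/etc/hosts}"
}
```
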
