Stop Copy-Pasting by Hand! Automating a Multi-Master, Highly Available Kubernetes Cluster with Ansible (Haproxy + Keepalived Included)
Building a Highly Available Kubernetes Cluster from Scratch: A Hands-On Guide to Ansible Automation
Why Automate Kubernetes Cluster Deployment?
In the cloud-native era, Kubernetes has become the de facto standard for container orchestration. Yet manually deploying a highly available Kubernetes cluster remains a complex and error-prone task. Imagine repeating dozens of steps across multiple servers: tuning system parameters, installing dependencies, deploying components. It is not only time-consuming, it is nearly impossible to keep the environments consistent.
This is exactly where an automation tool like Ansible shines. With Ansible, the entire deployment process is captured as code: one-command deployment, version control, and repeatable runs. More importantly, when it is time to scale the cluster or rebuild an environment, automation saves enormous amounts of time and effort.
1. Environment Preparation and Ansible Basics
1.1 Infrastructure Planning
Before starting, we need to settle on the cluster architecture. A typical highly available Kubernetes cluster contains the following components:
- 3 master nodes: run the control-plane components (API Server, Controller Manager, Scheduler, etc.)
- N worker nodes: run the business workloads
- Load-balancing layer: Haproxy + Keepalived provide a highly available API Server endpoint
- Network plugin: Calico, Flannel, or similar provides pod-to-pod communication
Here is a sample host inventory:
| Hostname | IP Address | Role | Notes |
|---|---|---|---|
| master01 | 192.168.1.1 | Master + LB | Also runs Haproxy |
| master02 | 192.168.1.2 | Master + LB | Also runs Haproxy |
| master03 | 192.168.1.3 | Master + LB | Also runs Haproxy |
| worker01 | 192.168.1.4 | Worker | Runs business pods |
| worker02 | 192.168.1.5 | Worker | Runs business pods |
| vip | 192.168.1.100 | Virtual IP | Managed by Keepalived |
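Cluster-wide values like the VIP are best captured once as Ansible group variables. Below is a sketch of `inventories/production/group_vars/all.yml`; `k8s_vip` is consumed by the Keepalived template later in this guide, while the other variable names are illustrative assumptions:

```yaml
# inventories/production/group_vars/all.yml
k8s_vip: 192.168.1.100            # virtual IP managed by Keepalived
k8s_version: 1.19.0               # assumed variable: pin component versions in one place
pod_network_cidr: 10.244.0.0/16   # assumed variable: pod CIDR for the kubeadm config
```

Keeping these in `group_vars` means the staging inventory can override them without touching any role.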
1.2 Setting Up Ansible
First, install Ansible on the control node (your local workstation or one of the master nodes):
```shell
# On Ubuntu/Debian
sudo apt update && sudo apt install -y ansible

# On CentOS/RHEL
sudo yum install -y epel-release
sudo yum install -y ansible
```

Create the Ansible project directory structure:
```
k8s-cluster/
├── inventories/
│   ├── production/
│   │   ├── group_vars/
│   │   ├── host_vars/
│   │   └── hosts
│   └── staging/
├── roles/
│   ├── common/
│   ├── docker/
│   ├── haproxy/
│   ├── keepalived/
│   ├── kubernetes/
│   └── calico/
└── playbooks/
    ├── site.yml
    ├── master.yml
    └── worker.yml
```

Configure the inventories/production/hosts file:
```ini
[masters]
master01 ansible_host=192.168.1.1
master02 ansible_host=192.168.1.2
master03 ansible_host=192.168.1.3

[workers]
worker01 ansible_host=192.168.1.4
worker02 ansible_host=192.168.1.5

[load_balancers:children]
masters

[kube_cluster:children]
masters
workers
```

2. Automating Base System Configuration
2.1 Common OS Configuration
Create roles/common/tasks/main.yml with the base configuration every node needs:
```yaml
- name: Disable SELinux
  selinux:
    state: disabled

- name: Disable swap
  shell: |
    swapoff -a
    sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

- name: Configure sysctl parameters
  sysctl:
    name: "{{ item.key }}"
    value: "{{ item.value }}"
    state: present
    reload: yes
  with_items:
    - { key: 'net.bridge.bridge-nf-call-iptables', value: '1' }
    - { key: 'net.ipv4.ip_forward', value: '1' }
    - { key: 'vm.swappiness', value: '0' }

- name: Install base packages
  yum:
    name: "{{ packages }}"
    state: present
  vars:
    packages:
      - conntrack
      - ipvsadm
      - ipset
      - iptables
      - curl
      - sysstat
      - libseccomp
```

2.2 Loading Kernel Modules
To support kube-proxy's IPVS mode, the required kernel modules must be loaded. Create roles/common/tasks/ipvs.yml:
```yaml
- name: Ensure ipvs modules are loaded
  modprobe:
    name: "{{ item }}"
    state: present
  with_items:
    - ip_vs
    - ip_vs_rr
    - ip_vs_wrr
    - ip_vs_sh
    - nf_conntrack_ipv4

- name: Persist ipvs modules
  copy:
    content: |
      #!/bin/bash
      modprobe -- ip_vs
      modprobe -- ip_vs_rr
      modprobe -- ip_vs_wrr
      modprobe -- ip_vs_sh
      modprobe -- nf_conntrack_ipv4
    dest: /etc/sysconfig/modules/ipvs.modules
    mode: 0755
```

(On kernels 4.19 and later, nf_conntrack_ipv4 was merged into nf_conntrack; adjust the module list accordingly.)

3. Container Runtime Installation and Configuration
3.1 Installing Docker
Create roles/docker/tasks/main.yml:
```yaml
- name: Add Docker repository
  yum_repository:
    name: docker-ce
    description: Docker CE Repository
    baseurl: https://download.docker.com/linux/centos/$releasever/$basearch/stable
    gpgcheck: yes
    gpgkey: https://download.docker.com/linux/centos/gpg
    enabled: yes

- name: Install Docker
  yum:
    name: docker-ce-18.09.7
    state: present

- name: Configure Docker daemon
  copy:
    content: |
      {
        "exec-opts": ["native.cgroupdriver=systemd"],
        "log-driver": "json-file",
        "log-opts": {
          "max-size": "100m"
        }
      }
    dest: /etc/docker/daemon.json

- name: Start and enable Docker
  service:
    name: docker
    state: started
    enabled: yes
```

Note: Kubernetes began deprecating Docker as a runtime in v1.20 (dockershim was removed entirely in v1.24), so containerd is a valid alternative. The configuration is similar, but the relevant parameters need adjusting.
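As a rough sketch of the containerd alternative mentioned above (the package name, config path, and SystemdCgroup toggle reflect common setups and are assumptions, not something tested against this playbook):

```yaml
- name: Install containerd
  yum:
    name: containerd.io        # assumed package name from the Docker CE repo
    state: present

- name: Generate a default containerd config
  shell: containerd config default > /etc/containerd/config.toml
  args:
    creates: /etc/containerd/config.toml

- name: Use the systemd cgroup driver (matches the kubelet setting)
  lineinfile:
    path: /etc/containerd/config.toml
    regexp: 'SystemdCgroup = false'
    line: '            SystemdCgroup = true'

- name: Start and enable containerd
  service:
    name: containerd
    state: restarted
    enabled: yes
```

The key point is that the cgroup driver must match on both the runtime and the kubelet, just as the Docker daemon.json above sets `native.cgroupdriver=systemd`.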
4. Deploying the Highly Available Load Balancer
4.1 Haproxy Configuration
Create roles/haproxy/tasks/main.yml:
```yaml
- name: Install Haproxy
  yum:
    name: haproxy
    state: present

- name: Configure Haproxy
  template:
    src: haproxy.cfg.j2
    dest: /etc/haproxy/haproxy.cfg

- name: Start Haproxy
  service:
    name: haproxy
    state: restarted
    enabled: yes
```

The corresponding template, roles/haproxy/templates/haproxy.cfg.j2:
```
global
    log /dev/log local0
    log /dev/log local1 notice
    daemon

defaults
    log global
    mode tcp
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend k8s-api
    bind *:8443
    default_backend k8s-api

backend k8s-api
    balance roundrobin
    option tcp-check
{% for host in groups['masters'] %}
    server {{ hostvars[host].ansible_hostname }} {{ hostvars[host].ansible_host }}:6443 check
{% endfor %}
```

Because Haproxy is colocated with the API server on the master nodes, the frontend must not bind the API server's own port 6443; here it listens on 8443 and forwards to each master's 6443, so the controlPlaneEndpoint in your kubeadm config should point at the VIP on port 8443.

4.2 Keepalived Configuration
Create roles/keepalived/tasks/main.yml:
```yaml
- name: Install Keepalived
  yum:
    name: keepalived
    state: present

- name: Configure Keepalived
  template:
    src: keepalived.conf.j2
    dest: /etc/keepalived/keepalived.conf

- name: Start Keepalived
  service:
    name: keepalived
    state: restarted
    enabled: yes
```

The template, roles/keepalived/templates/keepalived.conf.j2:
```
vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    interface {{ ansible_default_ipv4.interface }}
    state {{ 'MASTER' if inventory_hostname == 'master01' else 'BACKUP' }}
    virtual_router_id 51
    priority {{ 100 if inventory_hostname == 'master01' else (90 if inventory_hostname == 'master02' else 80) }}
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 42
    }
    virtual_ipaddress {
        {{ k8s_vip }}
    }
    track_script {
        chk_haproxy
    }
}
```

Define the variable in group_vars/all.yml:
```yaml
k8s_vip: 192.168.1.100
```

5. Kubernetes Control Plane Deployment
5.1 Installing the Kubernetes Components
Create roles/kubernetes/tasks/main.yml:
```yaml
- name: Add Kubernetes repository
  yum_repository:
    name: kubernetes
    description: Kubernetes Repository
    baseurl: https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
    gpgcheck: no
    enabled: yes

- name: Install kubeadm, kubelet and kubectl
  yum:
    name: "{{ packages }}"
    state: present
    disable_gpg_check: yes
  vars:
    packages:
      - kubelet-1.19.0
      - kubeadm-1.19.0
      - kubectl-1.19.0

- name: Enable kubelet
  service:
    name: kubelet
    enabled: yes
```

5.2 Initializing the First Master Node
Create playbooks/master.yml:
```yaml
- hosts: master01
  become: yes
  roles:
    - common
    - docker
    - kubernetes
  tasks:
    - name: Initialize Kubernetes cluster
      command: kubeadm init --config=/tmp/kubeadm-config.yaml
      args:
        creates: /etc/kubernetes/admin.conf
      register: kubeadm_init

    - name: Copy admin config to local
      fetch:
        src: /etc/kubernetes/admin.conf
        dest: /tmp/admin.conf
        flat: yes

    - name: Deploy Calico network
      command: kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
      environment:
        KUBECONFIG: /etc/kubernetes/admin.conf
      when: kubeadm_init is not skipped and kubeadm_init.rc == 0

    # Later plays read this via hostvars['master01'].join_command
    - name: Generate the join command for the other nodes
      command: kubeadm token create --print-join-command
      register: join_cmd

    - name: Publish the join command as a fact
      set_fact:
        join_command: "{{ join_cmd.stdout }}"
```

5.3 Joining the Remaining Master Nodes
After the first master node has been initialized, the join command can be printed with:

```shell
kubeadm token create --print-join-command
```

Then create a play that joins the remaining master nodes to the cluster:
```yaml
- name: Join other masters
  hosts: masters[1:]
  become: yes
  tasks:
    - name: Join master to cluster
      command: "{{ hostvars['master01'].join_command }} --control-plane"
      when: inventory_hostname != 'master01'
```

Note that joining additional control-plane nodes also requires the certificate key produced by kubeadm init --upload-certs, passed to the join command via --certificate-key.

6. Network Plugin and Worker Node Configuration
6.1 Deploying the Calico Network
Create roles/calico/tasks/main.yml:
```yaml
- name: Download Calico manifest
  get_url:
    url: https://docs.projectcalico.org/manifests/calico.yaml
    dest: /tmp/calico.yaml

- name: Apply Calico network
  command: kubectl apply -f /tmp/calico.yaml
  when: inventory_hostname == 'master01'
```

6.2 Joining Worker Nodes to the Cluster
Create playbooks/worker.yml:
```yaml
- hosts: workers
  become: yes
  roles:
    - common
    - docker
    - kubernetes
  tasks:
    - name: Join worker to cluster
      command: "{{ hostvars['master01'].join_command }}"
```

7. Verifying the Cluster Status
Once every node has been deployed, verify the cluster status:
```shell
kubectl get nodes
kubectl get pods -n kube-system
kubectl get svc
```

The core components of a healthy cluster should report the following status:
| Component | Expected Status | Replicas |
|---|---|---|
| kube-apiserver | Running | 3 |
| kube-controller-manager | Running | 3 |
| kube-scheduler | Running | 3 |
| etcd | Running | 3 |
| calico-node | Running | N+3 (one per node) |
| coredns | Running | 2 |
| haproxy | Running | 3 |
| keepalived | Running | 3 |
(In this setup Haproxy and Keepalived run as systemd services on the master nodes rather than as pods, so check them with systemctl status instead of kubectl.)
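The node check above can also be automated. A minimal sketch of an Ansible task that waits for every node to report Ready, run against master01 (the retry count and delay are arbitrary assumptions):

```yaml
- name: Wait until every node reports Ready
  command: kubectl get nodes --no-headers
  environment:
    KUBECONFIG: /etc/kubernetes/admin.conf
  register: nodes
  # keep polling while any line still contains "NotReady"
  until: nodes.stdout_lines | select('search', 'NotReady') | list | length == 0
  retries: 30
  delay: 10
```

Putting this at the end of the deployment play turns a silent partial failure into a loud, early one.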
8. Advanced Configuration and Optimization
8.1 Automatic Certificate Renewal
Kubernetes cluster certificates are valid for one year by default. The kubelet can be configured to rotate its client certificates automatically:
```yaml
- name: Enable kubelet certificate rotation
  lineinfile:
    path: /var/lib/kubelet/config.yaml
    regexp: '^rotateCertificates:'
    line: 'rotateCertificates: true'
    state: present

- name: Restart kubelet
  service:
    name: kubelet
    state: restarted
```

8.2 Cluster Backup and Restore
Back up the cluster state with etcdctl:
```shell
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save snapshot.db
```

8.3 Security Hardening Recommendations
- Enable Pod security policies
- Use network policies to restrict pod-to-pod traffic
- Enforce least-privilege access control with RBAC
- Rotate certificates and keys regularly
- Enable audit logging
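To make the network-policy recommendation concrete, here is a minimal default-deny ingress policy (the namespace name is a placeholder):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-app        # placeholder namespace
spec:
  podSelector: {}          # an empty selector matches every pod in the namespace
  policyTypes:
    - Ingress              # with no ingress rules listed, all inbound traffic is denied
```

Starting from default-deny and then whitelisting the flows each application needs is far easier to audit than restricting an open network after the fact. Note that policies only take effect with a network plugin that enforces them; Calico, used in this guide, does.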
9. Troubleshooting Common Problems
9.1 Nodes Stuck in NotReady
Possible causes and fixes:
- Network plugin not installed correctly: check the Calico and kube-proxy logs
- Container runtime problems: verify the Docker/containerd status
- kubelet misconfiguration: check /var/log/messages and the kubelet logs
9.2 Pods That Cannot Be Scheduled
Where to look:
```shell
kubectl describe pod <pod-name>
kubectl get events --sort-by=.metadata.creationTimestamp
```

9.3 API Server Unavailable
Troubleshooting steps:
- Verify that Haproxy is healthy
- Check whether Keepalived is still holding the VIP
- Inspect the API Server logs on each master node
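The first two steps can be scripted. Below is a sketch of Ansible tasks that probe Haproxy and the load-balanced API endpoint; `api_lb_port` is an assumed variable (set it to whatever port your Haproxy frontend binds), /healthz is the standard API server health endpoint, and certificate validation is disabled purely for illustration:

```yaml
- name: Check that Haproxy is running
  command: systemctl is-active haproxy
  register: haproxy_state
  failed_when: haproxy_state.stdout != 'active'

- name: Probe the API server through the VIP
  uri:
    url: "https://{{ k8s_vip }}:{{ api_lb_port | default(6443) }}/healthz"
    validate_certs: no
    return_content: yes
  register: healthz
  failed_when: "'ok' not in healthz.content"
```

If the probe fails on one master but succeeds on another, the VIP has likely failed over; compare `ip addr` output across the masters to see which node Keepalived has given the address to.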
10. Scaling and Upgrade Strategy
10.1 Scaling the Cluster
Adding a new worker node:
```yaml
- name: Add new worker
  hosts: new_worker
  become: yes
  roles:
    - common
    - docker
    - kubernetes
  tasks:
    - name: Join new worker
      command: "{{ hostvars['master01'].join_command }}"
```

10.2 Upgrading the Cluster
Kubernetes version upgrade steps:
- Upgrade kubeadm
- Drain the node
- Upgrade the control plane
- Upgrade kubelet and kubectl
- Upgrade the worker nodes
The corresponding Ansible tasks:
```yaml
- name: Upgrade kubeadm
  yum:
    name: kubeadm-{{ target_version }}
    state: present

- name: Drain node
  command: kubectl drain {{ inventory_hostname }} --ignore-daemonsets

- name: Upgrade control plane
  command: kubeadm upgrade apply v{{ target_version }} -y

- name: Upgrade kubelet and kubectl
  yum:
    name: "{{ item }}"
    state: present
  with_items:
    - kubelet-{{ target_version }}
    - kubectl-{{ target_version }}

- name: Restart kubelet to pick up the new binary
  service:
    name: kubelet
    state: restarted

- name: Uncordon node
  command: kubectl uncordon {{ inventory_hostname }}
```
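Finally, the individual playbooks can be tied together through the playbooks/site.yml file from the directory layout in section 1.2, so the whole cluster comes up with a single command. A minimal sketch (this assumes the haproxy and keepalived roles are attached to the master play):

```yaml
# playbooks/site.yml — one entry point for the full deployment
- import_playbook: master.yml   # first master init, then control-plane joins
- import_playbook: worker.yml   # worker joins
```

Run it against the production inventory with `ansible-playbook -i inventories/production/hosts playbooks/site.yml`, and rerun it any time the cluster needs to be reconciled: because the tasks are written to be idempotent, a second run should change nothing on a healthy cluster.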