From Scratch: Manually Deploying a Highly Available Kubernetes (k8s) v1.34.0 Cluster
1. Environment Preparation and System Configuration
1.1 Host Planning and Network Topology
Before deploying a highly available Kubernetes cluster, plan the host roles and the network architecture. A typical HA Kubernetes cluster contains the following node types:
- Master nodes: run the control plane components (API Server, Controller Manager, Scheduler, etc.); at least 3 are recommended for high availability
- Worker nodes: the nodes that actually run workloads
- Load balancer nodes: optional, used to expose the API Server externally
Our deployment uses 5 hosts:
- 3 master nodes (k8s-master01/02/03)
- 2 worker nodes (k8s-node01/02)
- 1 virtual IP (172.16.1.36), kept highly available via keepalived
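With the roles fixed, every machine should be able to resolve the others by name. A sketch of matching /etc/hosts entries follows; the worker addresses 172.16.1.34/35 are inferred from the apiserver SAN list used later in 3.2.3, and the `k8s-vip` hostname is an assumption for illustration. Append the generated lines to /etc/hosts on each node.

```shell
#!/bin/sh
# Generate /etc/hosts entries for the planned nodes. Written to a temp
# file here; on real machines, append the output to /etc/hosts.
hosts_snippet=$(mktemp)
cat > "$hosts_snippet" << 'EOF'
172.16.1.31 k8s-master01
172.16.1.32 k8s-master02
172.16.1.33 k8s-master03
172.16.1.34 k8s-node01
172.16.1.35 k8s-node02
172.16.1.36 k8s-vip
EOF
cat "$hosts_snippet"
```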
The network plan uses a dual-stack (IPv4 + IPv6) configuration:
- Physical network: 172.16.1.0/24
- Service CIDR: 10.96.0.0/12
- Pod CIDR: 172.16.0.0/12
- IPv6 physical network: fc00::/8
- IPv6 Service CIDR: fd00:1111::/112
- IPv6 Pod CIDR: fc00:2222::/112
Note that the Pod CIDR 172.16.0.0/12 (which spans 172.16.0.0-172.31.255.255) contains the physical network 172.16.1.0/24; overlapping node and pod ranges can cause routing conflicts, so for a real deployment prefer non-overlapping ranges.
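As a quick arithmetic sanity check on this sizing: the address capacity of each IPv4 block follows directly from its prefix length, and can be computed in plain shell.

```shell
#!/bin/sh
# Capacity check for the IPv4 ranges planned above:
# a /N prefix leaves (32 - N) host bits, i.e. 2^(32-N) addresses.
cidr_size() {
  echo $(( 1 << (32 - $1) ))
}

echo "node network  /24: $(cidr_size 24) addresses"   # 256
echo "service CIDR  /12: $(cidr_size 12) addresses"   # 1048576
echo "pod CIDR      /12: $(cidr_size 12) addresses"   # 1048576
```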
1.2 System Initialization
Run the following base configuration on every node:
```bash
# Disable the firewall
systemctl disable --now firewalld

# Disable SELinux
setenforce 0
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config

# Disable swap
sed -ri 's/.*swap.*/#&/' /etc/fstab
swapoff -a && sysctl -w vm.swappiness=0

# Set up time synchronization
yum install -y chrony
cat > /etc/chrony.conf << EOF
pool ntp.aliyun.com iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow 172.16.1.0/24
local stratum 10
keyfile /etc/chrony.keys
leapsectz right/UTC
logdir /var/log/chrony
EOF
systemctl restart chronyd && systemctl enable chronyd
```
1.3 Kernel Parameter Tuning
Kubernetes has specific requirements for Linux kernel parameters; adjust the following:
```bash
cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
fs.may_detach_mounts = 1
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_watches=89100
fs.file-max=52706963
fs.nr_open=52706963
net.netfilter.nf_conntrack_max=2310720
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 327680
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_conntrack_max = 65536
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_timestamps = 0
net.core.somaxconn = 16384
net.ipv6.conf.all.disable_ipv6 = 0
net.ipv6.conf.default.disable_ipv6 = 0
net.ipv6.conf.lo.disable_ipv6 = 0
net.ipv6.conf.all.forwarding = 1
EOF
sysctl --system
```
2. Container Runtime Installation
Kubernetes supports several container runtimes; here we install and configure containerd.
2.1 Installing containerd
```bash
# Download the containerd binary release
wget https://github.com/containerd/containerd/releases/download/v2.0.5/containerd-2.0.5-linux-amd64.tar.gz
tar xf containerd-*-linux-amd64.tar.gz -C /usr/local/

# Create the systemd service
cat > /etc/systemd/system/containerd.service << EOF
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=infinity
TasksMax=infinity
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target
EOF

# Load the required kernel modules
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
systemctl restart systemd-modules-load.service

# Generate the default configuration
mkdir -p /etc/containerd
containerd config default | tee /etc/containerd/config.toml

# Switch the sandbox image to a domestic mirror
sed -i "s#registry.k8s.io#registry.aliyuncs.com/chenby#g" /etc/containerd/config.toml

# Start the service
systemctl daemon-reload
systemctl enable --now containerd
```
2.2 Configuring the crictl Client
```bash
# Download crictl
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.34.0/crictl-v1.34.0-linux-amd64.tar.gz
tar xf crictl-v*-linux-amd64.tar.gz -C /usr/bin/

# Create the configuration file
cat > /etc/crictl.yaml << EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
```
3. Installing Kubernetes Components
3.1 Downloading the Kubernetes Binaries
```bash
# Download etcd and the Kubernetes server components
wget https://github.com/etcd-io/etcd/releases/download/v3.5.21/etcd-v3.5.21-linux-amd64.tar.gz
wget https://cdn.dl.k8s.io/release/v1.34.0/kubernetes-server-linux-amd64.tar.gz

# Extract and install
tar -xf kubernetes-server-linux-amd64.tar.gz --strip-components=3 -C /usr/local/bin kubernetes/server/bin/kube{let,ctl,-apiserver,-controller-manager,-scheduler,-proxy}
tar -xf etcd*.tar.gz && mv etcd-*/etcd /usr/local/bin/ && mv etcd-*/etcdctl /usr/local/bin/
```
3.2 Certificate Generation
A Kubernetes cluster needs a large number of certificates for authentication between components; we generate them with the cfssl toolchain.
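The apiserver certificate generated in 3.2.3 needs a SAN list covering the first Service IP, the VIP, localhost, the in-cluster DNS names, and every node address. Rather than hand-maintaining that long comma-separated `-hostname` value, it can be composed from readable pieces. A small sketch (the values match the host plan in section 1.1):

```shell
#!/bin/sh
# Build the -hostname SAN list for the apiserver certificate from
# readable pieces instead of one hand-maintained comma string.
svc_ip=10.96.0.1      # first IP of the Service CIDR 10.96.0.0/12
vip=172.16.1.36
dns_names="kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster kubernetes.default.svc.cluster.local"
node_ips="172.16.1.31 172.16.1.32 172.16.1.33 172.16.1.34 172.16.1.35"

hostnames="$svc_ip,$vip,127.0.0.1"
for n in $dns_names $node_ips; do
  hostnames="$hostnames,$n"
done

echo "$hostnames"
# Later used as: cfssl gencert ... -hostname="$hostnames" ...
```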
3.2.1 Installing cfssl
```bash
wget "https://github.com/cloudflare/cfssl/releases/download/v1.6.5/cfssl_1.6.5_linux_amd64" -O /usr/local/bin/cfssl
wget "https://github.com/cloudflare/cfssl/releases/download/v1.6.5/cfssljson_1.6.5_linux_amd64" -O /usr/local/bin/cfssljson
chmod +x /usr/local/bin/cfssl /usr/local/bin/cfssljson
```
3.2.2 Generating the CA Certificate
```bash
cat > ca-config.json << EOF
{
  "signing": {
    "default": {
      "expiry": "876000h"
    },
    "profiles": {
      "kubernetes": {
        "usages": ["signing", "key encipherment", "server auth", "client auth"],
        "expiry": "876000h"
      }
    }
  }
}
EOF

cat > ca-csr.json << EOF
{
  "CN": "kubernetes",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "Beijing",
      "L": "Beijing",
      "O": "Kubernetes",
      "OU": "Kubernetes-manual"
    }
  ],
  "ca": {
    "expiry": "876000h"
  }
}
EOF

# The target directory must exist before cfssljson writes into it
mkdir -p /etc/kubernetes/pki
cfssl gencert -initca ca-csr.json | cfssljson -bare /etc/kubernetes/pki/ca
```
3.2.3 Generating the API Server Certificate
```bash
cat > apiserver-csr.json << EOF
{
  "CN": "kube-apiserver",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "Beijing",
      "L": "Beijing",
      "O": "Kubernetes",
      "OU": "Kubernetes-manual"
    }
  ]
}
EOF

cfssl gencert \
  -ca=/etc/kubernetes/pki/ca.pem \
  -ca-key=/etc/kubernetes/pki/ca-key.pem \
  -config=ca-config.json \
  -hostname=10.96.0.1,172.16.1.36,127.0.0.1,kubernetes,kubernetes.default,kubernetes.default.svc,kubernetes.default.svc.cluster,kubernetes.default.svc.cluster.local,172.16.1.31,172.16.1.32,172.16.1.33,172.16.1.34,172.16.1.35 \
  -profile=kubernetes \
  apiserver-csr.json | cfssljson -bare /etc/kubernetes/pki/apiserver
```
4. High-Availability Deployment
4.1 High Availability with keepalived + haproxy
4.1.1 Installing haproxy
```bash
yum install -y haproxy
cat > /etc/haproxy/haproxy.cfg << EOF
global
  maxconn 2000
  ulimit-n 16384
  log 127.0.0.1 local0 err
  stats timeout 30s

defaults
  log global
  mode http
  option httplog
  timeout connect 5000
  timeout client 50000
  timeout server 50000
  timeout http-request 15s
  timeout http-keep-alive 15s

frontend k8s-master
  bind 0.0.0.0:9443
  bind 127.0.0.1:9443
  mode tcp
  option tcplog
  tcp-request inspect-delay 5s
  default_backend k8s-master

backend k8s-master
  mode tcp
  option tcplog
  option tcp-check
  balance roundrobin
  server k8s-master01 172.16.1.31:6443 check
  server k8s-master02 172.16.1.32:6443 check
  server k8s-master03 172.16.1.33:6443 check
EOF
systemctl enable --now haproxy
```
4.1.2 Installing keepalived
```bash
yum install -y keepalived

# Configuration for the MASTER node (on the other two masters, use
# state BACKUP and a lower priority so the VIP has a single owner)
cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script chk_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 5
    weight -5
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state MASTER
    interface ens160
    virtual_router_id 51
    priority 100
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        172.16.1.36
    }
    track_script {
        chk_apiserver
    }
}
EOF

# Health check script
cat > /etc/keepalived/check_apiserver.sh << EOF
#!/bin/bash
err=0
for k in \$(seq 1 3)
do
    check_code=\$(pgrep haproxy)
    if [[ \$check_code == "" ]]; then
        err=\$(expr \$err + 1)
        sleep 1
        continue
    else
        err=0
        break
    fi
done
if [[ \$err != "0" ]]; then
    echo "systemctl stop keepalived"
    /usr/bin/systemctl stop keepalived
    exit 1
else
    exit 0
fi
EOF
chmod +x /etc/keepalived/check_apiserver.sh
systemctl enable --now keepalived
```
5. Deploying the Control Plane Components
5.1 Deploying the etcd Cluster
```bash
mkdir -p /etc/etcd

# Create the etcd configuration file (only the beginning is shown here;
# the rest of the etcd configuration and its systemd unit are omitted)
cat > /etc/etcd/etcd.config.yml << EOF
name: 'k8s-master01'
EOF
```
5.2 Deploying kube-apiserver
```bash
cat > /usr/lib/systemd/system/kube-apiserver.service << EOF
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes/kubernetes
After=network.target

[Service]
ExecStart=/usr/local/bin/kube-apiserver \\
      --v=2 \\
      --allow-privileged=true \\
      --bind-address=0.0.0.0 \\
      --secure-port=6443 \\
      --advertise-address=172.16.1.31 \\
      --service-cluster-ip-range=10.96.0.0/12,fd00:1111::/112 \\
      --service-node-port-range=30000-32767 \\
      --etcd-servers=https://172.16.1.31:2379,https://172.16.1.32:2379,https://172.16.1.33:2379 \\
      --etcd-cafile=/etc/kubernetes/pki/etcd/etcd-ca.pem \\
      --etcd-certfile=/etc/kubernetes/pki/etcd/etcd.pem \\
      --etcd-keyfile=/etc/kubernetes/pki/etcd/etcd-key.pem \\
      --client-ca-file=/etc/kubernetes/pki/ca.pem \\
      --tls-cert-file=/etc/kubernetes/pki/apiserver.pem \\
      --tls-private-key-file=/etc/kubernetes/pki/apiserver-key.pem \\
      --kubelet-client-certificate=/etc/kubernetes/pki/apiserver.pem \\
      --kubelet-client-key=/etc/kubernetes/pki/apiserver-key.pem \\
      --service-account-key-file=/etc/kubernetes/pki/sa.pub \\
      --service-account-signing-key-file=/etc/kubernetes/pki/sa.key \\
      --service-account-issuer=https://kubernetes.default.svc.cluster.local \\
      --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname \\
      --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota \\
      --authorization-mode=Node,RBAC \\
      --enable-bootstrap-token-auth=true \\
      --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.pem \\
      --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.pem \\
      --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client-key.pem \\
      --requestheader-allowed-names=aggregator \\
      --requestheader-group-headers=X-Remote-Group \\
      --requestheader-extra-headers-prefix=X-Remote-Extra- \\
      --requestheader-username-headers=X-Remote-User \\
      --enable-aggregator-routing=true

Restart=on-failure
RestartSec=10s
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now kube-apiserver
```
6. Cluster Networking and Add-ons
6.1 Installing the Calico Network Plugin
```bash
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.3/manifests/tigera-operator.yaml
curl https://raw.githubusercontent.com/projectcalico/calico/v3.30.3/manifests/custom-resources.yaml -O

# Overwrite custom-resources.yaml with our IP pool configuration
cat > custom-resources.yaml << EOF
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - name: default-ipv4-ippool
      blockSize: 26
      cidr: 172.16.0.0/12
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
EOF
kubectl create -f custom-resources.yaml
```
6.2 Installing CoreDNS
```bash
helm repo add coredns https://coredns.github.io/helm
helm install coredns coredns/coredns -n kube-system --set service.clusterIP=10.96.0.10
```
7. Joining Nodes and Verification
7.1 Joining Worker Nodes to the Cluster
Run the following on each worker node:
```bash
# Install the kubelet service
cat > /usr/lib/systemd/system/kubelet.service << EOF
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=containerd.service

[Service]
ExecStart=/usr/local/bin/kubelet \\
    --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig \\
    --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \\
    --config=/etc/kubernetes/kubelet-conf.yml \\
    --container-runtime-endpoint=unix:///run/containerd/containerd.sock \\
    --node-labels=node.kubernetes.io/node=

Restart=always
RestartSec=10s

[Install]
WantedBy=multi-user.target
EOF

# Start kubelet
systemctl daemon-reload
systemctl enable --now kubelet
```
7.2 Cluster Verification
```bash
# Check node status
kubectl get nodes

# Deploy a test Pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: busybox:1.28
    command: ["sleep", "3600"]
EOF

# Check that the Pod is running
kubectl get pod -o wide
```
8. Cluster Add-on Components
8.1 Installing Metrics Server
```bash
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml -O metrics-server.yaml

# Edit the metrics-server container args in the manifest to add:
#   args:
#     - --kubelet-insecure-tls
#     - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.pem

kubectl apply -f metrics-server.yaml

# Verify
kubectl top node
```
8.2 Installing the Dashboard
```bash
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
helm install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard \
  --namespace kube-system \
  --set service.type=NodePort
```
9. Production Environment Recommendations
- Log collection: deploy an EFK (Elasticsearch + Fluentd + Kibana) stack or a Loki + Promtail + Grafana logging stack
- Monitoring and alerting: deploy Prometheus + Alertmanager + Grafana
- Backup and restore: back up etcd data regularly, e.g. with the etcdctl snapshot save command
- Security hardening:
  - Enable Pod Security Admission (PodSecurityPolicy was removed in Kubernetes v1.25)
  - Restrict pod-to-pod traffic with NetworkPolicy
  - Rotate certificates regularly
- Autoscaling: configure the Cluster Autoscaler and HPA for automatic scaling
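The etcd backup recommendation above is easy to automate. Below is a sketch of a snapshot-rotation wrapper that keeps only the newest few dumps; the real `etcdctl snapshot save` call is left as a comment (its endpoint and certificate flags would come from section 5) so the rotation logic can be exercised here with stand-in files:

```shell
#!/bin/sh
# Keep only the newest $keep etcd snapshots in $backup_dir.
backup_dir=$(mktemp -d)   # e.g. /var/backups/etcd in production
keep=3

take_snapshot() {
  out="$backup_dir/etcd-$(date +%Y%m%d%H%M%S)-$$-$1.db"
  # In production this line does the real work:
  # ETCDCTL_API=3 etcdctl --endpoints=https://172.16.1.31:2379 snapshot save "$out"
  : > "$out"    # stand-in for the snapshot file
}

rotate() {
  # List newest first, drop everything past the first $keep entries.
  ls -1t "$backup_dir"/etcd-*.db 2>/dev/null | tail -n +$((keep + 1)) | xargs -r rm -f
}

for i in 1 2 3 4 5; do take_snapshot "$i"; done
rotate
ls "$backup_dir" | wc -l
```

Run from cron (daily, say) and pair it with an occasional `etcdctl snapshot restore` drill so the backups are known to be usable.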
10. Troubleshooting Common Issues
Node NotReady:
- Check the kubelet logs: journalctl -u kubelet -f
- Verify that the network plugin is running
- Check that the node has sufficient resources
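Several of these checks amount to "poll until the condition becomes true". A small generic retry helper makes that reusable; the demonstration condition below is a stand-in, and the commented kubectl line shows the intended use (the node name is an assumption from section 1.1):

```shell
#!/bin/sh
# Poll a command until it succeeds or attempts run out.
retry() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then return 0; fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Demonstration: a stand-in condition that succeeds on its 3rd call.
state=$(mktemp)
flaky() { echo x >> "$state"; [ "$(wc -l < "$state")" -ge 3 ]; }

retry 5 0 flaky && echo "condition met"

# Intended use while debugging NotReady:
# retry 30 10 sh -c 'kubectl get node k8s-node01 --no-headers | grep -qw Ready'
```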
Pod creation failures:
- Use kubectl describe pod <pod-name> to inspect events
- Check resource quota limits
- Verify that the image pull succeeded
Network problems:
- Use kubectl exec to enter a Pod and test connectivity
- Check that CoreDNS is running
- Verify the network plugin configuration
Certificate expiry:
- Check certificate validity regularly: openssl x509 -in /etc/kubernetes/pki/apiserver.pem -noout -text | grep Not
- Renew with kubeadm certs renew (only if the cluster was deployed with kubeadm; the older kubeadm alpha certs subcommand has been removed)
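The openssl check above can be turned into a days-remaining calculation suitable for cron. A sketch, demonstrated on a throwaway self-signed certificate so it runs anywhere; on a real master, point `cert` at /etc/kubernetes/pki/apiserver.pem instead (GNU date is assumed for the date arithmetic):

```shell
#!/bin/sh
# Compute days until a certificate expires and warn when it is close.
cert=$(mktemp)   # stand-in; use /etc/kubernetes/pki/apiserver.pem in production
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=expiry-demo" -keyout /dev/null -out "$cert" 2>/dev/null

# Parse the notAfter date and convert to days from now.
end=$(openssl x509 -in "$cert" -noout -enddate | cut -d= -f2)
days_left=$(( ($(date -d "$end" +%s) - $(date +%s)) / 86400 ))

echo "certificate expires in ${days_left} days"
if [ "$days_left" -lt 30 ]; then
  echo "WARNING: renew soon"
fi
```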
