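This document records the complete process of joining nodes to an EKS cluster across regions in the AWS China regions, covering:
- Joining a node from a different VPC in the same region (Beijing)
- Joining a node in the Ningxia region to an EKS cluster in Beijing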
Nodes (4):
ip-10-200-1-254.cn-northwest-1.compute.internal Ready: True ← Ningxia cross-region node ✅
ip-172-31-18-54.cn-north-1.compute.internal Ready: True ← Beijing cross-VPC node ✅
ip-192-168-3-84.cn-north-1.compute.internal Ready: True ← Beijing managed node group
fargate-ip-192-168-46-192... Ready: True ← Fargate node
Same AZ, cross-VPC
Create the Access Entry:
aws eks create-access-entry \
  --cluster-name clustername \
  --principal-arn arn:aws-cn:iam::xxxxxxxxxxx:role/myEKSNodeRole \
  --type EC2_LINUX \
  --kubernetes-groups system:nodes \
  --username system:node:{{EC2PrivateDNSName}} \
  --region cn-north-1
AL2023 AMI nodeadm configuration
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: clustername
    apiServerEndpoint: https://F4B05xxxxxxxF79A8CA3EE4.yl4.cn-north-1.eks.amazonaws.com.cn
    certificateAuthority: LS0tLS1CRUdJTi...
    cidr: 10.100.0.0/16
The Beijing node in a different VPC of the same region joined the cluster successfully, with status Ready: True.
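The NodeConfig fields used above are all values the cluster already exposes, so the file can be generated rather than typed by hand. The following is a minimal sketch, assuming a hypothetical helper named render_node_config (it is not part of nodeadm or the AWS CLI) that takes the four values as arguments:

```shell
# render_node_config NAME ENDPOINT CA_BASE64 SERVICE_CIDR
# Hypothetical helper: prints a nodeadm NodeConfig document to stdout.
# It only fills in the four cluster fields used in this document.
render_node_config() {
  cat <<EOF
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: $1
    apiServerEndpoint: $2
    certificateAuthority: $3
    cidr: $4
EOF
}
```

The endpoint, CA data, and service CIDR can be read with `aws eks describe-cluster --name clustername --region cn-north-1` (fields `cluster.endpoint`, `cluster.certificateAuthority.data`, and `cluster.kubernetesNetworkConfig.serviceIpv4Cidr`) and passed to the helper.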
Cross-region
When the Ningxia node tried to join the Beijing EKS cluster, the kubelet log showed:
E0522 15:30:00.123456 12345 round_trippers.go:553]
Request: POST https://F4B054AExxxxxx79A8CA3EE4.yl4.cn-north-1.eks.amazonaws.com.cn/api/v1/nodes
Response: 401 Unauthorized
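401 Unauthorized means authentication failed. This was puzzling, because:
- The same IAM role, myEKSNodeRole, works fine on the Beijing node
- VPC Peering is in place, and the EKS API endpoint can be reached by ping
- The STS presigned URL resolves to the correct IAM identity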
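Initial hypotheses:
- Network unreachable → VPC Peering reachability was already verified
- Token generation problem → the same role generates a valid token in Beijing
- IAM permission problem → the same role works on the Beijing node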
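Generate a token on each of the two nodes and test it:
# Generate a token on the Beijing node
TOKEN_BEIJING=$(curl -s "http://169.254.169.254/latest/meta-data/identity-credentials/ec2/security-credentials/ec2-instance" \
  -H "X-aws-ec2-metadata-token: $TOKEN" | jq -r .Token)

# Generate a token on the Ningxia node (same command as above, run on that node)
TOKEN_NINGXIA=$(curl -s "http://169.254.169.254/latest/meta-data/identity-credentials/ec2/security-credentials/ec2-instance" \
  -H "X-aws-ec2-metadata-token: $TOKEN" | jq -r .Token)

# Test STS validation
curl -s "https://sts.cn-north-1.amazonaws.com.cn/?Action=GetCallerIdentity&Version=2011-06-15&X-Amz-Security-Token=$TOKEN"
# Both return the correct IAM identity

# Test the EKS API
curl -s -H "Authorization: Bearer $TOKEN_NINGXIA" \
  "https://EKS-ENDPOINT/api/v1/nodes"
# Beijing token: 403 Forbidden (authentication succeeded; authorization depends on the role)
# Ningxia token: 401 Unauthorized (authentication failed)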
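STS accepts the Ningxia token and resolves the correct IAM identity, so authentication at the IAM level succeeds. The EKS API nevertheless returns 401, which points to the EKS authentication webhook.
The Beijing token gets 403 Forbidden (authenticated, but lacking permission), while the Ningxia token gets 401 Unauthorized (authentication failed). EKS therefore handles the two tokens differently.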
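The diagnosis rests on the difference between the two status codes, so it can help to make that mapping explicit when probing from several nodes. A minimal sketch follows; classify_http_status and probe_api are hypothetical helper names, and probe_api is defined but not called because it needs network access to the cluster endpoint:

```shell
# classify_http_status CODE
# Maps an API server HTTP status code to the stage that produced it.
classify_http_status() {
  case $1 in
    2??) echo "authenticated and authorized" ;;
    401) echo "authentication failed" ;;
    403) echo "authenticated, authorization denied" ;;
    *)   echo "other" ;;
  esac
}

# probe_api ENDPOINT TOKEN
# Sends one request with the bearer token and classifies the status code.
# -k skips TLS verification for brevity; pass --cacert in real use.
probe_api() {
  code=$(curl -sk -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $2" "$1/api/v1/nodes")
  classify_http_status "$code"
}
```

Running `probe_api "https://EKS-ENDPOINT" "$TOKEN_NINGXIA"` on each node then gives a one-line verdict per token instead of a raw response body.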
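Based on this difference, we hypothesize that authentication for an EC2_LINUX access entry works as follows:
- First, the IAM signature is verified (this step passed, since STS validation succeeded)
- Then the username template {{EC2PrivateDNSName}} is resolved
- To resolve the template, EKS needs to call the DescribeInstances API in the cluster's region to look up the instance
- The Ningxia instance ID i-07aexxxxxxxe2a does not exist in EC2 in the Beijing region, so authentication fails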
This explains why:
- STS validation succeeds (STS only checks the signature)
- The EKS webhook rejects the request (the webhook also checks that the instance exists)
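If the hypothesis above holds, a node is affected exactly when its region differs from the cluster's region. The sketch below illustrates that check using only the node's private DNS name; the function names are hypothetical, and the parsing assumes the ip-….REGION.compute.internal naming seen in this document:

```shell
# region_of_private_dns NAME
# Extracts the region label from names such as
# ip-10-200-1-254.cn-northwest-1.compute.internal.
# Names without a region label (for example *.ec2.internal) return status 1.
region_of_private_dns() {
  rest=${1#*.}   # drop the leading host label
  case $rest in
    ?*.compute.internal) printf '%s\n' "${rest%.compute.internal}" ;;
    *) return 1 ;;
  esac
}

# same_region_as_cluster NODE_NAME CLUSTER_REGION
# Succeeds only when the node name carries the cluster's region.
same_region_as_cluster() {
  [ "$(region_of_private_dns "$1")" = "$2" ]
}
```

For example, `same_region_as_cluster ip-10-200-1-254.cn-northwest-1.compute.internal cn-north-1` fails, which under the hypothesis marks the node as one an EC2_LINUX entry would reject. The hypothesis itself can be cross-checked with `aws ec2 describe-instances --instance-ids <instance-id> --region cn-north-1`, which is expected to report that a Ningxia instance ID does not exist in Beijing.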
Delete the existing Access Entry
aws eks delete-access-entry \
  --cluster-name clustername \
  --principal-arn arn:aws-cn:iam::xxxxxxxxxxx:role/myEKSNodeRole \
  --region cn-north-1
Create a STANDARD type Access Entry
aws eks create-access-entry \
  --cluster-name clustername \
  --principal-arn arn:aws-cn:iam::xxxxxxxxxxx:role/myEKSNodeRole \
  --type STANDARD \
  --username "eks-cross-region-node" \
  --region cn-north-1
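A STANDARD access entry only verifies the IAM signature and performs no EC2 instance validation, which is exactly what the cross-region scenario needs.
STANDARD entries have one limitation, however: kubernetes-groups with the system: prefix (such as system:nodes) cannot be used. The built-in node permissions are therefore not directly available, and RBAC has to be created manually.
Create custom RBAC modeled on the permissions of the system:node ClusterRole, trimmed down where appropriate.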
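Because the system: prefix restriction only surfaces when the API call is made, a small pre-flight check on the chosen username and group names can save a round trip. This is a minimal sketch with a hypothetical helper name; it checks only the system: prefix discussed here:

```shell
# uses_reserved_prefix VALUE
# Returns 0 when VALUE starts with "system:", the prefix this document
# found unusable for STANDARD access entries; returns 1 otherwise.
uses_reserved_prefix() {
  case $1 in
    system:*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With this helper, `uses_reserved_prefix system:nodes` succeeds (the name would be rejected), while `uses_reserved_prefix eks-cross-region-node` fails (the name is acceptable).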
# Get an administrator token
TOKEN=$(aws eks get-token --cluster-name clustername --region cn-north-1 --query status.token --output text)

# Create the ClusterRole
curl -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  "https://EKS-ENDPOINT/apis/rbac.authorization.k8s.io/v1/clusterroles" -d '{
  "apiVersion": "rbac.authorization.k8s.io/v1",
  "kind": "ClusterRole",
  "metadata": {"name": "cross-region-node-role"},
  "rules": [
    {"apiGroups": [""], "resources": ["nodes"], "verbs": ["get","list","watch","create","update","patch","delete"]},
    {"apiGroups": [""], "resources": ["nodes/status"], "verbs": ["patch","update"]},
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get","list","watch","delete"]},
    {"apiGroups": [""], "resources": ["pods/status"], "verbs": ["patch","update"]},
    {"apiGroups": [""], "resources": ["events"], "verbs": ["create","patch","update"]},
    {"apiGroups": [""], "resources": ["configmaps","secrets","endpoints","services"], "verbs": ["get","list","watch"]},
    {"apiGroups": [""], "resources": ["serviceaccounts/token"], "verbs": ["create"]},
    {"apiGroups": ["coordination.k8s.io"], "resources": ["leases"], "verbs": ["get","list","watch","create","update","patch","delete"]},
    {"apiGroups": ["storage.k8s.io"], "resources": ["csinodes","csidrivers","storageclasses","volumeattachments"], "verbs": ["get","list","watch","create","update","patch"]},
    {"apiGroups": ["certificates.k8s.io"], "resources": ["certificatesigningrequests"], "verbs": ["get","list","watch","create"]},
    {"apiGroups": ["authentication.k8s.io"], "resources": ["tokenreviews"], "verbs": ["create"]},
    {"apiGroups": ["authorization.k8s.io"], "resources": ["subjectaccessreviews"], "verbs": ["create"]}
  ]
}'

# Create the ClusterRoleBinding
curl -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  "https://EKS-ENDPOINT/apis/rbac.authorization.k8s.io/v1/clusterrolebindings" -d '{
  "apiVersion": "rbac.authorization.k8s.io/v1",
  "kind": "ClusterRoleBinding",
  "metadata": {"name": "cross-region-node-binding"},
  "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "ClusterRole", "name": "cross-region-node-role"},
  "subjects": [{"kind": "User", "name": "eks-cross-region-node", "apiGroup": "rbac.authorization.k8s.io"}]
}'
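Before restarting kubelet, it can be worth confirming that the binding grants what the node needs. The sketch below only prints kubectl impersonation checks for a handful of the rules above rather than running them; print_rbac_checks is a hypothetical helper, and the selection of checks is an illustrative subset, not a complete audit:

```shell
# print_rbac_checks USER
# Prints (does not run) `kubectl auth can-i` commands that impersonate USER
# for a few of the permissions granted by cross-region-node-role.
print_rbac_checks() {
  user=$1
  # Each input line is: VERB RESOURCE [EXTRA_FLAG]
  while read -r verb resource extra; do
    echo "kubectl auth can-i $verb $resource${extra:+ $extra} --as $user"
  done <<EOF
create nodes
patch nodes --subresource=status
get pods
create events
update leases.coordination.k8s.io
create certificatesigningrequests.certificates.k8s.io
EOF
}
```

Piping the output to a shell, for example `print_rbac_checks eks-cross-region-node | sh`, runs the checks; each should answer yes. This assumes the caller is allowed to impersonate users, which a cluster administrator normally is.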
Restart kubelet on the node
systemctl restart kubelet
Verify the result
$ kubectl get nodes
NAME                                              STATUS   ROLES    AGE   VERSION
ip-10-200-1-254.cn-northwest-1.compute.internal   Ready    <none>   1m    v1.28.x
ip-172-31-18-54.cn-north-1.compute.internal       Ready    <none>   1h    v1.28.x
ip-192-168-3-84.cn-north-1.compute.internal       Ready    <none>   2d    v1.28.x
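For a repeatable check instead of reading the table by eye, the node list can be filtered for anything that is not Ready. A minimal sketch, assuming a hypothetical helper named not_ready_nodes that reads the output of `kubectl get nodes --no-headers` on stdin:

```shell
# not_ready_nodes
# Reads `kubectl get nodes --no-headers` output on stdin and prints the
# names of nodes whose STATUS column is not exactly "Ready".
# Note: cordoned nodes ("Ready,SchedulingDisabled") are reported as well.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}
```

Usage: `kubectl get nodes --no-headers | not_ready_nodes`. Empty output means every node, including the cross-region one, reports Ready.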
