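Mode overview
Project documentation: https://docs.tigera.io/calico/latest/networking/configuring/vxlan-ipip#configure-ip-in-ip-encapsulation-for-only-cross-subnet-traffic
In Calico IPIP mode, CALICO_IPV4POOL_IPIP defaults to Always: all cross-node traffic is IPIP-encapsulated, even when the two nodes are on the same subnet.
Calico also offers an option to encapsulate only traffic that crosses subnet boundaries. Using this cross-subnet option together with IPIP is recommended because it minimizes encapsulation overhead.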
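The decision that CrossSubnet makes per destination node can be sketched in a few lines of shell. This is an illustration only, not Calico's implementation: the function names ip_to_int and needs_ipip are hypothetical, and the sketch assumes both nodes use the same prefix length.

```shell
#!/bin/sh
# Illustrative sketch only; not Calico's actual code.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print "direct" if both node IPs fall in the same subnet, else "encap".
# $1, $2: node IPs; $3: prefix length shared by both nodes (assumption).
needs_ipip() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    if [ $(( a & mask )) -eq $(( b & mask )) ]; then
        echo direct
    else
        echo encap
    fi
}

needs_ipip 10.1.5.10 10.1.5.11 24   # same subnet
needs_ipip 10.1.5.10 10.1.8.10 24   # different subnets
```

With the node addresses used in this lab, the first call prints direct and the second prints encap.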
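Use cases
See the official documentation.
Deployment
This article deploys the default IPIP mode and the IPIP CrossSubnet mode separately, then captures packets for same-subnet and cross-subnet requests in each mode and compares them.
1. Quickly create the default IPIP mode cluster with a script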
#!/bin/bash
set -v

# 1. Prepare NoCNI environment
cat <<EOF | HTTP_PROXY= HTTPS_PROXY= http_proxy= https_proxy= kind create cluster --name=calico-ipip --image=burlyluo/kindest:v1.27.3 --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "10.244.0.0/16"
nodes:
  - role: control-plane
    kubeadmConfigPatches:
    - |
      kind: InitConfiguration
      nodeRegistration:
        kubeletExtraArgs:
          node-ip: 10.1.5.10
  - role: worker
    kubeadmConfigPatches:
    - |
      kind: JoinConfiguration
      nodeRegistration:
        kubeletExtraArgs:
          node-ip: 10.1.5.11
  - role: worker
    kubeadmConfigPatches:
    - |
      kind: JoinConfiguration
      nodeRegistration:
        kubeletExtraArgs:
          node-ip: 10.1.8.10
  - role: worker
    kubeadmConfigPatches:
    - |
      kind: JoinConfiguration
      nodeRegistration:
        kubeletExtraArgs:
          node-ip: 10.1.8.11
EOF

# 2. Remove taints
controller_node_ip=`kubectl get node -o wide --no-headers | grep -E "control-plane|bpf1" | awk -F " " '{print $6}'`
kubectl taint nodes $(kubectl get nodes -o name | grep control-plane) node-role.kubernetes.io/control-plane:NoSchedule-
kubectl get nodes -o wide

./2-setup-clab.sh

# 3. Collect startup message
controller_node_name=$(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | grep control-plane)
if [ -n "$controller_node_name" ]; then
  timeout 1 docker exec -t $controller_node_name bash -c 'cat << EOF > /root/monitor_startup.sh
#!/bin/bash
ip -ts monitor all > /root/startup_monitor.txt 2>&1
EOF
chmod +x /root/monitor_startup.sh && /root/monitor_startup.sh'
else
  echo "No such controller_node!"
fi

# 4. Install CNI[Calico v3.23.2]
kubectl apply -f calico.yaml
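The 2-setup-clab.sh script uses containerlab to create four containers, assigns each an IP address, and has each one share a network namespace with one of the four containers created by kind. This lets the k8s cluster use the node-ip values passed in the kind configuration: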
#!/bin/bash
set -v

for br in br-pool0 br-pool1; do
  ip link set $br down > /dev/null 2>&1
  ip link delete $br
  ip link add $br type bridge
  ip link set $br up
done

cat << EOF > clab.yaml | containerlab destroy -t clab.yaml --cleanup -
name: calico-ipip
topology:
  nodes:
    gw0:
      kind: linux
      image: hub.deepflow.yunshan.net/network-demo/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/gw0-boot.cfg:/opt/vyatta/etc/config/config.boot
    br-pool0:
      kind: bridge
    br-pool1:
      kind: bridge
    server1:
      kind: linux
      image: hub.deepflow.yunshan.net/network-demo/nettool
      network-mode: container:calico-ipip-control-plane
      exec:
        - ip addr add 10.1.5.10/24 dev net0
        - ip route replace default via 10.1.5.1
    server2:
      kind: linux
      image: hub.deepflow.yunshan.net/network-demo/nettool
      network-mode: container:calico-ipip-worker
      exec:
        - ip addr add 10.1.5.11/24 dev net0
        - ip route replace default via 10.1.5.1
    server3:
      kind: linux
      image: hub.deepflow.yunshan.net/network-demo/nettool
      network-mode: container:calico-ipip-worker2
      exec:
        - ip addr add 10.1.8.10/24 dev net0
        - ip route replace default via 10.1.8.1
    server4:
      kind: linux
      image: hub.deepflow.yunshan.net/network-demo/nettool
      network-mode: container:calico-ipip-worker3
      exec:
        - ip addr add 10.1.8.11/24 dev net0
        - ip route replace default via 10.1.8.1
  links:
    - endpoints: ["br-pool0:br-pool0-net0", "server1:net0"]
      mtu: 1500
    - endpoints: ["br-pool0:br-pool0-net1", "server2:net0"]
      mtu: 1500
    - endpoints: ["br-pool1:br-pool1-net0", "server3:net0"]
      mtu: 1500
    - endpoints: ["br-pool1:br-pool1-net1", "server4:net0"]
      mtu: 1500
    - endpoints: ["gw0:eth1", "br-pool0:br-pool0-net2"]
      mtu: 1500
    - endpoints: ["gw0:eth2", "br-pool1:br-pool1-net2"]
      mtu: 1500
EOF
On gw0, the startup-conf/gw0-boot.cfg file makes the 10.1.5.0/24 and 10.1.8.0/24 subnets reachable from each other. Both subnets have their default gateway on gw0, so gw0 simply forwards between them:
interfaces {
    ethernet eth1 {
        address "10.1.5.1/24"
        duplex "auto"
        speed "auto"
    }
    ethernet eth2 {
        address "10.1.8.1/24"
        duplex "auto"
        speed "auto"
    }
    loopback lo {
    }
}
nat {
    source {
        rule 100 {
            outbound-interface {
                name "eth0"
            }
            source {
                address "10.1.0.0/16"
            }
            translation {
                address "masquerade"
            }
        }
    }
}
system {
    config-management {
        commit-revisions "100"
    }
    console {
        device ttyS0 {
            speed "9600"
        }
    }
    host-name "gw0"
    login {
        user vyos {
            authentication {
                encrypted-password "$6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/"
                plaintext-password ""
            }
        }
    }
    time-zone "UTC"
}
## calico yaml
# Auto-detect the BGP IP address.
- name: IP
  value: "autodetect"
# Enable IPIP
- name: CALICO_IPV4POOL_IPIP
  value: "Always"
# Enable or Disable VXLAN on the default IP pool.
- name: CALICO_IPV4POOL_VXLAN
  value: "Never"
# Enable or Disable VXLAN on the default IPv6 IP pool.
- name: CALICO_IPV6POOL_VXLAN
  value: "Never"
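2. Quickly create the IPIP CrossSubnet mode cluster with a script
The rest of the deployment script is identical; only the CALICO_IPV4POOL_IPIP value in the Calico manifest differs: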
## calico yaml
# Auto-detect the BGP IP address.
- name: IP
  value: "autodetect"
# Enable IPIP
- name: CALICO_IPV4POOL_IPIP
  value: "CrossSubnet"
# Enable or Disable VXLAN on the default IP pool.
- name: CALICO_IPV4POOL_VXLAN
  value: "Never"
# Enable or Disable VXLAN on the default IPv6 IP pool.
- name: CALICO_IPV6POOL_VXLAN
  value: "Never"
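The environment variable above is read when Calico creates the default pool at startup. On a cluster that already has a pool, the equivalent setting is the ipipMode field of the IPPool resource. The sketch below only prints such a manifest; the helper name render_ippool is hypothetical, and the pool name, CIDR, and natOutgoing value are assumptions based on this lab rather than values taken from the original scripts.

```shell
#!/bin/sh
# Sketch: print an IPPool manifest with ipipMode set to CrossSubnet.
# $1: pool name (assumed default: default-ipv4-ippool)
# $2: pool CIDR (assumed default: this lab's podSubnet)
render_ippool() {
    cat <<EOF
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: ${1:-default-ipv4-ippool}
spec:
  cidr: ${2:-10.244.0.0/16}
  ipipMode: CrossSubnet
  vxlanMode: Never
  natOutgoing: true
EOF
}

render_ippool
# To apply it on a live cluster (not run here):
#   render_ippool | calicoctl apply -f -
```

Printing the manifest first makes it easy to review the pool name and CIDR against the running cluster before applying anything.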
Create test Pods
The Pods essentially run Nginx and serve as request targets for the packet captures that follow.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: nginx
  name: pod
spec:
  replicas: 4
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: burlyluo/nettool:latest
        name: nettoolbox
        env:
        - name: NETTOOL_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          privileged: true
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: nginx
            topologyKey: kubernetes.io/hostname
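The podAntiAffinity rule is what places one test Pod on each of the four nodes. A quick way to confirm that is to look for any node that hosts more than one Pod. The helper below is a hypothetical sketch that reads kubectl get pods -o wide style text on stdin; it assumes NODE is the seventh column, as in the listings in this article.

```shell
#!/bin/sh
# Sketch: print every node name that appears more than once in the NODE
# column (assumed to be column 7). No output means one Pod per node.
duplicate_nodes() {
    awk 'NR > 1 { count[$7]++ } END { for (n in count) if (count[n] > 1) print n }'
}

# Sample input copied from the default IPIP cluster in this article.
duplicate_nodes <<'EOF'
NAME    READY   STATUS    RESTARTS   AGE     IP              NODE
pod-0   1/1     Running   0          9m10s   10.244.79.1     calico-ipip-worker
pod-1   1/1     Running   0          9m3s    10.244.54.129   calico-ipip-worker3
pod-2   1/1     Running   0          8m54s   10.244.244.65   calico-ipip-worker2
pod-3   1/1     Running   0          8m46s   10.244.51.197   calico-ipip-control-plane
EOF
```

On a live cluster the same function could be fed with kubectl get pods -o wide | duplicate_nodes.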
Check the deployment results
1. Check the default IPIP mode deployment
root@network-demo:~# docker ps --format '{{.Names}}'
clab-calico-ipip-server2
clab-calico-ipip-server4
clab-calico-ipip-server1
clab-calico-ipip-server3
clab-calico-ipip-gw0
calico-ipip-worker
calico-ipip-worker2
calico-ipip-control-plane
calico-ipip-worker3
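On the host, the br-pool0-net0 interface is the peer of the net0 interface inside the container created by containerlab. The same net0 interface is also visible inside the Docker container created by kind, which shows that the two containers share one network namespace: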
root@network-demo:~# ip -d link show br-pool0-net0
198: br-pool0-net0@if197: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-pool0 state UP mode DEFAULT group default link/ether aa:c1:ab:1c:c9:1c brd ff:ff:ff:ff:ff:ff link-netns clab-calico-ipip-server1 promiscuity 1 allmulti 1 minmtu 68 maxmtu 65535 veth bridge_slave state forwarding priority 32 cost 2 hairpin off guard off root_block off fastleave off learning on flood on port_id 0x8001 port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge 8000.c6:58:98:9d:5f:ea designated_root 8000.c6:58:98:9d:5f:ea hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on bcast_flood on mcast_to_unicast off neigh_suppress off group_fwd_mask 0 group_fwd_mask_str 0x0 vlan_tunnel off isolated off locked off addrgenmode eui64 numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536
root@network-demo:~# docker exec -it clab-calico-ipip-server1 ip -d link show net0
197: net0@if198: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default link/ether aa:c1:ab:bd:45:17 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 65535 veth addrgenmode eui64 numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535
root@network-demo:~# docker exec -it calico-ipip-control-plane ip -d link show net0
197: net0@if198: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default link/ether aa:c1:ab:bd:45:17 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 65535 veth addrgenmode eui64 numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535
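The pairing between the host-side interface and net0 can be read from the interface indexes: 198@if197 on the host and 197@if198 in the container point at each other. The following sketch automates that comparison on the first line of ip link output; the function names veth_ids and is_veth_pair are hypothetical.

```shell
#!/bin/sh
# Sketch: extract "<ifindex> <peer-ifindex>" from the first line of
# `ip link show` output for a veth, e.g. "197: net0@if198: <...>".
veth_ids() {
    sed -n 's/^\([0-9][0-9]*\): [^@]*@if\([0-9][0-9]*\):.*/\1 \2/p' | head -n 1
}

# Two ends form a pair when each one's peer index is the other's index.
# $1 and $2 are "index peer" strings as printed by veth_ids.
is_veth_pair() {
    set -- $1 $2   # split both strings into four words
    [ "$1" = "$4" ] && [ "$2" = "$3" ]
}

# Shortened versions of the two output lines shown above.
host=$(echo '198: br-pool0-net0@if197: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500' | veth_ids)
node=$(echo '197: net0@if198: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500' | veth_ids)
if is_veth_pair "$host" "$node"; then
    echo "veth pair"
else
    echo "not a pair"
fi
```

The shared namespace itself shows up differently: both containers report the same index 197 and the same MAC address for net0, which would not be the case for two separate interfaces.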
root@network-demo:~# kubectl get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system calico-kube-controllers 1/1 Running 0 16m 10.244.51.196 calico-ipip-control-plane
kube-system calico-node-64f6p 1/1 Running 0 16m 10.1.5.10 calico-ipip-control-plane
kube-system calico-node-p4ks7 1/1 Running 0 16m 10.1.5.11 calico-ipip-worker
kube-system calico-node-pjbc7 1/1 Running 0 16m 10.1.8.11 calico-ipip-worker3
kube-system calico-node-r6rk2 1/1 Running 0 16m 10.1.8.10 calico-ipip-worker2
kube-system coredns-5d78c9869d-jx4lx 1/1 Running 0 17m 10.244.51.194 calico-ipip-control-plane
kube-system coredns-5d78c9869d-mrf2d 1/1 Running 0 17m 10.244.51.195 calico-ipip-control-plane
kube-system etcd-calico-ipip 1/1 Running 0 17m 10.1.5.10 calico-ipip-control-plane
kube-system kube-apiserver-calico-ipip 1/1 Running 0 17m 10.1.5.10 calico-ipip-control-plane
kube-system kube-controller-manager-calico-ipip 1/1 Running 0 17m 10.1.5.10 calico-ipip-control-plane
kube-system kube-proxy-4svbw 1/1 Running 0 17m 10.1.8.10 calico-ipip-worker2
kube-system kube-proxy-4zw9q 1/1 Running 0 17m 10.1.5.10 calico-ipip-control-plane
kube-system kube-proxy-5nnfn 1/1 Running 0 17m 10.1.8.11 calico-ipip-worker3
kube-system kube-proxy-b69xp 1/1 Running 0 17m 10.1.5.11 calico-ipip-worker
kube-system kube-scheduler-calico-ipip 1/1 Running 0 17m 10.1.5.10 calico-ipip-control-plane
root@network-demo:~# kubectl describe pods -n kube-system calico-node-64f6p | grep 'CALICO_IPV4POOL'
CALICO_IPV4POOL_IPIP: Always
CALICO_IPV4POOL_VXLAN: Never
root@network-demo:~# kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP
calico-ipip-control-plane Ready control-plane 19m v1.27.3 10.1.5.10
calico-ipip-worker Ready <none> 19m v1.27.3 10.1.5.11
calico-ipip-worker2 Ready <none> 19m v1.27.3 10.1.8.10
calico-ipip-worker3 Ready <none> 19m v1.27.3 10.1.8.11
2. Check the IPIP CrossSubnet deployment
root@network-demo:~# docker ps --format '{{.Names}}'
clab-calico-ipip-crosssubnet-server2
clab-calico-ipip-crosssubnet-server3
clab-calico-ipip-crosssubnet-server1
clab-calico-ipip-crosssubnet-server4
clab-calico-ipip-crosssubnet-gw0
calico-ipip-crosssubnet-control-plane
calico-ipip-crosssubnet-worker
calico-ipip-crosssubnet-worker2
calico-ipip-crosssubnet-worker3
root@network-demo:~# kubectl get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
default pod-0 1/1 Running 0 29s 10.244.85.129 calico-ipip-crosssubnet-worker
default pod-1 1/1 Running 0 22s 10.244.241.130 calico-ipip-crosssubnet-worker3
default pod-2 1/1 Running 0 16s 10.244.193.197 calico-ipip-crosssubnet-worker2
default pod-3 1/1 Running 0 10s 10.244.81.1 calico-ipip-crosssubnet-control-plane
kube-system calico-kube-controllers-7bdccfc7d8-lgmf8 1/1 Running 0 33m 10.244.193.195 calico-ipip-crosssubnet-worker2
kube-system calico-node-b22wn 1/1 Running 0 33m 10.1.8.11 calico-ipip-crosssubnet-worker3
kube-system calico-node-h7tds 1/1 Running 0 33m 10.1.5.11 calico-ipip-crosssubnet-worker
kube-system calico-node-tthgb 1/1 Running 0 33m 10.1.8.10 calico-ipip-crosssubnet-worker2
kube-system calico-node-wf2g8 1/1 Running 0 33m 10.1.5.10 calico-ipip-crosssubnet-control-plane
kube-system coredns-5d78c9869d-26vp9 1/1 Running 0 33m 10.244.193.194 calico-ipip-crosssubnet-worker2
kube-system coredns-5d78c9869d-qd44j 1/1 Running 0 33m 10.244.193.193 calico-ipip-crosssubnet-worker2
kube-system etcd-calico-ipip-crosssubnet 1/1 Running 0 33m 10.1.5.10 calico-ipip-crosssubnet-control-plane
kube-system kube-apiserver-calico-ipip-crosssubnet 1/1 Running 0 33m 10.1.5.10 calico-ipip-crosssubnet-control-plane
kube-system kube-controller-manager-calico-ipip-crosssubnet 1/1 Running 0 33m 10.1.5.10 calico-ipip-crosssubnet-control-plane
kube-system kube-proxy-4rkfq 1/1 Running 0 33m 10.1.5.11 calico-ipip-crosssubnet-worker
kube-system kube-proxy-5xblr 1/1 Running 0 33m 10.1.5.10 calico-ipip-crosssubnet-control-plane
kube-system kube-proxy-j7cfk 1/1 Running 0 33m 10.1.8.10 calico-ipip-crosssubnet-worker2
kube-system kube-proxy-tlj5m 1/1 Running 0 33m 10.1.8.11 calico-ipip-crosssubnet-worker3
kube-system kube-scheduler-calico-ipip-crosssubnet 1/1 Running 0 33m 10.1.5.10 calico-ipip-crosssubnet-control-plane
root@network-demo:~# kubectl describe pods -n kube-system calico-node-wf2g8 | grep 'CALICO_IPV4POOL'
CALICO_IPV4POOL_IPIP: CrossSubnet
CALICO_IPV4POOL_VXLAN: Never
root@network-demo:~# kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP
calico-ipip-crosssubnet-control-plane Ready control-plane 32m v1.27.3 10.1.5.10
calico-ipip-crosssubnet-worker Ready <none> 32m v1.27.3 10.1.5.11
calico-ipip-crosssubnet-worker2 Ready <none> 32m v1.27.3 10.1.8.10
calico-ipip-crosssubnet-worker3 Ready <none> 32m v1.27.3 10.1.8.11
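Verification
1. Verify the default IPIP mode
For the detailed logic, see the Calico IPIP article, which covers BGP and the routing-table path in depth. This article only compares the differences between the two modes.
1.1. Cross-subnet Pod request verification
1.1.1. Check the control-plane host routing table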
root@network-demo:~# docker exec -it calico-ipip-control-plane ip route show
default via 10.1.5.1 dev net0
10.1.5.0/24 dev net0 proto kernel scope link src 10.1.5.10
blackhole 10.244.51.192/26 proto bird
10.244.51.193 dev calid7e32e8230e scope link
10.244.51.194 dev calie67bc01f3de scope link
10.244.51.195 dev cali6f867153050 scope link
10.244.51.196 dev cali5d8decaab2b scope link
10.244.51.197 dev cali87081bf6f89 scope link
10.244.54.128/26 via 10.1.8.11 dev tunl0 proto bird onlink
10.244.79.0/26 via 10.1.5.11 dev tunl0 proto bird onlink
10.244.244.64/26 via 10.1.8.10 dev tunl0 proto bird onlink
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.3
root@network-demo:~# docker exec -it calico-ipip-control-plane ip route show proto bird
blackhole 10.244.51.192/26
10.244.54.128/26 via 10.1.8.11 dev tunl0 onlink
10.244.79.0/26 via 10.1.5.11 dev tunl0 onlink
10.244.244.64/26 via 10.1.8.10 dev tunl0 onlink
root@network-demo:~# docker exec -it calico-ipip-control-plane ip neighbor show
10.244.51.194 dev calie67bc01f3de lladdr b2:df:0d:1f:68:0f REACHABLE
172.18.0.4 dev eth0 lladdr 62:fe:7e:39:f7:13 REACHABLE
10.244.51.195 dev cali6f867153050 lladdr 72:50:a4:df:7e:08 REACHABLE
172.18.0.1 dev eth0 lladdr d2:6a:15:c7:e3:41 STALE
10.244.51.196 dev cali5d8decaab2b lladdr 06:11:33:a2:c0:b6 REACHABLE
10.1.5.1 dev net0 lladdr aa:c1:ab:eb:cb:6f REACHABLE
10.244.51.193 dev calid7e32e8230e lladdr 8a:9c:24:95:38:db REACHABLE
172.18.0.2 dev eth0 lladdr ee:f7:6a:f4:71:dd REACHABLE
10.244.51.197 dev cali87081bf6f89 lladdr c2:7f:e0:da:10:e1 STALE
10.1.5.11 dev net0 lladdr aa:c1:ab:2a:5a:0c REACHABLE
172.18.0.5 dev eth0 lladdr 32:a4:f7:ab:a8:9d REACHABLE
172:18:0:1::2 dev eth0 lladdr ee:f7:6a:f4:71:dd REACHABLE
fe80::60fe:7eff:fe39:f713 dev eth0 lladdr 62:fe:7e:39:f7:13 STALE
172:18:0:1::4 dev eth0 lladdr 62:fe:7e:39:f7:13 REACHABLE
fe80::30a4:f7ff:feab:a89d dev eth0 lladdr 32:a4:f7:ab:a8:9d STALE
172:18:0:1::5 dev eth0 lladdr 32:a4:f7:ab:a8:9d REACHABLE
1.1.2. Capture a cross-subnet Pod request
A Pod on the control-plane node (10.1.5.x subnet) requests a Pod on the worker2 node (10.1.8.x subnet):
root@network-demo:~# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
pod-0 1/1 Running 0 9m10s 10.244.79.1 calico-ipip-worker
pod-1 1/1 Running 0 9m3s 10.244.54.129 calico-ipip-worker3
pod-2 1/1 Running 0 8m54s 10.244.244.65 calico-ipip-worker2
pod-3 1/1 Running 0 8m46s 10.244.51.197 calico-ipip-control-plane
root@network-demo:~# kubectl exec -it pod-3 -- curl -s 10.244.244.65
PodName: pod-2 | PodIP: eth0 10.244.244.65/32
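Following the routing table, the flow is roughly:
- The request to 10.244.244.65 reaches the client node host and matches the route 10.244.244.64/26 via 10.1.8.10 dev tunl0 proto bird onlink.
- The kernel hands the packet to the tunl0 device, which applies IPIP encapsulation; the encapsulated packet then goes through the route lookups below.
- The outer dst IP is set to the via address 10.1.8.10, and reaching 10.1.8.10 uses the route default via 10.1.5.1 dev net0.
- The next hop 10.1.5.1 matches the route 10.1.5.0/24 dev net0 proto kernel scope link src 10.1.5.10.
- Because that route is scope link (directly connected), the packet leaves net0 with src 10.1.5.10. The ARP table resolves 10.1.5.1 to aa:c1:ab:eb:cb:6f, and the frame is sent to the gateway.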
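The successive route lookups in this flow are ordinary longest-prefix matches. The sketch below replays them over four routes taken from the routing table in 1.1.1. It is a teaching aid under stated assumptions, not how the kernel implements routing: the function names ip_to_int and lookup are hypothetical, and the default route is written as 0.0.0.0/0.

```shell
#!/bin/sh
# Sketch of longest-prefix matching; for illustration only.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.; set -- $1; IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Read "prefix/len description" lines on stdin and print the description
# of the most specific route that contains the address given as $1.
lookup() {
    dst=$(ip_to_int "$1")
    best_len=-1
    best=
    while read -r cidr desc; do
        net=${cidr%/*}
        len=${cidr#*/}
        mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
        if [ $(( dst & mask )) -eq $(( $(ip_to_int "$net") & mask )) ] &&
           [ "$len" -gt "$best_len" ]; then
            best_len=$len
            best=$desc
        fi
    done
    echo "$best"
}

routes='0.0.0.0/0 via 10.1.5.1 dev net0
10.1.5.0/24 dev net0
10.244.244.64/26 via 10.1.8.10 dev tunl0
10.244.79.0/26 via 10.1.5.11 dev tunl0'

echo "$routes" | lookup 10.244.244.65   # inner packet: Pod destination
echo "$routes" | lookup 10.1.8.10       # outer packet: remote node
echo "$routes" | lookup 10.1.5.1        # next hop of the default route
```

The three calls print via 10.1.8.10 dev tunl0, via 10.1.5.1 dev net0, and dev net0, which mirrors the order of the steps listed above.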
root@network-demo:~# docker exec -it calico-ipip-control-plane tcpdump -pnei net0
16:22:36.035362 aa:c1:ab:bd:45:17 > aa:c1:ab:eb:cb:6f, ethertype IPv4 (0x0800), length 94: 10.1.5.10 > 10.1.8.10: 10.244.51.197.60936 > 10.244.244.65.80: Flags [S], seq 4172879107, win 64800, options [mss 1440,sackOK,TS val 1222065392 ecr 0,nop,wscale 7], length 0
16:22:36.035506 aa:c1:ab:eb:cb:6f > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 94: 10.1.8.10 > 10.1.5.10: 10.244.244.65.80 > 10.244.51.197.60936: Flags [S.], seq 3646446642, ack 4172879108, win 64260, options [mss 1440,sackOK,TS val 2658799917 ecr 1222065392,nop,wscale 7], length 0
16:22:36.035539 aa:c1:ab:bd:45:17 > aa:c1:ab:eb:cb:6f, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.51.197.60936 > 10.244.244.65.80: Flags [.], ack 1, win 507, options [nop,nop,TS val 1222065392 ecr 2658799917], length 0
16:22:36.035607 aa:c1:ab:bd:45:17 > aa:c1:ab:eb:cb:6f, ethertype IPv4 (0x0800), length 163: 10.1.5.10 > 10.1.8.10: 10.244.51.197.60936 > 10.244.244.65.80: Flags [P.], seq 1:78, ack 1, win 507, options [nop,nop,TS val 1222065392 ecr 2658799917], length 77: HTTP: GET / HTTP/1.1
16:22:36.035646 aa:c1:ab:eb:cb:6f > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 86: 10.1.8.10 > 10.1.5.10: 10.244.244.65.80 > 10.244.51.197.60936: Flags [.], ack 78, win 502, options [nop,nop,TS val 2658799917 ecr 1222065392], length 0
16:22:36.035764 aa:c1:ab:eb:cb:6f > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 322: 10.1.8.10 > 10.1.5.10: 10.244.244.65.80 > 10.244.51.197.60936: Flags [P.], seq 1:237, ack 78, win 502, options [nop,nop,TS val 2658799917 ecr 1222065392], length 236: HTTP: HTTP/1.1 200 OK
16:22:36.035817 aa:c1:ab:bd:45:17 > aa:c1:ab:eb:cb:6f, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.51.197.60936 > 10.244.244.65.80: Flags [.], ack 237, win 506, options [nop,nop,TS val 1222065392 ecr 2658799917], length 0
16:22:36.035867 aa:c1:ab:eb:cb:6f > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 132: 10.1.8.10 > 10.1.5.10: 10.244.244.65.80 > 10.244.51.197.60936: Flags [P.], seq 237:283, ack 78, win 502, options [nop,nop,TS val 2658799917 ecr 1222065392], length 46: HTTP
16:22:36.035887 aa:c1:ab:bd:45:17 > aa:c1:ab:eb:cb:6f, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.51.197.60936 > 10.244.244.65.80: Flags [.], ack 283, win 506, options [nop,nop,TS val 1222065392 ecr 2658799917], length 0
16:22:36.035983 aa:c1:ab:bd:45:17 > aa:c1:ab:eb:cb:6f, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.51.197.60936 > 10.244.244.65.80: Flags [F.], seq 78, ack 283, win 506, options [nop,nop,TS val 1222065392 ecr 2658799917], length 0
16:22:36.036057 aa:c1:ab:eb:cb:6f > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 86: 10.1.8.10 > 10.1.5.10: 10.244.244.65.80 > 10.244.51.197.60936: Flags [F.], seq 283, ack 79, win 502, options [nop,nop,TS val 2658799917 ecr 1222065392], length 0
16:22:36.036096 aa:c1:ab:bd:45:17 > aa:c1:ab:eb:cb:6f, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.51.197.60936 > 10.244.244.65.80: Flags [.], ack 284, win 506, options [nop,nop,TS val 1222065393 ecr 2658799917], length 0
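The mss 1440 value in the SYN packets of this capture is consistent with a Pod interface MTU of 1480, that is, the 1500-byte link MTU minus the 20-byte outer IPv4 header that IPIP adds. The arithmetic below is a sketch that assumes option-free 20-byte IPv4 and TCP headers; the variable names are illustrative.

```shell
#!/bin/sh
# Sketch: relate the MSS seen in the capture to the IPIP overhead.
# Assumes a 1500-byte link MTU and headers without options.
LINK_MTU=1500
OUTER_IPV4_HEADER=20   # extra header added by IPIP encapsulation
IPV4_HEADER=20         # inner IPv4 header
TCP_HEADER=20

tunnel_mtu=$(( LINK_MTU - OUTER_IPV4_HEADER ))
mss=$(( tunnel_mtu - IPV4_HEADER - TCP_HEADER ))
echo "tunnel MTU: $tunnel_mtu, TCP MSS: $mss"
```

This prints tunnel MTU: 1480, TCP MSS: 1440.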

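1.2. Same-subnet Pod request verification
1.2.1. Check the control-plane host routing table
See 1.1.1 (Check the control-plane host routing table); it is not repeated here.
1.2.2. Capture a same-subnet Pod request
A Pod on the control-plane node (10.1.5.x subnet) requests a Pod on the worker node (10.1.5.x subnet):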
root@network-demo:~# kubectl exec -it pod-3 -- curl -s 10.244.79.1
PodName: pod-0 | PodIP: eth0 10.244.79.1/32
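Following the routing table, the flow is roughly:
- The request to 10.244.79.1 reaches the client node host and matches the route 10.244.79.0/26 via 10.1.5.11 dev tunl0 proto bird onlink.
- The kernel hands the packet to the tunl0 device, which applies IPIP encapsulation; the encapsulated packet then goes through the route lookups below.
- The outer dst IP is set to the via address 10.1.5.11, which matches the route 10.1.5.0/24 dev net0 proto kernel scope link src 10.1.5.10.
- Because that route is scope link (directly connected), the ARP table resolves 10.1.5.11 to the MAC aa:c1:ab:2a:5a:0c, and the frame goes from net0 straight to the worker without passing through the gateway. The packet is still IPIP-encapsulated, as the capture shows.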
root@network-demo:~# docker exec -it calico-ipip-control-plane tcpdump -pnei net0
17:02:39.493480 aa:c1:ab:bd:45:17 > aa:c1:ab:2a:5a:0c, ethertype IPv4 (0x0800), length 94: 10.1.5.10 > 10.1.5.11: 10.244.51.197.45792 > 10.244.79.1.80: Flags [S], seq 3200333625, win 64800, options [mss 1440,sackOK,TS val 2011167947 ecr 0,nop,wscale 7], length 0
17:02:39.493608 aa:c1:ab:2a:5a:0c > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 94: 10.1.5.11 > 10.1.5.10: 10.244.79.1.80 > 10.244.51.197.45792: Flags [S.], seq 3446311928, ack 3200333626, win 64260, options [mss 1440,sackOK,TS val 2306157208 ecr 2011167947,nop,wscale 7], length 0
17:02:39.493650 aa:c1:ab:bd:45:17 > aa:c1:ab:2a:5a:0c, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.5.11: 10.244.51.197.45792 > 10.244.79.1.80: Flags [.], ack 1, win 507, options [nop,nop,TS val 2011167947 ecr 2306157208], length 0
17:02:39.493741 aa:c1:ab:bd:45:17 > aa:c1:ab:2a:5a:0c, ethertype IPv4 (0x0800), length 161: 10.1.5.10 > 10.1.5.11: 10.244.51.197.45792 > 10.244.79.1.80: Flags [P.], seq 1:76, ack 1, win 507, options [nop,nop,TS val 2011167947 ecr 2306157208], length 75: HTTP: GET / HTTP/1.1
17:02:39.493790 aa:c1:ab:2a:5a:0c > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 86: 10.1.5.11 > 10.1.5.10: 10.244.79.1.80 > 10.244.51.197.45792: Flags [.], ack 76, win 502, options [nop,nop,TS val 2306157208 ecr 2011167947], length 0
17:02:39.493900 aa:c1:ab:2a:5a:0c > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 322: 10.1.5.11 > 10.1.5.10: 10.244.79.1.80 > 10.244.51.197.45792: Flags [P.], seq 1:237, ack 76, win 502, options [nop,nop,TS val 2306157208 ecr 2011167947], length 236: HTTP: HTTP/1.1 200 OK
17:02:39.493957 aa:c1:ab:bd:45:17 > aa:c1:ab:2a:5a:0c, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.5.11: 10.244.51.197.45792 > 10.244.79.1.80: Flags [.], ack 237, win 506, options [nop,nop,TS val 2011167947 ecr 2306157208], length 0
17:02:39.494011 aa:c1:ab:2a:5a:0c > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 130: 10.1.5.11 > 10.1.5.10: 10.244.79.1.80 > 10.244.51.197.45792: Flags [P.], seq 237:281, ack 76, win 502, options [nop,nop,TS val 2306157208 ecr 2011167947], length 44: HTTP
17:02:39.494033 aa:c1:ab:bd:45:17 > aa:c1:ab:2a:5a:0c, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.5.11: 10.244.51.197.45792 > 10.244.79.1.80: Flags [.], ack 281, win 506, options [nop,nop,TS val 2011167947 ecr 2306157208], length 0
17:02:39.494160 aa:c1:ab:bd:45:17 > aa:c1:ab:2a:5a:0c, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.5.11: 10.244.51.197.45792 > 10.244.79.1.80: Flags [F.], seq 76, ack 281, win 506, options [nop,nop,TS val 2011167948 ecr 2306157208], length 0
17:02:39.494275 aa:c1:ab:2a:5a:0c > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 86: 10.1.5.11 > 10.1.5.10: 10.244.79.1.80 > 10.244.51.197.45792: Flags [F.], seq 281, ack 77, win 502, options [nop,nop,TS val 2306157209 ecr 2011167948], length 0
17:02:39.494324 aa:c1:ab:bd:45:17 > aa:c1:ab:2a:5a:0c, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.5.11: 10.244.51.197.45792 > 10.244.79.1.80: Flags [.], ack 282, win 506, options [nop,nop,TS val 2011167948 ecr 2306157209], length 0

2. Verify the IPIP CrossSubnet mode
2.1. Cross-subnet Pod request verification
2.1.1. Check the control-plane host routing table
root@network-demo:~# docker exec -it calico-ipip-crosssubnet-control-plane ip route show
default via 10.1.5.1 dev net0
10.1.5.0/24 dev net0 proto kernel scope link src 10.1.5.10
blackhole 10.244.81.0/26 proto bird
10.244.81.1 dev cali87081bf6f89 scope link
10.244.85.128/26 via 10.1.5.11 dev net0 proto bird
10.244.193.192/26 via 10.1.8.10 dev tunl0 proto bird onlink
10.244.241.128/26 via 10.1.8.11 dev tunl0 proto bird onlink
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.3
root@network-demo:~# docker exec -it calico-ipip-crosssubnet-control-plane ip route show proto bird
blackhole 10.244.81.0/26
10.244.85.128/26 via 10.1.5.11 dev net0
10.244.193.192/26 via 10.1.8.10 dev tunl0 onlink
10.244.241.128/26 via 10.1.8.11 dev tunl0 onlink
root@network-demo:~# docker exec -it calico-ipip-crosssubnet-control-plane ip neighbor show
10.244.81.1 dev cali87081bf6f89 lladdr c6:27:94:49:93:c3 STALE
172.18.0.1 dev eth0 lladdr d2:6a:15:c7:e3:41 STALE
172.18.0.4 dev eth0 lladdr 82:92:99:ed:bf:60 REACHABLE
10.1.5.11 dev net0 lladdr aa:c1:ab:91:69:5b STALE
10.1.5.1 dev net0 lladdr aa:c1:ab:8f:b5:3b REACHABLE
172.18.0.2 dev eth0 lladdr aa:7e:87:80:90:17 REACHABLE
172.18.0.5 dev eth0 lladdr 16:c2:d8:16:24:e5 REACHABLE
fe80::8092:99ff:feed:bf60 dev eth0 lladdr 82:92:99:ed:bf:60 STALE
172:18:0:1::4 dev eth0 lladdr 82:92:99:ed:bf:60 REACHABLE
fe80::14c2:d8ff:fe16:24e5 dev eth0 lladdr 16:c2:d8:16:24:e5 STALE
172:18:0:1::5 dev eth0 lladdr 16:c2:d8:16:24:e5 REACHABLE
fe80::a87e:87ff:fe80:9017 dev eth0 lladdr aa:7e:87:80:90:17 STALE
172:18:0:1::2 dev eth0 lladdr aa:7e:87:80:90:17 REACHABLE
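The difference between the two modes is visible in the dev column of the bird routes: blocks behind tunl0 are encapsulated, while the block behind net0 is routed directly. The sketch below summarizes ip route show proto bird output that way; the function name summarize_routes is hypothetical, and it assumes every non-blackhole line has the form prefix via next-hop dev device.

```shell
#!/bin/sh
# Sketch: mark each remote Pod block as "ipip" (dev tunl0) or "direct"
# (any other device). Blackhole lines describe the local block and are
# skipped.
summarize_routes() {
    awk '
        $1 == "blackhole" { next }
        {
            dev = ""
            for (i = 1; i < NF; i++) if ($i == "dev") dev = $(i + 1)
            mode = (dev == "tunl0") ? "ipip" : "direct"
            print $1, "via", $3, mode
        }'
}

# Input copied from the CrossSubnet routing table above.
summarize_routes <<'EOF'
blackhole 10.244.81.0/26
10.244.85.128/26 via 10.1.5.11 dev net0
10.244.193.192/26 via 10.1.8.10 dev tunl0 onlink
10.244.241.128/26 via 10.1.8.11 dev tunl0 onlink
EOF
```

For this table the block behind 10.1.5.11 is reported as direct and the two blocks behind 10.1.8.x as ipip. Run against the table in 1.1.1, every remote block would be reported as ipip.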
2.1.2. Capture a cross-subnet Pod request
A Pod on the control-plane node (10.1.5.x subnet) requests a Pod on the worker2 node (10.1.8.x subnet):
root@network-demo:~# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
pod-0 1/1 Running 0 3m59s 10.244.85.129 calico-ipip-crosssubnet-worker
pod-1 1/1 Running 0 3m52s 10.244.241.130 calico-ipip-crosssubnet-worker3
pod-2 1/1 Running 0 3m46s 10.244.193.197 calico-ipip-crosssubnet-worker2
pod-3 1/1 Running 0 3m40s 10.244.81.1 calico-ipip-crosssubnet-control-plane
root@network-demo:~# kubectl exec -it pod-3 -- curl -s 10.244.193.197
PodName: pod-2 | PodIP: eth0 10.244.193.197/32
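Following the routing table, the flow is roughly:
- The request to 10.244.193.197 reaches the client node host and matches the route 10.244.193.192/26 via 10.1.8.10 dev tunl0 proto bird onlink.
- The kernel hands the packet to the tunl0 device, which applies IPIP encapsulation; the encapsulated packet then goes through the route lookups below.
- The outer dst IP is set to the via address 10.1.8.10, and reaching 10.1.8.10 uses the route default via 10.1.5.1 dev net0.
- The next hop 10.1.5.1 matches the route 10.1.5.0/24 dev net0 proto kernel scope link src 10.1.5.10.
- Because that route is scope link (directly connected), the packet leaves net0 with src 10.1.5.10. The ARP table resolves 10.1.5.1 to aa:c1:ab:8f:b5:3b, and the frame is sent to the gateway.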
root@network-demo:~# docker exec -it calico-ipip-crosssubnet-control-plane tcpdump -pnei net0
14:10:00.102447 aa:c1:ab:22:9e:a1 > aa:c1:ab:8f:b5:3b, ethertype IPv4 (0x0800), length 94: 10.1.5.10 > 10.1.8.10: 10.244.81.1.44624 > 10.244.193.197.80: Flags [S], seq 3233989932, win 64800, options [mss 1440,sackOK,TS val 128566485 ecr 0,nop,wscale 7], length 0
14:10:00.102586 aa:c1:ab:8f:b5:3b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 94: 10.1.8.10 > 10.1.5.10: 10.244.193.197.80 > 10.244.81.1.44624: Flags [S.], seq 2286706233, ack 3233989933, win 64260, options [mss 1440,sackOK,TS val 4272961461 ecr 128566485,nop,wscale 7], length 0
14:10:00.102617 aa:c1:ab:22:9e:a1 > aa:c1:ab:8f:b5:3b, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.81.1.44624 > 10.244.193.197.80: Flags [.], ack 1, win 507, options [nop,nop,TS val 128566485 ecr 4272961461], length 0
14:10:00.102698 aa:c1:ab:22:9e:a1 > aa:c1:ab:8f:b5:3b, ethertype IPv4 (0x0800), length 164: 10.1.5.10 > 10.1.8.10: 10.244.81.1.44624 > 10.244.193.197.80: Flags [P.], seq 1:79, ack 1, win 507, options [nop,nop,TS val 128566485 ecr 4272961461], length 78: HTTP: GET / HTTP/1.1
14:10:00.102747 aa:c1:ab:8f:b5:3b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 86: 10.1.8.10 > 10.1.5.10: 10.244.193.197.80 > 10.244.81.1.44624: Flags [.], ack 79, win 502, options [nop,nop,TS val 4272961461 ecr 128566485], length 0
14:10:00.102828 aa:c1:ab:8f:b5:3b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 322: 10.1.8.10 > 10.1.5.10: 10.244.193.197.80 > 10.244.81.1.44624: Flags [P.], seq 1:237, ack 79, win 502, options [nop,nop,TS val 4272961461 ecr 128566485], length 236: HTTP: HTTP/1.1 200 OK
14:10:00.102866 aa:c1:ab:22:9e:a1 > aa:c1:ab:8f:b5:3b, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.81.1.44624 > 10.244.193.197.80: Flags [.], ack 237, win 506, options [nop,nop,TS val 128566485 ecr 4272961461], length 0
14:10:00.102929 aa:c1:ab:8f:b5:3b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 133: 10.1.8.10 > 10.1.5.10: 10.244.193.197.80 > 10.244.81.1.44624: Flags [P.], seq 237:284, ack 79, win 502, options [nop,nop,TS val 4272961461 ecr 128566485], length 47: HTTP
14:10:00.102959 aa:c1:ab:22:9e:a1 > aa:c1:ab:8f:b5:3b, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.81.1.44624 > 10.244.193.197.80: Flags [.], ack 284, win 506, options [nop,nop,TS val 128566485 ecr 4272961461], length 0
14:10:00.103171 aa:c1:ab:22:9e:a1 > aa:c1:ab:8f:b5:3b, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.81.1.44624 > 10.244.193.197.80: Flags [F.], seq 79, ack 284, win 506, options [nop,nop,TS val 128566486 ecr 4272961461], length 0
14:10:00.103349 aa:c1:ab:8f:b5:3b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 86: 10.1.8.10 > 10.1.5.10: 10.244.193.197.80 > 10.244.81.1.44624: Flags [F.], seq 284, ack 80, win 502, options [nop,nop,TS val 4272961462 ecr 128566486], length 0
14:10:00.103404 aa:c1:ab:22:9e:a1 > aa:c1:ab:8f:b5:3b, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.81.1.44624 > 10.244.193.197.80: Flags [.], ack 285, win 506, options [nop,nop,TS val 128566486 ecr 4272961462], length 0

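2.2. Same-subnet Pod request verification
2.2.1. Check the control-plane host routing table
See 2.1.1 (Check the control-plane host routing table); it is not repeated here.
2.2.2. Capture a same-subnet Pod request
A Pod on the control-plane node (10.1.5.x subnet) requests a Pod on the worker node (10.1.5.x subnet):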
root@network-demo:~# kubectl exec -it pod-3 -- curl -s 10.244.85.129
PodName: pod-0 | PodIP: eth0 10.244.85.129/32
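- The request to the same-subnet Pod 10.244.85.129 matches the route 10.244.85.128/26 via 10.1.5.11 dev net0 proto bird. Note that the device is net0, not tunl0, so no IPIP encapsulation takes place.
- The next hop 10.1.5.11 is on the same subnet and matches the route 10.1.5.0/24 dev net0 proto kernel scope link src 10.1.5.10.
- Because that route is scope link (directly connected), the ARP table is consulted: 10.1.5.11 dev net0 lladdr aa:c1:ab:91:69:5b REACHABLE.
- The resolved dst MAC is the net0 address of the server node, so the frame is sent out through the local net0.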
root@network-demo:~# docker exec -it calico-ipip-crosssubnet-control-plane tcpdump -pnei net0
14:45:28.324182 aa:c1:ab:22:9e:a1 > aa:c1:ab:91:69:5b, ethertype IPv4 (0x0800), length 74: 10.244.81.1.47978 > 10.244.85.129.80: Flags [S], seq 980755404, win 64800, options [mss 1440,sackOK,TS val 3053371879 ecr 0,nop,wscale 7], length 0
14:45:28.324276 aa:c1:ab:91:69:5b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 74: 10.244.85.129.80 > 10.244.81.1.47978: Flags [S.], seq 295421793, ack 980755405, win 64260, options [mss 1440,sackOK,TS val 1697046978 ecr 3053371879,nop,wscale 7], length 0
14:45:28.324297 aa:c1:ab:22:9e:a1 > aa:c1:ab:91:69:5b, ethertype IPv4 (0x0800), length 66: 10.244.81.1.47978 > 10.244.85.129.80: Flags [.], ack 1, win 507, options [nop,nop,TS val 3053371879 ecr 1697046978], length 0
14:45:28.324355 aa:c1:ab:22:9e:a1 > aa:c1:ab:91:69:5b, ethertype IPv4 (0x0800), length 143: 10.244.81.1.47978 > 10.244.85.129.80: Flags [P.], seq 1:78, ack 1, win 507, options [nop,nop,TS val 3053371879 ecr 1697046978], length 77: HTTP: GET / HTTP/1.1
14:45:28.324376 aa:c1:ab:91:69:5b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 66: 10.244.85.129.80 > 10.244.81.1.47978: Flags [.], ack 78, win 502, options [nop,nop,TS val 1697046978 ecr 3053371879], length 0
14:45:28.324474 aa:c1:ab:91:69:5b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 302: 10.244.85.129.80 > 10.244.81.1.47978: Flags [P.], seq 1:237, ack 78, win 502, options [nop,nop,TS val 1697046978 ecr 3053371879], length 236: HTTP: HTTP/1.1 200 OK
14:45:28.324508 aa:c1:ab:22:9e:a1 > aa:c1:ab:91:69:5b, ethertype IPv4 (0x0800), length 66: 10.244.81.1.47978 > 10.244.85.129.80: Flags [.], ack 237, win 506, options [nop,nop,TS val 3053371879 ecr 1697046978], length 0
14:45:28.324541 aa:c1:ab:91:69:5b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 112: 10.244.85.129.80 > 10.244.81.1.47978: Flags [P.], seq 237:283, ack 78, win 502, options [nop,nop,TS val 1697046978 ecr 3053371879], length 46: HTTP
14:45:28.324554 aa:c1:ab:22:9e:a1 > aa:c1:ab:91:69:5b, ethertype IPv4 (0x0800), length 66: 10.244.81.1.47978 > 10.244.85.129.80: Flags [.], ack 283, win 506, options [nop,nop,TS val 3053371879 ecr 1697046978], length 0
14:45:28.324652 aa:c1:ab:22:9e:a1 > aa:c1:ab:91:69:5b, ethertype IPv4 (0x0800), length 66: 10.244.81.1.47978 > 10.244.85.129.80: Flags [F.], seq 78, ack 283, win 506, options [nop,nop,TS val 3053371879 ecr 1697046978], length 0
14:45:28.324741 aa:c1:ab:91:69:5b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 66: 10.244.85.129.80 > 10.244.81.1.47978: Flags [F.], seq 283, ack 79, win 502, options [nop,nop,TS val 1697046978 ecr 3053371879], length 0
14:45:28.324771 aa:c1:ab:22:9e:a1 > aa:c1:ab:91:69:5b, ethertype IPv4 (0x0800), length 66: 10.244.81.1.47978 > 10.244.85.129.80: Flags [.], ack 284, win 506, options [nop,nop,TS val 3053371879 ecr 1697046978], length 0
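Comparing this capture with the same-subnet capture in 1.2.2 shows the saving: the SYN frame is 74 bytes here and 94 bytes under Always, a difference of one 20-byte outer IPv4 header. The sketch below classifies tcpdump -e lines and computes that difference. The function names is_ipip and frame_len are hypothetical, the two sample lines are shortened copies of the SYN lines from the captures, and the pattern assumes the tcpdump output format shown in this article.

```shell
#!/bin/sh
# Sketch: an IPIP line shows an outer "node > node:" pair followed by an
# inner "pod.port > pod.port" pair; a plain line has only the inner pair.
is_ipip() {
    printf '%s\n' "$1" | grep -Eq \
        'length [0-9]+: ([0-9]+\.){3}[0-9]+ > ([0-9]+\.){3}[0-9]+: ([0-9]+\.){3}[0-9]+\.[0-9]+ > '
}

# Extract the Ethernet frame length printed after the ethertype field.
frame_len() {
    printf '%s\n' "$1" |
        sed -n 's/.*ethertype IPv4 (0x0800), length \([0-9][0-9]*\):.*/\1/p'
}

# Shortened SYN lines from the two same-subnet captures.
always_syn='17:02:39.493480 aa:c1:ab:bd:45:17 > aa:c1:ab:2a:5a:0c, ethertype IPv4 (0x0800), length 94: 10.1.5.10 > 10.1.5.11: 10.244.51.197.45792 > 10.244.79.1.80: Flags [S], length 0'
cross_syn='14:45:28.324182 aa:c1:ab:22:9e:a1 > aa:c1:ab:91:69:5b, ethertype IPv4 (0x0800), length 74: 10.244.81.1.47978 > 10.244.85.129.80: Flags [S], length 0'

for line in "$always_syn" "$cross_syn"; do
    if is_ipip "$line"; then
        echo "ipip  frame=$(frame_len "$line")"
    else
        echo "plain frame=$(frame_len "$line")"
    fi
done
echo "overhead: $(( $(frame_len "$always_syn") - $(frame_len "$cross_syn") )) bytes"
```

The last line prints overhead: 20 bytes, which is the per-packet cost that CrossSubnet avoids for traffic between nodes on the same subnet.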

