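1 Fast Service Recovery After a Node Failure
Note: by default, when a node fails, Kubernetes waits about 5 minutes after the node is marked NotReady or unreachable before it evicts the pods on that node so they can be rescheduled elsewhere.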
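The 5-minute wait comes from two tolerations that Kubernetes adds to any pod that does not declare its own, through the DefaultTolerationSeconds admission plugin. As a sketch, the effective pod spec then contains roughly the following fragment. The 300-second values are the upstream defaults; a cluster may override them with the kube-apiserver flags --default-not-ready-toleration-seconds and --default-unreachable-toleration-seconds, so treat the numbers as an assumption about an unmodified cluster.

```yaml
# Tolerations injected by default when a pod defines none of its own.
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300   # stay bound to a NotReady node for 5 minutes
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300   # stay bound to an unreachable node for 5 minutes
```

Declaring the same two tolerations explicitly with a smaller tolerationSeconds is what shortens the wait in the steps that follow.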
1.1 Environment Preparation
1. Remove the taint from the node02 node
[root@k8s-master01 ~]# k taint node k8s-node02 ingress-
2. Create the test application
[root@k8s-master01 ~]# vim test-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deploy
  labels:
    app: test-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-deploy
  template:
    metadata:
      labels:
        app: test-deploy
    spec:
      containers:
      - name: nginx
        image: registry.cn-hangzhou.aliyuncs.com/zq-demo/nginx:1.14.2
Apply it:
[root@k8s-master01 ~]# kaf test-deploy.yaml
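The commands in this guide use the short names k, kg, kgp, and kaf, which the source never defines. A minimal sketch of equivalent shell functions follows, assuming they map to the usual kubectl subcommands; the exact definitions on the author's machine are not shown, so this mapping is hypothetical.

```shell
# Hypothetical equivalents of the short commands used in this guide.
# The mapping is an assumption; the source does not show the real definitions.
k()   { kubectl "$@"; }            # k taint node ...  -> kubectl taint node ...
kg()  { kubectl get "$@"; }        # kg node           -> kubectl get node
kgp() { kubectl get pods "$@"; }   # kgp -owide        -> kubectl get pods -owide
kaf() { kubectl apply -f "$@"; }   # kaf file.yaml     -> kubectl apply -f file.yaml
```

Functions are used instead of aliases so that the definitions also work inside non-interactive scripts.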
1.2 Fast Service Recovery After a Node Failure
1. Check the test application, which is running on node02
[root@k8s-master01 ~]# kgp -owide
NAME                          READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
test-deploy-77d76d744-pg948   1/1     Running   0          2m58s   192.168.58.195   k8s-node02   <none>           <none>
2. Shut down the node02 host to simulate a node failure
[root@k8s-node02 ~]# shutdown -h now
3. On the master node, confirm that the node status has changed to NotReady
[root@k8s-master01 ~]# kg node | grep node02
k8s-node02 NotReady <none> 10d v1.32.3
Check the pod on node02 again. Even though the host is down, the pod is still listed on node02, because the default wait before eviction is 5 minutes.
[root@k8s-master01 ~]# kgp -owide
NAME                          READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
test-deploy-77d76d744-pg948   1/1     Running   0          30m   192.168.58.195   k8s-node02   <none>           <none>
4. Power node02 back on and confirm that the node status returns to Ready
[root@k8s-master01 ~]# kg node | grep node02
k8s-node02 Ready <none> 10d v1.32.3
5. Add the tolerationSeconds parameter to the test application running on node02 to set the grace period to 10s
Add the following configuration under spec.template.spec:
tolerations:
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 10
- effect: NoExecute
  key: node.kubernetes.io/not-ready
  operator: Exists
  tolerationSeconds: 10
Complete configuration file:
[root@k8s-master01 ~]# vim test-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deploy
  labels:
    app: test-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-deploy
  template:
    metadata:
      labels:
        app: test-deploy
    spec:
      tolerations:
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 10
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 10
      containers:
      - name: nginx
        image: registry.cn-hangzhou.aliyuncs.com/zq-demo/nginx:1.14.2
Re-apply:
[root@k8s-master01 ~]# kaf test-deploy.yaml
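Before simulating another failure, it can help to confirm that the 10-second tolerations actually reached the Deployment. The sketch below wraps a plain kubectl get with a JSONPath query; the helper name show_tolerations is hypothetical and is not part of the original guide.

```shell
# Print the tolerations declared in a Deployment's pod template.
# show_tolerations is a hypothetical helper name used only for illustration.
show_tolerations() {
  kubectl get deployment "$1" -o jsonpath='{.spec.template.spec.tolerations}'
}
```

Running show_tolerations test-deploy should list both NoExecute tolerations with tolerationSeconds set to 10; an empty result would suggest the edited file was not applied.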
6. Simulate the node failure again
[root@k8s-node02 ~]# shutdown -h now
Check the node and confirm that its status changes to NotReady
[root@k8s-master01 ~]# kg node | grep node02
k8s-node02 NotReady <none> 10d v1.32.3
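Check the test application: a new pod has been scheduled onto node01, and the old pod on node02 has been marked for deletion. The old pod stays in Terminating while node02 is powered off, because the kubelet on that node cannot confirm the deletion.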
[root@k8s-master01 ~]# kgp -owide
NAME                           READY   STATUS        RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
test-deploy-67df4dfb6b-dkx22   1/1     Running       0          3m1s    192.168.85.217   k8s-node01   <none>           <none>
test-deploy-7bb955dd46-s4f4h   1/1     Terminating   0          6m35s   192.168.58.197   k8s-node02   <none>           <none>
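The pod does not move exactly 10 seconds after the host goes down: the tolerationSeconds countdown only starts once the control plane has marked the node NotReady or unreachable. A rough estimate, sketched below, adds the two intervals. The function name and the 40-second detection time are assumptions for illustration; the real detection time depends on the cluster's kube-controller-manager --node-monitor-grace-period setting.

```shell
# Rough estimate of how long a pod stays bound to a failed node.
# $1: seconds until the control plane marks the node NotReady
#     (set by kube-controller-manager --node-monitor-grace-period; cluster-specific)
# $2: tolerationSeconds configured on the pod
estimate_eviction_delay() {
  echo $(( $1 + $2 ))
}

# Example with an assumed 40-second detection time and the 10-second toleration.
estimate_eviction_delay 40 10
```

With the default 300-second toleration the same estimate is dominated by the toleration itself, which is why lowering tolerationSeconds has such a visible effect.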