etcd Database Backup
1. Obtain the etcdctl binary
Since the cluster was deployed with kubeadm, the etcdctl command is not available on the host, so we need to download a release tarball.
(1) First, determine the etcd version in use
[root@master01 ~]# kubectl -n kube-system exec -it $(kubectl get po -n kube-system |grep etcd- |head -1|awk '{print $1}') -- etcd --version
etcd Version: 3.5.6
Git SHA: cecbe35ce
Go Version: go1.16.15
Go OS/Arch: linux/amd64
(2) Download the matching release
[root@master01 ~]# wget https://github.com/etcd-io/etcd/releases/download/v3.5.6/etcd-v3.5.6-linux-amd64.tar.gz
(3) Extract it to /opt
[root@master01 ~]# tar zxf etcd-v3.5.6-linux-amd64.tar.gz -C /opt/
(4) Symlink the executable into /bin/
[root@master01 ~]# ln -s /opt/etcd-v3.5.6-linux-amd64/etcdctl /bin/
(5) Verify
[root@master01 ~]# etcdctl version
etcdctl version: 3.5.6
API version: 3.5
2. Take a backup on the master01 node
[root@master01 ~]# mkdir -p /opt/etcd_backup/
[root@master01 ~]# ETCDCTL_API=3 etcdctl \
snapshot save /opt/etcd_backup/snap-etcd-$(date +%F-%H-%M-%S).db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
If etcd was not deployed with kubeadm, for example a manually deployed etcd with its own SSL certificates (assume the certificate path is /etc/etcd/ssl), the backup command differs slightly:
[root@master01 ~]# mkdir -p /opt/etcd_backup/
[root@master01 ~]# ETCDCTL_API=3 etcdctl \
snapshot save /opt/etcd_backup/snap-etcd-$(date +%F-%H-%M-%S).db \
--endpoints=https://192.168.1.60:2379 \
--cacert=/etc/etcd/ssl/ca.pem \
--cert=/etc/etcd/ssl/server.pem \
--key=/etc/etcd/ssl/server-key.pem
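For regular backups, the one-off command above can be wrapped in a small script and scheduled from cron. The following is only a sketch: the backup directory, retention period, and the kubeadm certificate paths are assumptions to adjust for your cluster.

```shell
#!/usr/bin/env bash
# Sketch of a daily etcd backup script for cron. BACKUP_DIR, KEEP_DAYS and
# the kubeadm certificate paths below are assumptions -- adjust as needed.
set -euo pipefail

BACKUP_DIR=/opt/etcd_backup
KEEP_DAYS=7          # snapshots older than this many days are pruned

mkdir -p "$BACKUP_DIR"

if command -v etcdctl >/dev/null 2>&1; then
  # Same flags as the manual backup above.
  ETCDCTL_API=3 etcdctl \
    snapshot save "$BACKUP_DIR/snap-etcd-$(date +%F-%H-%M-%S).db" \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key
fi

# Keep only the last KEEP_DAYS days of snapshots.
find "$BACKUP_DIR" -name 'snap-etcd-*.db' -mtime +"$KEEP_DAYS" -delete
```

A crontab entry such as `0 2 * * * /opt/etcd_backup/backup.sh` would then take one snapshot per night.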
etcd Database Restore (kubeadm Deployments)
Single-node etcd
1. Create a test deployment
[root@master01 ~]# kubectl create deployment testdp2 --image=registry.cn-hangzhou.aliyuncs.com/zq-demo/nginx:1.14.2 --replicas=7
2. Take a backup on the master01 node
[root@master01 ~]# mkdir -p /opt/etcd_backup/
[root@master01 ~]# ETCDCTL_API=3 etcdctl \
snapshot save /opt/etcd_backup/snap-etcd-$(date +%F-%H-%M-%S).db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
If etcd was not deployed with kubeadm, for example a manually deployed etcd with its own SSL certificates (assume the certificate path is /etc/etcd/ssl), the backup command differs slightly:
[root@master01 ~]# mkdir -p /opt/etcd_backup/
[root@master01 ~]# ETCDCTL_API=3 etcdctl \
snapshot save /opt/etcd_backup/snap-etcd-$(date +%F-%H-%M-%S).db \
--endpoints=https://192.168.1.60:2379 \
--cacert=/etc/etcd/ssl/ca.pem \
--cert=/etc/etcd/ssl/server.pem \
--key=/etc/etcd/ssl/server-key.pem
3. To verify the restore, delete the test deployment before restoring
[root@master01 ~]# kubectl delete deploy testdp2
4. Stop the kube-apiserver and etcd Pods
[root@master01 ~]# mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests_bak
5. Move the existing etcd data out of the way
[root@master01 ~]# mv /var/lib/etcd/ /var/lib/etcd_bak
6. Restore the etcd data; the /var/lib/etcd/ directory is created automatically
[root@master01 ~]# ETCDCTL_API=3 /opt/etcd-v3.5.6-linux-amd64/etcdutl snapshot restore /opt/etcd_backup/snap-etcd-2023-11-02-16-10-48.db --data-dir=/var/lib/etcd
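Before running the restore above, it can be worth sanity-checking the snapshot file first. This is an optional sketch; the file name is the one from this example, and `check_snapshot` is a hypothetical helper, not part of etcd:

```shell
# Optional pre-restore check: make sure the snapshot file exists and is
# non-empty, then print its hash, revision, total keys and size.
SNAP=/opt/etcd_backup/snap-etcd-2023-11-02-16-10-48.db

check_snapshot() {
  # Hypothetical helper: succeeds only if the file exists and is non-empty.
  [ -s "$1" ]
}

if check_snapshot "$SNAP"; then
  /opt/etcd-v3.5.6-linux-amd64/etcdutl snapshot status "$SNAP" -w table
else
  echo "snapshot $SNAP is missing or empty" >&2
fi
```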
7. Start the kube-apiserver and etcd Pods
[root@master01 ~]# mv /etc/kubernetes/manifests_bak /etc/kubernetes/manifests
8. Check again: the deployment that was deleted is back
[root@master01 ~]# kubectl get po
NAME READY STATUS RESTARTS AGE
testdp2-6d9fbdb8cb-98kfw 1/1 Running 0 27s
testdp2-6d9fbdb8cb-d5rkv 1/1 Running 0 27s
testdp2-6d9fbdb8cb-jzg5m 1/1 Running 0 27s
testdp2-6d9fbdb8cb-pd5nm 1/1 Running 0 27s
testdp2-6d9fbdb8cb-plr8m 1/1 Running 0 27s
testdp2-6d9fbdb8cb-vxq8j 1/1 Running 0 27s
testdp2-6d9fbdb8cb-zcf97 1/1 Running 0 27s
Multi-node etcd
1. Create a test deployment
[root@master01 ~]# kubectl create deployment testdp2 --image=registry.cn-hangzhou.aliyuncs.com/zq-demo/nginx:1.14.2 --replicas=7
2. Take a backup on the master01 node
[root@master01 ~]# mkdir -p /opt/etcd_backup/
[root@master01 ~]# ETCDCTL_API=3 etcdctl \
snapshot save /opt/etcd_backup/snap-etcd-$(date +%F-%H-%M-%S).db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
3. From master01, copy the backup file to the other two nodes, master02 and master03, along with the etcd binaries
[root@master01 ~]# scp /opt/etcd_backup/snap-etcd-2023-11-02-17-29-16.db master02:/tmp/
[root@master01 ~]# scp /opt/etcd_backup/snap-etcd-2023-11-02-17-29-16.db master03:/tmp/
[root@master01 ~]# scp -r /opt/etcd-v3.5.6-linux-amd64/ master02:/opt/etcd-v3.5.6-linux-amd64/
[root@master01 ~]# scp -r /opt/etcd-v3.5.6-linux-amd64/ master03:/opt/etcd-v3.5.6-linux-amd64/
4. To verify the restore, delete the test deployment before restoring
[root@master01 ~]# kubectl delete deploy testdp2
5. Stop the kube-apiserver and etcd Pods on all three master nodes
[root@master01 ~]# mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests_bak
[root@master02 ~]# mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests_bak
[root@master03 ~]# mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests_bak
6. Move the existing etcd data out of the way on all three master nodes
[root@master01 ~]# mv /var/lib/etcd/ /var/lib/etcd_bak
[root@master02 ~]# mv /var/lib/etcd/ /var/lib/etcd_bak
[root@master03 ~]# mv /var/lib/etcd/ /var/lib/etcd_bak
7. Restore the etcd data on each of the three nodes
Restore on master01:
ETCDCTL_API=3 /opt/etcd-v3.5.6-linux-amd64/etcdutl snapshot restore /opt/etcd_backup/snap-etcd-2023-11-02-17-29-16.db --data-dir=/var/lib/etcd --name master01 --initial-cluster="master01=https://192.168.1.60:2380,master02=https://192.168.1.63:2380,master03=https://192.168.1.64:2380" --initial-advertise-peer-urls="https://192.168.1.60:2380"
Restore on master02:
ETCDCTL_API=3 /opt/etcd-v3.5.6-linux-amd64/etcdutl snapshot restore /tmp/snap-etcd-2023-11-02-17-29-16.db --data-dir=/var/lib/etcd --name master02 --initial-cluster="master01=https://192.168.1.60:2380,master02=https://192.168.1.63:2380,master03=https://192.168.1.64:2380" --initial-advertise-peer-urls="https://192.168.1.63:2380"
Restore on master03:
ETCDCTL_API=3 /opt/etcd-v3.5.6-linux-amd64/etcdutl snapshot restore /tmp/snap-etcd-2023-11-02-17-29-16.db --data-dir=/var/lib/etcd --name master03 --initial-cluster="master01=https://192.168.1.60:2380,master02=https://192.168.1.63:2380,master03=https://192.168.1.64:2380" --initial-advertise-peer-urls="https://192.168.1.64:2380"
Note:
The values for the --initial-advertise-peer-urls and --initial-cluster flags can be read from the running process with ps aux | grep etcd | grep -v kube-apiserver:
[root@master03 ~]# ps aux | grep etcd | grep -v kube-apiserver
root 1911 3.0 2.5 11284288 101480 ? Ssl 17:23 0:24 etcd --advertise-client-urls=https://192.168.1.64:2379 --cert-file=/etc/kubernetes/pki/etcd/server.crt --client-cert-auth=true --data-dir=/var/lib/etcd --experimental-initial-corrupt-check=true --experimental-watch-progress-notify-interval=5s --initial-advertise-peer-urls=https://192.168.1.64:2380 --initial-cluster=master01=https://192.168.1.60:2380,master02=https://192.168.1.63:2380,master03=https://192.168.1.64:2380 --initial-cluster-state=existing --key-file=/etc/kubernetes/pki/etcd/server.key --listen-client-urls=https://127.0.0.1:2379,https://192.168.1.64:2379 --listen-metrics-urls=http://127.0.0.1:2381 --listen-peer-urls=https://192.168.1.64:2380 --name=master03 --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt --peer-client-cert-auth=true --peer-key-file=/etc/kubernetes/pki/etcd/peer.key --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt --snapshot-count=10000 --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
root 11551 0.0 0.0 112828 2152 pts/0 S+ 17:36 0:00 grep --color=auto etcd
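The same two values can also be extracted programmatically instead of by eyeballing the ps output. A sketch, assuming a Linux host where the etcd process is running: /proc/PID/cmdline stores the arguments NUL-separated, so they can be split into lines and filtered.

```shell
# /proc/<pid>/cmdline separates arguments with NUL bytes; turn them into
# lines and keep only the two flags the restore command needs.
ETCD_PID=$(pgrep -o -x etcd || true)
if [ -n "$ETCD_PID" ]; then
  tr '\0' '\n' < "/proc/$ETCD_PID/cmdline" |
    grep -E '^--(initial-cluster|initial-advertise-peer-urls)='
fi
```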
8. Start the kube-apiserver and etcd Pods on all three nodes
[root@master01 ~]# mv /etc/kubernetes/manifests_bak /etc/kubernetes/manifests
[root@master02 ~]# mv /etc/kubernetes/manifests_bak /etc/kubernetes/manifests
[root@master03 ~]# mv /etc/kubernetes/manifests_bak /etc/kubernetes/manifests
9. Verify
[root@master01 ~]# kubectl get po
NAME READY STATUS RESTARTS AGE
testdp2-6d9fbdb8cb-98kfw 1/1 Running 0 22m
testdp2-6d9fbdb8cb-d5rkv 1/1 Running 0 22m
testdp2-6d9fbdb8cb-jzg5m 1/1 Running 0 22m
testdp2-6d9fbdb8cb-pd5nm 1/1 Running 0 22m
testdp2-6d9fbdb8cb-plr8m 1/1 Running 0 22m
testdp2-6d9fbdb8cb-vxq8j 1/1 Running 0 22m
testdp2-6d9fbdb8cb-zcf97 1/1 Running 0 22m