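A MinIO cluster protects data with erasure coding, so the data can survive the loss of up to half of the cluster's data drives (the exact tolerance depends on the configured parity).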

For the cluster to keep serving reads and writes normally, however, at least N/2+1 of its N nodes must be online.
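
The N/2+1 rule above is plain integer arithmetic. A minimal sketch, assuming the rule is applied to the node count exactly as stated (the function name `min_online` is made up for illustration):

```shell
# min_online N: smallest number of online nodes that still satisfies
# the N/2+1 rule (integer division).
min_online() {
    echo $(( $1 / 2 + 1 ))
}

min_online 3    # prints 2: a 3-node cluster tolerates one node offline
min_online 4    # prints 3
```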

If a node in an existing MinIO cluster fails, the node has to be replaced.

Notes

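  • If the old node holds a large amount of data and is still usable during the replacement, copy its data to the new node first, so that synchronization does not have to transfer so much data that it saturates the network
  • If the data volume is small, you can skip the copy and replace the node directly; once the node has started, the data is synchronized automatically
  • If a node goes down while the cluster is still reading and writing data, its data will differ from that of the other MinIO nodes; after the node is restored the data has to be healed (this happens automatically, no manual intervention is needed)
  • Preferably use the hosts file for name resolution when deploying the MinIO cluster, so that replacing a node does not require changing parameters in the MinIO configuration file
  • Stop client reads and writes to the MinIO cluster while a node is being replaced
  • All configuration on the new node must match the old node, including the MinIO version, the configuration file, the hosts file, and the location and size of the data directories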
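
The last note can be checked mechanically before the new node joins the cluster. The sketch below assumes you have already copied the relevant files from the old and the new node into two local directories; the function name `compare_node_files` and the file names in the usage comment are hypothetical, not part of MinIO.

```shell
# compare_node_files OLD_DIR NEW_DIR FILE...
# Report, for each FILE, whether the copies under OLD_DIR and NEW_DIR are
# byte-identical. Returns non-zero if any file differs or is missing.
compare_node_files() {
    old=$1
    new=$2
    shift 2
    rc=0
    for f in "$@"; do
        if cmp -s "$old/$f" "$new/$f"; then
            echo "same:    $f"
        else
            echo "differs: $f"
            rc=1
        fi
    done
    return $rc
}

# Example (hypothetical paths and file names):
# compare_node_files /tmp/old-node /tmp/new-node minio.conf hosts
```

This only compares file contents; the MinIO version and the size of the data directories still have to be checked separately.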

1. Automatic recovery after a failed node service restarts

If a node's service fails while data is being written, the data is synchronized automatically once the service has started again.

Example: automatic recovery after a failed node service restarts

#Stop the service on one node while data is being written
[root@minio2 ~]#systemctl stop minio

[root@ubuntu2204 ~]#mc admin info minio-cluster
● minio1.wang.org:9000
   Uptime: 3 minutes
   Version: 2023-10-16T04:13:43Z
   Network: 2/3 OK
   Drives: 4/4 OK
   Pool: 1
● minio2.wang.org:9000
   Uptime: offline
   Drives: 0/4 OK
● minio3.wang.org:9000
   Uptime: 14 minutes
   Version: 2023-10-16T04:13:43Z
   Network: 2/3 OK
   Drives: 4/4 OK
   Pool: 1
Pools:
   1st, Erasure sets: 1, Drives per erasure set: 12
3.7 GiB Used, 1 Bucket, 1 Object
1 node offline, 8 drives online, 4 drives offline
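
When this check is scripted, the offline nodes can be pulled out of the text report. This is a rough sketch that assumes the plain-text layout shown above (a `●` line naming each node, followed by its `Uptime:` line); the helper name `offline_nodes` is invented for this example.

```shell
# offline_nodes: read `mc admin info` text on stdin and print the
# host:port of every node whose Uptime line says "offline".
offline_nodes() {
    awk '$1 == "●"          { node = $2 }
         /Uptime: *offline/ { print node }'
}

# Usage, with the alias from this article:
# mc admin info minio-cluster | offline_nodes
```

For the report above this would print `minio2.wang.org:9000`.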

#Writes continue; once they finish, the nodes show different amounts of used space
[root@minio1 ~]#df -h /data/
Filesystem                   Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-minio   20G  8.1G   11G  44% /data

[root@minio2 ~]#df -h /data
Filesystem                   Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-minio   20G  2.5G   17G  14% /data

[root@minio3 ~]#df -h /data
Filesystem                   Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-minio   20G   11G  8.1G  57% /data

#Simulate disk failure
[root@minio2 ~]#rm -rf /data/minio*
[root@minio2 ~]#mkdir /data/minio{1..4}
[root@minio2 ~]#chown -R minio:minio /data/minio{1..4}

#Restore the service on the failed node
[root@minio2 ~]#systemctl start minio

#Check the used space repeatedly to watch the data being synchronized
[root@minio2 ~]#df -h /data/
Filesystem                   Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-minio   20G  3.2G   16G  18% /data

[root@minio2 ~]#df -h /data/
Filesystem                   Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-minio   20G  3.4G   16G  19% /data

[root@minio2 ~]#df -h /data/
Filesystem                   Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-minio   20G  3.9G   15G  21% /data

[root@minio2 ~]#df -h /data/
Filesystem                   Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-minio   20G  8.9G  9.9G  48% /data
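
Rather than rerunning `df` by hand, the polling can be wrapped in a small loop. This is only a sketch: the helper names `used_kb` and `wait_until_stable` are made up here, and two equal readings only suggest that synchronization has paused or finished; they do not prove that healing is complete.

```shell
# used_kb PATH: print the used space (in KiB) of the filesystem holding PATH.
used_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

# wait_until_stable PATH INTERVAL MAX: poll every INTERVAL seconds, at most
# MAX times, and return 0 as soon as two consecutive readings are equal.
wait_until_stable() {
    prev=$(used_kb "$1")
    i=0
    while [ "$i" -lt "$3" ]; do
        sleep "$2"
        cur=$(used_kb "$1")
        echo "used: ${cur} KiB"
        if [ "$cur" = "$prev" ]; then
            return 0
        fi
        prev=$cur
        i=$((i + 1))
    done
    return 1
}

# Example: check /data every 10 seconds, up to 60 times.
# wait_until_stable /data 10 60
```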

#The cluster status is back to normal
[root@ubuntu2204 ~]#mc admin info minio-cluster
● minio1.wang.org:9000
   Uptime: 11 minutes
   Version: 2023-10-16T04:13:43Z
   Network: 3/3 OK
   Drives: 4/4 OK
   Pool: 1
● minio2.wang.org:9000
   Uptime: 5 minutes
   Version: 2023-10-16T04:13:43Z
   Network: 3/3 OK
   Drives: 4/4 OK
   Pool: 1
● minio3.wang.org:9000
   Uptime: 22 minutes
   Version: 2023-10-16T04:13:43Z
   Network: 3/3 OK
   Drives: 4/4 OK
   Pool: 1
Pools:
   1st, Erasure sets: 1, Drives per erasure set: 12
13 GiB Used, 1 Bucket, 2 Objects
12 drives online, 0 drives offline

2. Recovering a failed node by reinstalling the system

Example: recovering a 3-node cluster after one node fails completely and is reinstalled

#One node is found to have failed
[root@ubuntu2204 ~]#mc admin info minio-cluster
● minio1.wang.org:9000
   Uptime: 3 hours
   Version: 2023-10-16T04:13:43Z
   Network: 2/3 OK
   Drives: 4/4 OK
   Pool: 1
● minio2.wang.org:9000  #failed node
   Uptime: offline
   Drives: 0/4 OK
● minio3.wang.org:9000
   Uptime: 3 hours
   Version: 2023-10-16T04:13:43Z
   Network: 2/3 OK
   Drives: 4/4 OK
   Pool: 1
Pools:
   1st, Erasure sets: 1, Drives per erasure set: 12
2.6 MiB Used, 1 Bucket, 4 Objects
1 node offline, 8 drives online, 4 drives offline

#On all nodes, edit /etc/hosts and replace the failed node's IP with the new node's IP
[root@minio1 ~]#vim /etc/hosts
10.0.0.101 minio1.wang.org
10.0.0.104 minio2.wang.org  #keep the original hostname, change the IP to the new node's
10.0.0.103 minio3.wang.org

[root@minio1 ~]#for i in {2..3};do scp /etc/hosts minio$i.wang.org:/etc/;done
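
The hosts edit can also be done non-interactively. The sketch below uses a hypothetical helper, `replace_host_ip`, and assumes each entry has the form `IP hostname` with the hostname as the second field, as in the file shown above; aliases further along the line are not matched.

```shell
# replace_host_ip FILE HOSTNAME NEW_IP: point HOSTNAME at NEW_IP in a
# hosts-format file, leaving comment lines and other entries untouched.
replace_host_ip() {
    tmp=$(mktemp) || return 1
    awk -v h="$2" -v ip="$3" \
        '$1 !~ /^#/ && $2 == h { $1 = ip } { print }' "$1" > "$tmp" &&
        cat "$tmp" > "$1"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Try it on a scratch copy before touching the real /etc/hosts:
# cp /etc/hosts /tmp/hosts.new
# replace_host_ip /tmp/hosts.new minio2.wang.org 10.0.0.104
```

Writing the result back with `cat` rather than `mv` keeps the original file's owner and permissions.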

#Update the reverse proxy configuration, replacing the failed node's address with the new node's
Steps omitted

#Install a new node; see Section 2.3.2.1, Example: binary installation of MinIO for a distributed cluster of 3 nodes with 4 drives each
Steps omitted

#Restart the service on all nodes
[root@minio1 ~]#systemctl restart minio.service
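
The transcript shows the restart on minio1 only; the same command has to be run on every node. One way to do that from a single host is sketched below, assuming root SSH access to each node (as the `scp` loop earlier implies). The function name `restart_all` and the `DRY_RUN` switch are illustrative only.

```shell
# restart_all HOST...: restart minio.service on every listed host over SSH.
# With DRY_RUN=1 the commands are only printed, not executed.
restart_all() {
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh root@$host systemctl restart minio.service"
        else
            ssh "root@$host" systemctl restart minio.service
        fi
    done
}

# Print the commands first; set DRY_RUN=0 to actually run them.
DRY_RUN=1
restart_all minio1.wang.org minio2.wang.org minio3.wang.org
```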

#Verify that the node has recovered
[root@ubuntu2204 ~]#mc admin info minio-cluster
● minio1.wang.org:9000
   Uptime: 9 seconds
   Version: 2023-10-16T04:13:43Z
   Network: 3/3 OK
   Drives: 4/4 OK
   Pool: 1
● minio2.wang.org:9000
   Uptime: 9 seconds
   Version: 2023-10-16T04:13:43Z
   Network: 3/3 OK
   Drives: 4/4 OK
   Pool: 1
● minio3.wang.org:9000
   Uptime: 9 seconds
   Version: 2023-10-16T04:13:43Z
   Network: 3/3 OK
   Drives: 4/4 OK
   Pool: 1
Pools:
   1st, Erasure sets: 1, Drives per erasure set: 12
13 GiB Used, 1 Bucket, 2 Objects
12 drives online, 0 drives offline

#The data has been restored on the new node
[root@ubuntu2204-107 ~]#tree /data/
/data/
├── lost+found
├── minio1
│   └── mybucket
│       ├── example-object1.txt
│       │   └── xl.meta
│       ├── example-object2.txt
│       │   └── xl.meta
│       └── example-object3.txt
│           └── xl.meta
├── minio2
│   └── mybucket
│       ├── example-object1.txt
│       │   └── xl.meta
│       ├── example-object2.txt
│       │   └── xl.meta
│       └── example-object3.txt
│           └── xl.meta
├── minio3
│   └── mybucket
│       ├── example-object1.txt
│       │   └── xl.meta
│       ├── example-object2.txt
│       │   └── xl.meta
│       └── example-object3.txt
│           └── xl.meta
└── minio4
    └── mybucket
        ├── example-object1.txt
        │   └── xl.meta
        ├── example-object2.txt
        │   └── xl.meta
        └── example-object3.txt
            └── xl.meta

21 directories, 12 files
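
A quick way to confirm that every drive received its share of the objects is to count the `xl.meta` files per drive directory. This is a sketch under the layout shown above (`/data/minio1` to `/data/minio4`); the function name `objects_per_drive` is invented for this example. Hidden directories such as MinIO's internal `.minio.sys` are skipped so that the counts match the `tree` output.

```shell
# objects_per_drive ROOT: for each ROOT/minio* drive directory, print the
# directory name and the number of xl.meta files beneath it, ignoring
# hidden directories.
objects_per_drive() {
    for drive in "$1"/minio*; do
        [ -d "$drive" ] || continue
        count=$(find "$drive" -name '.*' -prune -o -type f -name xl.meta -print |
                wc -l | tr -d ' ')
        echo "$(basename "$drive") $count"
    done
}

# Usage on the new node:
# objects_per_drive /data
```

For the tree shown above, each of the four drives would report 3.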