1. Current architecture

Add a new secondary node S3 at 192.168.1.155:27019.

IP              Port    Role
192.168.1.153   27018   primary    M
192.168.1.154   27018   secondary  S1
192.168.1.155   27018   arbiter    S2

2. Option 1: full initial sync (directly add a blank node to the cluster)

2.1 Start a new, blank mongo node

1. Create the required directories on 192.168.1.155

mkdir -p /data/mongodb/mongodb_repl/data_27019
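The config file generated in the next step also references log, conf, and auth paths under the same base directory, so it helps to create those alongside the data directory. A minimal sketch, using an assumed /tmp demo path in place of /data/mongodb/mongodb_repl:

```shell
# Demo base path; in production this would be /data/mongodb/mongodb_repl
base=/tmp/mongodb_repl_demo

# Data dir for the new 27019 instance, plus the log/, conf/ and auth/
# directories referenced by the config file and keyfile settings
mkdir -p "$base/data_27019" "$base/log" "$base/conf" "$base/auth"

ls "$base"
```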

2. Generate the config file

Adjust the parameters below as needed:

# cat > /data/mongodb/mongodb_repl/conf/mongo_27019.conf <<EOF
systemLog:
  destination: file
  logAppend: true
  path: /data/mongodb/mongodb_repl/log/mongo_27019.log
storage:
  dbPath: /data/mongodb/mongodb_repl/data_27019
  journal:
    enabled: true
  wiredTiger:
    engineConfig:
      directoryForIndexes: true
      cacheSizeGB: 1
processManagement:
  fork: true  # fork and run in background
  pidFilePath: /data/mongodb/mongodb_repl/mongo_27019.pid
  timeZoneInfo: /usr/share/zoneinfo
net:
  port: 27019
  bindIp: 0.0.0.0
  #bindIpAll: true
  maxIncomingConnections: 5000
  unixDomainSocket:
    enabled: true
    pathPrefix: /data/mongodb/mongodb_repl/data_27019
    filePermissions: 0700
#security:
#  keyFile: /data/mongodb/mongodb_repl/auth/keyfile.key
#  authorization: enabled
#replication:
#  replSetName: repl
EOF
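Before starting mongod it is worth sanity-checking that the heredoc expanded as intended. A small sketch, writing a trimmed demo copy under an assumed /tmp path rather than touching the real conf:

```shell
# Trimmed demo copy of the config (the real file is
# /data/mongodb/mongodb_repl/conf/mongo_27019.conf)
conf=/tmp/mongo_27019_demo.conf
cat > "$conf" <<'EOF'
net:
  port: 27019
  bindIp: 0.0.0.0
storage:
  dbPath: /data/mongodb/mongodb_repl/data_27019
EOF

# Confirm the port and dbPath made it into the file
grep 'port: 27019' "$conf"
grep 'dbPath:' "$conf"
```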

3. Set ownership on the directories

chown -R mongodb:mongodb /data/mongodb/mongodb_repl/

4. Start mongod

# /data/mongodb/mongodb_repl/bin/mongod -f /data/mongodb/mongodb_repl/conf/mongo_27019.conf

5. Log in and create the admin account

mongo 192.168.1.155:27019
> use admin
> db.createUser(
  {
    user: "root",
    pwd: "root123456",
    roles: [ { role: "root", db: "admin" } ]
  }
)

6. Shut down mongod

kill 563055    # 563055 is the mongod PID from step 4

or, from the mongo shell:

> use admin
> db.shutdownServer()
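Because the config sets pidFilePath, the PID does not need to be looked up by hand. A hedged sketch of a pidfile-based stop, demonstrated against a throwaway background process instead of a real mongod:

```shell
# Simulate a daemon with a pid file; a real mongod writes its PID to the
# pidFilePath from the config (/data/mongodb/mongodb_repl/mongo_27019.pid)
pidfile=/tmp/mongo_27019_demo.pid
sleep 300 &
echo $! > "$pidfile"

# SIGTERM triggers a clean shutdown in mongod
kill "$(cat "$pidfile")"
wait 2>/dev/null || true    # reap the background process

# Verify the process is gone
kill -0 "$(cat "$pidfile")" 2>/dev/null || echo "stopped"
```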

7. Edit the config file: uncomment the security and replication sections from step 2 (the keyFile must be identical to the one used by the existing replica set members)

security:
  keyFile: /data/mongodb/mongodb_repl/auth/keyfile.key
  authorization: enabled
replication:
  replSetName: repl

8. Start mongod again

# /data/mongodb/mongodb_repl/bin/mongod -f /data/mongodb/mongodb_repl/conf/mongo_27019.conf

9. Log in

/data/mongodb/mongodb_repl/bin/mongo 192.168.1.155:27019/admin -uroot -proot123456

2.2 Add the new S3 node to the cluster

-- run on the primary (storage node M)
# /data/mongodb/mongodb_repl/bin/mongo 192.168.1.153:27018/admin -uroot -proot123456
shard1:PRIMARY> rs.add("192.168.1.155:27019")
{
  "ok" : 1,
  "$clusterTime" : {
    "clusterTime" : Timestamp(1665991566, 1),
    "signature" : {
      "hash" : BinData(0,"eJPwZNjQJBo2M34Fp2SGbUw8bsI="),
      "keyId" : NumberLong("7155366788732026884")
    }
  },
  "operationTime" : Timestamp(1665991566, 1)
}

2.3 S3 state transitions and log messages

S3's state transitions, as seen in S3's log:

  • 1) transition to STARTUP2 from STARTUP
  • 2) shard information registered
  • 3) indexes built & collections cloned; initial sync done; took 60s.
  • 4) transition to RECOVERING from STARTUP2
  • 5) transition to SECONDARY from RECOVERING

S3's state transitions, as seen in the primary's log:

  • 1) Member 192.168.1.155:27019 is now in state STARTUP
  • 2) Member 192.168.1.155:27019 is now in state STARTUP2
  • 3) Member 192.168.1.155:27019 is now in state SECONDARY

Initial sync log excerpts:

REPL   [replexec-0] This node is 192.168.1.155:27019 in the config
REPL   [replexec-0] transition to STARTUP2 from STARTUP
REPL   [replexec-0] Starting replication storage threads

REPL   [replexec-6] Member 192.168.1.154:27018 is now in state SECONDARY
REPL   [replexec-1] Member 192.168.1.153:27018 is now in state PRIMARY
REPL   [replexec-3] Member 192.168.1.155:27018 is now in state ARBITER
STORAGE  [replexec-0] createCollection: local.temp_oplog_buffer with generated UUID: 2cce2a73-7133-4acb-88d3-685192f54fa0
REPL   [replication-0] Starting initial sync (attempt 1 of 10)
STORAGE  [replication-0] Finishing collection drop for local.temp_oplog_buffer (2cce2a73-7133-4acb-88d3-685192f54fa0).
STORAGE  [replication-0] createCollection: local.temp_oplog_buffer with generated UUID: d341653c-6373-42dc-9a58-ae7d2f700887
REPL   [replication-0] sync source candidate: 192.168.1.154:27018  -- 192.168.1.154:27018 was chosen as the sync source
REPL   [replication-0] Initial syncer oplog truncation finished in: 0ms
REPL   [replication-0] ******
REPL   [replication-0] creating replication oplog of size: 10240MB...
STORAGE  [replication-0] createCollection: local.oplog.rs with generated UUID: 9284f200-65fe-414d-904b-66400b916276
STORAGE  [replication-0] Starting OplogTruncaterThread local.oplog.rs
STORAGE  [replication-0] The size storer reports that the oplog contains 0 records totaling to 0 bytes
STORAGE  [replication-0] Scanning the oplog to determine where to place markers for truncation
REPL   [replication-0] ******
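The state machine above can be pulled out of any member's log by grepping the REPL transition lines. A small sketch over representative lines copied from this section:

```shell
# Representative REPL log lines from the output above
log=/tmp/mongo_repl_demo.log
cat > "$log" <<'EOF'
REPL   [replexec-0] transition to STARTUP2 from STARTUP
REPL   [replication-0] Starting initial sync (attempt 1 of 10)
REPL   [replexec-0] transition to RECOVERING from STARTUP2
REPL   [rsSync-0] transition to SECONDARY from RECOVERING
EOF

# Extract just the state transitions, in order
grep -o 'transition to [A-Z0-9]* from [A-Z0-9]*' "$log"
```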

Removing the S3 node:

repl:PRIMARY> rs.remove("192.168.1.155:27019")

3. Option 2: incremental sync (copy an existing node's data directory to the new node's data directory)

Add 192.168.1.154:27019 to the cluster using the incremental approach.

3.1 Shut down one of the secondary nodes, S1

-- log in to the 192.168.1.154:27018 node and shut it down
# /data/mongodb/mongodb_repl/bin/mongo 192.168.1.154:27018/admin -uroot -proot123456
repl:SECONDARY> db.shutdownServer()

3.2 Copy S1's data directory to S3's data directory

cd /data/mongodb/mongodb_repl    # the data directory lives under the replication base directory
cp -r data data_27019/
cp /data/mongodb/mongodb_repl/conf/mongo_27018.conf /data/mongodb/mongodb_repl/conf/mongo_27019.conf
vim /data/mongodb/mongodb_repl/conf/mongo_27019.conf   # change data to data_27019 and 27018 to 27019
chown -R mongodb:mongodb /data/mongodb/
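The manual vim edit can also be scripted. A hedged sketch of the copy-and-rewrite steps in an assumed /tmp sandbox standing in for /data/mongodb/mongodb_repl:

```shell
# Sandbox standing in for /data/mongodb/mongodb_repl
base=/tmp/mongodb_repl_copy_demo
mkdir -p "$base/conf" "$base/data"
printf 'dbPath: %s/data\nport: 27018\n' "$base" > "$base/conf/mongo_27018.conf"

# Copy the data directory and clone the config for the new instance
cp -r "$base/data" "$base/data_27019"
cp "$base/conf/mongo_27018.conf" "$base/conf/mongo_27019.conf"

# Rewrite the config: data -> data_27019, 27018 -> 27019
sed -i 's#/data$#/data_27019#; s/27018/27019/' "$base/conf/mongo_27019.conf"

cat "$base/conf/mongo_27019.conf"
```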

3.3 Restart the original secondary node S1

# /data/mongodb/mongodb_repl/bin/mongod -f /data/mongodb/mongodb_repl/conf/mongo_27018.conf

3.4 Start the new node S3

# /data/mongodb/mongodb_repl/bin/mongod -f /data/mongodb/mongodb_repl/conf/mongo_27019.conf
# /data/mongodb/mongodb_repl/bin/mongo 192.168.1.154:27019/admin -uroot -proot123456
repl:OTHER>    -- the node reports OTHER until it is added back into the replica set

3.5 Add S3 to the replica set shard1

-- run on the primary node
repl:PRIMARY> rs.add("192.168.1.154:27019")

State transitions of the newly added S3 (27019) node:

In S3's log:

  • transition to RECOVERING from REMOVED
  • transition to SECONDARY from RECOVERING

In the primary's log:

  • Member 192.168.1.154:27019 is now in state RS_DOWN
  • Member 192.168.1.154:27019 is now in state SECONDARY

Detailed S3 log messages:

-- state transitions:
2021-09-24T13:58:25.371+0800 I REPL   [replexec-0] transition to RECOVERING from REMOVED
2021-09-24T13:58:25.974+0800 I REPL   [rsSync-0] transition to SECONDARY from RECOVERING

2021-09-24T13:58:25.371+0800 I REPL   [replexec-0] New replica set config in use: { _id: "shard1", version: 2, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 0, host: "192.168.1.153:27018", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "192.168.1.153:27019", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "192.168.1.153:27019", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 3, host: "192.168.1.154:27019", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('611db90b5ab6f6f976a4e2b0') } }
2021-09-24T13:58:25.371+0800 I REPL   [replexec-0] This node is 192.168.1.153:27019 in the config
2021-09-24T13:58:25.371+0800 I REPL   [replexec-0] transition to RECOVERING from REMOVED
2021-09-24T13:58:25.371+0800 I REPL   [replexec-0] Resetting sync source to empty, which was :27017
2021-09-24T13:58:25.371+0800 I ASIO   [Replication] Connecting to 192.168.1.154:27018
2021-09-24T13:58:25.371+0800 I ASIO   [Replication] Connecting to 192.168.1.155:27018
2021-09-24T13:58:25.371+0800 I REPL   [replexec-0] Member 192.168.1.153:27018 is now in state PRIMARY
2021-09-24T13:58:25.372+0800 I REPL   [replexec-2] Member 192.168.1.154:27018 is now in state SECONDARY
2021-09-24T13:58:25.974+0800 I REPL   [rsSync-0] transition to SECONDARY from RECOVERING