The old Ceph cluster

IP          hostname  Role
10.99.2.155 pool1     ceph-deploy, mon, mgr, osd
10.99.2.156 pool2     mon, mgr, osd
10.99.2.157 pool3     mon, mgr, osd

The new Ceph nodes

Prepare each new node first: set up the yum repo, open (or stop) the firewall, disable SELinux, and synchronize the clock with NTP; a sketch follows the table below.

IP          hostname  Role
10.99.2.158 pool4     install ceph (ceph-deploy not needed)
10.99.2.159 pool5     install ceph
10.99.2.160 pool6     install ceph
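
A minimal prep sketch, run on each of pool4/pool5/pool6 (the NTP server address matches the one used in the fix at the end of this post; the repo-copy line is an assumed convenience, adapt it to your own mirror):

systemctl stop firewalld && systemctl disable firewalld        # or open the Ceph ports instead
setenforce 0 && sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
ntpdate 10.99.2.2                                              # one-off sync; keep ntpd/chronyd running for good
scp pool1:/etc/yum.repos.d/ceph.repo /etc/yum.repos.d/         # reuse the old cluster's ceph repo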

Requirement: add the new nodes' OSDs to the old cluster, merging the two clusters into one while keeping the data synchronized and consistent.

The original Ceph environment

[root@pool1 ceph]# ceph -s
  services:
    mon: 3 daemons, quorum pool1,pool2,pool3
    mgr: pool1(active), standbys: pool2, pool3
    osd: 3 osds: 3 up, 3 in; 79 remapped pgs

[root@pool1 ceph]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       0.11151 root default
-3       0.01859     host pool1
 0   hdd 0.01859         osd.0      up  1.00000 1.00000
-5       0.01859     host pool2
 1   hdd 0.01859         osd.1      up  1.00000 1.00000
-7       0.01859     host pool3
 2   hdd 0.01859         osd.2      up  1.00000 1.00000

Set up passwordless SSH to the new nodes

Run on pool1

[root@pool1 ceph]# ssh-copy-id pool4
[root@pool1 ceph]# ssh-copy-id pool5
[root@pool1 ceph]# ssh-copy-id pool6
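
ssh-copy-id assumes pool1 already has a key pair and that the new hostnames resolve; if either is missing, a sketch of the two prerequisite steps (the /etc/hosts entries mirror the node table above):

[root@pool1 ceph]# ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa    # skip if a key already exists
[root@pool1 ceph]# cat >> /etc/hosts <<EOF
10.99.2.158 pool4
10.99.2.159 pool5
10.99.2.160 pool6
EOF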

Install Ceph on the new nodes

Run on pool1

[root@pool1 ceph]# ceph-deploy install --no-adjust-repos pool4 pool5 pool6
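
A quick sanity check that the packages landed on every new node (the exact version string depends on your repo):

[root@pool1 ceph]# for h in pool4 pool5 pool6; do ssh $h ceph --version; done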

Run on the new nodes

[root@pool4 ~]# parted /dev/sdb mklabel gpt -s
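
The same GPT label is needed on every new node; it is worth confirming with lsblk first that /dev/sdb really is the blank data disk (this assumes the disk layout is identical on all three nodes):

[root@pool4 ~]# lsblk                                # confirm /dev/sdb is empty before labeling
[root@pool5 ~]# parted /dev/sdb mklabel gpt -s
[root@pool6 ~]# parted /dev/sdb mklabel gpt -s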

Add the new nodes' OSDs to the cluster

Run on pool1

[root@pool1 ceph]# ceph-deploy disk zap pool4 /dev/sdb
[root@pool1 ceph]# ceph-deploy disk zap pool5 /dev/sdb
[root@pool1 ceph]# ceph-deploy disk zap pool6 /dev/sdb
[root@pool1 ceph]# ceph-deploy osd create pool4 --data /dev/sdb
[root@pool1 ceph]# ceph-deploy osd create pool5 --data /dev/sdb
[root@pool1 ceph]# ceph-deploy osd create pool6 --data /dev/sdb
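
ceph-deploy drives ceph-volume under the hood here; one way to verify that each OSD came up (a sketch, run from pool1):

[root@pool1 ceph]# ssh pool4 ceph-volume lvm list    # shows the LV backing the new OSD
[root@pool1 ceph]# ceph osd tree                     # osd.3/4/5 should be listed as up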

Edit the Ceph configuration file

Run on pool1

> The configuration file only needs editing if you are adding monitor nodes; if you are not adding any, skip this step.

# Add the new nodes' hostnames and IPs to mon_initial_members and mon_host, then append the public network setting
[root@pool1 ceph]# echo "public network=10.99.2.0/24" >> ceph.conf
[root@pool1 ceph]# cat ceph.conf
[global]
fsid = d35002f6-34f0-4097-b6db-2dc40e66764e
mon_initial_members = pool1, pool2, pool3, pool4, pool5, pool6
mon_host = 10.99.2.155,10.99.2.156,10.99.2.157,10.99.2.158,10.99.2.159,10.99.2.160
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network=10.99.2.0/24
[mon]
mon allow pool delete = true

Distribute the files

Run on pool1

Both the old nodes and the new nodes need the synchronized configuration file.

[root@pool1 ceph]# ceph-deploy --overwrite-conf admin pool1 pool2 pool3 pool4 pool5 pool6
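
This pushes ceph.conf plus the admin keyring to /etc/ceph on every node. If ceph commands on the new nodes then fail with a permission error, the keyring likely landed root-readable only; a common (assumed) remedy:

[root@pool4 ~]# ls /etc/ceph                         # ceph.conf and ceph.client.admin.keyring should be present
[root@pool4 ~]# chmod +r /etc/ceph/ceph.client.admin.keyring   # only if a permission error appears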

Add the new nodes to the monitors and managers

Run on pool1

[root@pool1 ceph]# ceph-deploy mon create pool4 pool5 pool6
[root@pool1 ceph]# ceph-deploy mgr create pool4 pool5 pool6
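
Quorum can be confirmed immediately, without waiting for a full ceph -s; the quorum list should name all six hosts:

[root@pool1 ceph]# ceph mon stat
[root@pool1 ceph]# ceph quorum_status --format json-pretty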

After the new nodes have successfully joined the cluster

[root@pool1 ceph]# ceph -s
  cluster:
    id:     d35002f6-34f0-4097-b6db-2dc40e66764e
    health: HEALTH_WARN
            26/5439 objects misplaced (0.478%)
            Degraded data redundancy: 1782/5439 objects degraded (32.763%), 105 pgs degraded, 80 pgs undersized
            application not enabled on 3 pool(s)
            clock skew detected on mon.pool4, mon.pool5, mon.pool6

  services:
    mon: 6 daemons, quorum pool1,pool2,pool3,pool4,pool5,pool6
    mgr: pool1(active), standbys: pool2, pool3, pool4, pool6, pool5
    osd: 6 osds: 6 up, 6 in; 79 remapped pgs

[root@pool1 ceph]# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
 -1       0.11151 root default
 -3       0.01859     host pool1
  0   hdd 0.01859         osd.0      up  1.00000 1.00000
 -5       0.01859     host pool2
  1   hdd 0.01859         osd.1      up  1.00000 1.00000
 -7       0.01859     host pool3
  2   hdd 0.01859         osd.2      up  1.00000 1.00000
 -9       0.01859     host pool4
  3   hdd 0.01859         osd.3      up  1.00000 1.00000
-11       0.01859     host pool5
  4   hdd 0.01859         osd.4      up  1.00000 1.00000
-13       0.01859     host pool6
  5   hdd 0.01859         osd.5      up  1.00000 1.00000

Check the cluster status

[root@pool1 ceph]# ceph mon dump
dumped monmap epoch 4
epoch 4
fsid d35002f6-34f0-4097-b6db-2dc40e66764e
last_changed 2021-03-11 12:34:48.877201
created 2021-03-09 21:59:45.148467
0: 10.99.2.155:6789/0 mon.pool1
1: 10.99.2.156:6789/0 mon.pool2
2: 10.99.2.157:6789/0 mon.pool3
3: 10.99.2.158:6789/0 mon.pool4
4: 10.99.2.159:6789/0 mon.pool5
5: 10.99.2.160:6789/0 mon.pool6
# All three new nodes (pool4/5/6) have now joined the cluster
[root@pool4 ceph]# ceph mgr dump

The cluster reports HEALTH_WARN after the expansion

[root@pool1 ceph]# ceph -s
  cluster:
    id:     d35002f6-34f0-4097-b6db-2dc40e66764e
    health: HEALTH_WARN
            26/5439 objects misplaced (0.478%)
            Degraded data redundancy: 1782/5439 objects degraded (32.763%), 105 pgs degraded, 80 pgs undersized
            application not enabled on 3 pool(s)
            clock skew detected on mon.pool4, mon.pool5, mon.pool6
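
The misplaced and degraded counts are expected at this point: CRUSH is rebalancing placement groups onto the three new OSDs, and those numbers drain to zero on their own as backfill completes. Recovery progress can be watched live:

[root@pool1 ceph]# ceph -w                           # stream health and recovery events
[root@pool1 ceph]# watch -n 5 'ceph -s'              # or poll the summary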

Solution

# Re-run ntpdate on every node to synchronize the clocks
ntpdate 10.99.2.2
# Restart the mon, mgr, and osd daemons
systemctl restart ceph-mon.target
systemctl restart ceph-mgr.target
systemctl restart ceph-osd.target

[root@pool1 ceph]# ceph health
HEALTH_OK
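
For reference, if the "application not enabled on 3 pool(s)" warning persists after the restart, it is cleared by tagging each pool with the application that uses it (the pool name below is a placeholder; list the real ones first):

[root@pool1 ceph]# ceph osd lspools                  # identify the untagged pools
[root@pool1 ceph]# ceph osd pool application enable mypool rbd   # mypool is hypothetical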