Ceph Quick Deployment (CentOS 7 + Jewel)
Contents
- Environment
- Environment cleanup
- Yum repos and Ceph installation
- Deployment
- Pushing the config
- Starting and stopping mon & osd
This article walks through building a Ceph distributed storage cluster on three virtual machines, in steps that are concise without sacrificing accuracy. The environment-cleanup section resolves the great majority of failed deployments, and the final section covers the most common Ceph service operations. I hope it helps readers who are setting up their first cluster.
Environment
Three hosts running CentOS 7, each with three data disks (the virtual disks should be larger than 100 GB). Details:
```
[root@ceph-1 ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@ceph-1 ~]# lsblk
NAME            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda               8:0    0   128G  0 disk
├─sda1            8:1    0   500M  0 part /boot
└─sda2            8:2    0 127.5G  0 part
  ├─centos-root 253:0    0    50G  0 lvm  /
  ├─centos-swap 253:1    0     2G  0 lvm  [SWAP]
  └─centos-home 253:2    0  75.5G  0 lvm  /home
sdb               8:16   0     2T  0 disk
sdc               8:32   0     2T  0 disk
sdd               8:48   0     2T  0 disk
sr0              11:0    1  1024M  0 rom
[root@ceph-1 ~]# cat /etc/hosts
..
192.168.57.222 ceph-1
192.168.57.223 ceph-2
192.168.57.224 ceph-3
```
Environment cleanup
If a previous deployment failed, there is no need to remove the Ceph packages or rebuild the virtual machines. Running the following commands on every node restores the environment to the state it was in right after the Ceph packages were installed. Before redeploying on an old cluster, cleaning the environment thoroughly is strongly recommended; otherwise all sorts of anomalies can occur.
```shell
ps aux | grep ceph | awk '{print $2}' | xargs kill -9
ps -ef | grep ceph   # make sure every ceph process is gone; if not, run the kill again
umount /var/lib/ceph/osd/*
rm -rf /var/lib/ceph/osd/*
rm -rf /var/lib/ceph/mon/*
rm -rf /var/lib/ceph/mds/*
rm -rf /var/lib/ceph/bootstrap-mds/*
rm -rf /var/lib/ceph/bootstrap-osd/*
rm -rf /var/lib/ceph/bootstrap-rgw/*
rm -rf /var/lib/ceph/tmp/*
rm -rf /etc/ceph/*
rm -rf /var/run/ceph/*
```
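Rather than logging in to every node by hand, the cleanup can be driven from one host over ssh. A minimal sketch, assuming the commands above are saved as `/root/cleanup.sh` on each node (a hypothetical path) and passwordless ssh is set up; the loop only prints the commands as a dry run:

```shell
# Dry run: print the per-node cleanup commands for review.
# Remove the `echo` to actually execute them over ssh.
NODES="ceph-1 ceph-2 ceph-3"
for node in $NODES; do
    echo ssh root@"$node" bash /root/cleanup.sh
done
```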
Yum repos and Ceph installation
Run the following on every host:
```shell
yum clean all
rm -rf /etc/yum.repos.d/*.repo
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
sed -i '/aliyuncs/d' /etc/yum.repos.d/CentOS-Base.repo
sed -i '/aliyuncs/d' /etc/yum.repos.d/epel.repo
sed -i 's/$releasever/7/g' /etc/yum.repos.d/CentOS-Base.repo
```
Add the Ceph repo:

```shell
vim /etc/yum.repos.d/ceph.repo
```

with the following content:
```
[ceph]
name=ceph
baseurl=http://mirrors.163.com/ceph/rpm-jewel/el7/x86_64/
gpgcheck=0
[ceph-noarch]
name=cephnoarch
baseurl=http://mirrors.163.com/ceph/rpm-jewel/el7/noarch/
gpgcheck=0
```
Install the Ceph packages:
```shell
yum makecache
yum install ceph ceph-radosgw rdate -y
```

Disable SELinux and firewalld:

```shell
sed -i 's/SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
setenforce 0
systemctl stop firewalld
systemctl disable firewalld
```
Synchronize the clocks across the nodes:
```shell
yum -y install rdate
rdate -s time-a.nist.gov
echo rdate -s time-a.nist.gov >> /etc/rc.d/rc.local
chmod +x /etc/rc.d/rc.local
```
Deployment
Install ceph-deploy on the deployment node (ceph-1; throughout the rest of this article, "deployment node" always means ceph-1):
```
[root@ceph-1 ~]# yum -y install ceph-deploy
[root@ceph-1 ~]# ceph-deploy --version
1.5.34
[root@ceph-1 ~]# ceph -v
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
```
Create a working directory on the deployment node and start the deployment:
```
[root@ceph-1 ~]# cd
[root@ceph-1 ~]# mkdir cluster
[root@ceph-1 ~]# cd cluster/
[root@ceph-1 cluster]# ceph-deploy new ceph-1 ceph-2 ceph-3
```
If you have not already run ssh-copy-id to the other nodes, you will be prompted for passwords. The log of the process looks like this:
```
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.34): /usr/bin/ceph-deploy new ceph-1 ceph-2 ceph-3
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username       : None
[ceph_deploy.cli][INFO  ]  func           : <function new at 0x7f91781f96e0>
[ceph_deploy.cli][INFO  ]  verbose        : False
[ceph_deploy.cli][INFO  ]  overwrite_conf : False
[ceph_deploy.cli][INFO  ]  quiet          : False
[ceph_deploy.cli][INFO  ]  cd_conf        : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f917755ca28>
[ceph_deploy.cli][INFO  ]  cluster        : ceph
[ceph_deploy.cli][INFO  ]  ssh_copykey    : True
[ceph_deploy.cli][INFO  ]  mon            : ['ceph-1', 'ceph-2', 'ceph-3']
..
..
[ceph_deploy.new][WARNIN] could not connect via SSH
[ceph_deploy.new][INFO  ] will connect again with password prompt
The authenticity of host 'ceph-2 (192.168.57.223)' can't be established.
ECDSA key fingerprint is ef:e2:3e:38:fa:47:f4:61:b7:4d:d3:24:de:d4:7a:54.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ceph-2,192.168.57.223' (ECDSA) to the list of known hosts.
root@ceph-2's password:
[ceph-2][DEBUG ] connected to host: ceph-2
..
..
[ceph_deploy.new][DEBUG ] Resolving host ceph-3
[ceph_deploy.new][DEBUG ] Monitor ceph-3 at 192.168.57.224
[ceph_deploy.new][DEBUG ] Monitor initial members are ['ceph-1', 'ceph-2', 'ceph-3']
[ceph_deploy.new][DEBUG ] Monitor addrs are ['192.168.57.222', '192.168.57.223', '192.168.57.224']
[ceph_deploy.new][DEBUG ] Creating a random mon key...
[ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring...
[ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...
```
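The password prompts can be avoided by distributing the deployment node's ssh key to the other nodes beforehand. A minimal sketch (the loop only prints the commands as a dry run; remove the `echo` to execute, and generate a key with `ssh-keygen` first if none exists):

```shell
# Dry run: print the key-distribution commands for review.
# Remove `echo` to actually run them (assumes root login on the targets).
for node in ceph-2 ceph-3; do
    echo ssh-copy-id root@"$node"
done
```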
The directory now contains:
```
[root@ceph-1 cluster]# ls
ceph.conf  ceph-deploy-ceph.log  ceph.mon.keyring
```
Add a public_network entry to ceph.conf matching your own IP range, and slightly increase the allowed clock drift between monitors (the default is 0.05 s; here it is raised to 2 s):
```
[root@ceph-1 cluster]# echo public_network=192.168.57.0/24 >> ceph.conf
[root@ceph-1 cluster]# echo mon_clock_drift_allowed = 2 >> ceph.conf
[root@ceph-1 cluster]# cat ceph.conf
[global]
fsid = 0248817a-b758-4d6b-a217-11248b098e10
mon_initial_members = ceph-1, ceph-2, ceph-3
mon_host = 192.168.57.222,192.168.57.223,192.168.57.224
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network=192.168.57.0/24
mon_clock_drift_allowed = 2
```
Deploy the monitors:
```
[root@ceph-1 cluster]# ceph-deploy mon create-initial
.. ..  (log output omitted)
[root@ceph-1 cluster]# ls
ceph.bootstrap-mds.keyring  ceph.bootstrap-rgw.keyring  ceph.conf  ceph.mon.keyring
ceph.bootstrap-osd.keyring  ceph.client.admin.keyring   ceph-deploy-ceph.log
```
Check the cluster status:
```
[root@ceph-1 cluster]# ceph -s
    cluster 0248817a-b758-4d6b-a217-11248b098e10
     health HEALTH_ERR
            no osds
            Monitor clock skew detected
     monmap e1: 3 mons at {ceph-1=192.168.57.222:6789/0,ceph-2=192.168.57.223:6789/0,ceph-3=192.168.57.224:6789/0}
            election epoch 6, quorum 0,1,2 ceph-1,ceph-2,ceph-3
     osdmap e1: 0 osds: 0 up, 0 in
            flags sortbitwise
      pgmap v2: 64 pgs, 1 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                  64 creating
```
Deploy the OSDs:
```shell
ceph-deploy --overwrite-conf osd prepare ceph-1:/dev/sdb ceph-1:/dev/sdc ceph-1:/dev/sdd ceph-2:/dev/sdb ceph-2:/dev/sdc ceph-2:/dev/sdd ceph-3:/dev/sdb ceph-3:/dev/sdc ceph-3:/dev/sdd --zap-disk
ceph-deploy --overwrite-conf osd activate ceph-1:/dev/sdb1 ceph-1:/dev/sdc1 ceph-1:/dev/sdd1 ceph-2:/dev/sdb1 ceph-2:/dev/sdc1 ceph-2:/dev/sdd1 ceph-3:/dev/sdb1 ceph-3:/dev/sdc1 ceph-3:/dev/sdd1
```
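The long host:disk argument lists are easy to mistype. A small loop can generate them instead; a sketch assuming the same three hosts and three disks as above (`prepare` puts the data on partition 1 of each disk, hence the `1` suffix in the activate list):

```shell
# Hypothetical helper: build the host:disk argument lists for ceph-deploy.
# Adjust HOSTS and DISKS to match your environment.
HOSTS="ceph-1 ceph-2 ceph-3"
DISKS="/dev/sdb /dev/sdc /dev/sdd"
prepare_args=""
activate_args=""
for h in $HOSTS; do
    for d in $DISKS; do
        prepare_args="$prepare_args $h:$d"
        activate_args="$activate_args $h:${d}1"   # data partition created by prepare
    done
done
echo "ceph-deploy --overwrite-conf osd prepare$prepare_args --zap-disk"
echo "ceph-deploy --overwrite-conf osd activate$activate_args"
```

The two echoed lines should match the commands above; run them once they look right.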
I hit a small snag during deployment: one OSD failed to come up (redeploying just that OSD after all the others were done fixed it). Barring surprises, the cluster status should now look like this:
```
[root@ceph-1 cluster]# ceph -s
    cluster 0248817a-b758-4d6b-a217-11248b098e10
     health HEALTH_WARN
            too few PGs per OSD (21 < min 30)
     monmap e1: 3 mons at {ceph-1=192.168.57.222:6789/0,ceph-2=192.168.57.223:6789/0,ceph-3=192.168.57.224:6789/0}
            election epoch 22, quorum 0,1,2 ceph-1,ceph-2,ceph-3
     osdmap e45: 9 osds: 9 up, 9 in
            flags sortbitwise
      pgmap v82: 64 pgs, 1 pools, 0 bytes data, 0 objects
            273 MB used, 16335 GB / 16336 GB avail
                  64 active+clean
```
To clear this WARN, simply increase the number of PGs in the rbd pool:
```
[root@ceph-1 cluster]# ceph osd pool set rbd pg_num 128
set pool 0 pg_num to 128
[root@ceph-1 cluster]# ceph osd pool set rbd pgp_num 128
set pool 0 pgp_num to 128
[root@ceph-1 cluster]# ceph -s
    cluster 0248817a-b758-4d6b-a217-11248b098e10
     health HEALTH_ERR
            19 pgs are stuck inactive for more than 300 seconds
            12 pgs peering
            19 pgs stuck inactive
     monmap e1: 3 mons at {ceph-1=192.168.57.222:6789/0,ceph-2=192.168.57.223:6789/0,ceph-3=192.168.57.224:6789/0}
            election epoch 22, quorum 0,1,2 ceph-1,ceph-2,ceph-3
     osdmap e49: 9 osds: 9 up, 9 in
            flags sortbitwise
      pgmap v96: 128 pgs, 1 pools, 0 bytes data, 0 objects
            308 MB used, 18377 GB / 18378 GB avail
                 103 active+clean
                  12 peering
                   9 creating
                   4 activating
[root@ceph-1 cluster]# ceph -s
    cluster 0248817a-b758-4d6b-a217-11248b098e10
     health HEALTH_OK
     monmap e1: 3 mons at {ceph-1=192.168.57.222:6789/0,ceph-2=192.168.57.223:6789/0,ceph-3=192.168.57.224:6789/0}
            election epoch 22, quorum 0,1,2 ceph-1,ceph-2,ceph-3
     osdmap e49: 9 osds: 9 up, 9 in
            flags sortbitwise
      pgmap v99: 128 pgs, 1 pools, 0 bytes data, 0 objects
            310 MB used, 18377 GB / 18378 GB avail
                 128 active+clean
```
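The figure in the earlier warning can be sanity-checked by hand: PGs per OSD is roughly pg_num × replica count ÷ OSD count, here assuming the pool's default replica count of 3:

```shell
# PGs per OSD ≈ pg_num * replicas / osds (integer math is close enough here)
echo $(( 64  * 3 / 9 ))   # 21 -> triggers "too few PGs per OSD (21 < min 30)"
echo $(( 128 * 3 / 9 ))   # 42 -> clears the warning
```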
At this point, the cluster deployment is complete.
Pushing the config
Do not edit /etc/ceph/ceph.conf directly on individual nodes; instead, modify the copy in the deployment directory on the deployment node (here ceph-1:/root/cluster/ceph.conf). Once a cluster grows to dozens of nodes, editing each one by hand is not feasible, and pushing is both faster and safer. After editing, run the following to push the conf file to every node:
```
[root@ceph-1 cluster]# ceph-deploy --overwrite-conf config push ceph-1 ceph-2 ceph-3
```
The monitor service on each node then needs to be restarted; see the next section.
Starting and stopping mon & osd
```shell
# monitor start/stop/restart
# ceph-1 is the hostname of the node the monitor runs on
systemctl start ceph-mon@ceph-1.service
systemctl restart ceph-mon@ceph-1.service
systemctl stop ceph-mon@ceph-1.service
# OSD start/stop/restart
# 0 is the id of an OSD on that node; list the ids with `ceph osd tree`
systemctl start/stop/restart ceph-osd@0.service
```

```
[root@ceph-1 cluster]# ceph osd tree
ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 17.94685 root default
-2  5.98228     host ceph-1
 0  1.99409         osd.0        up  1.00000          1.00000
 1  1.99409         osd.1        up  1.00000          1.00000
 8  1.99409         osd.8        up  1.00000          1.00000
-3  5.98228     host ceph-2
 2  1.99409         osd.2        up  1.00000          1.00000
 3  1.99409         osd.3        up  1.00000          1.00000
 4  1.99409         osd.4        up  1.00000          1.00000
-4  5.98228     host ceph-3
 5  1.99409         osd.5        up  1.00000          1.00000
 6  1.99409         osd.6        up  1.00000          1.00000
 7  1.99409         osd.7        up  1.00000          1.00000
```
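Putting the last two sections together, a config change can be rolled out with a loop like the following sketch (hypothetical node list; assumes passwordless ssh and that each monitor instance is named after its host; the `echo` makes it a dry run):

```shell
# Dry run: push the edited ceph.conf, then restart the mon on each node.
# Remove `echo` to actually execute.
NODES="ceph-1 ceph-2 ceph-3"
echo ceph-deploy --overwrite-conf config push $NODES
for node in $NODES; do
    echo ssh "$node" systemctl restart ceph-mon@"$node".service
done
```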
Original source: qcloud -> https://www.qcloud.com/community/article/904638