8 Ceph dashboard and monitoring

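The Ceph dashboard is a web interface for viewing the status of a running Ceph cluster and configuring its features. Early Ceph releases relied on third-party dashboard components, such as:

Calamari: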

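Calamari provides an attractive web interface for management and monitoring, plus an improved set of REST APIs (distinct from Ceph's own REST API), which simplifies Ceph administration to some extent. Calamari was originally sold as part of Inktank's commercial enterprise Ceph product. After Red Hat acquired Inktank in 2014, it announced that Calamari would be open-sourced to better drive Ceph's development.

Pros:
    Good management features
    Friendly interface
    Can be used to both deploy and monitor Ceph

Cons:
    Not official
    Depends on some OpenStack packages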

VSM:

https://github.com/intel/virtual-storage-manager

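Virtual Storage Manager (VSM) is an open-source Ceph cluster management and monitoring tool developed by Intel. It simplifies some of the steps of deploying a Ceph cluster and can be operated from a web page.

Pros:
    Easy to deploy
    Lightweight
    Flexible (custom features can be developed)

Cons:
    Few monitoring options
    Lacks Ceph management features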

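Inkscope:

https://github.com/inkscope/inkscope

Inkscope is a management and monitoring system for Ceph. It relies on the API provided by Ceph and uses MongoDB to store real-time monitoring data and historical information.

Pros:
    Easy to deploy
    Lightweight
    Flexible (custom features can be developed)

Cons:
    Few monitoring options
    Lacks Ceph management features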

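Ceph-Dash:

http://cephdash.crapworks.de/

Ceph-Dash is a Ceph monitoring panel written in Python that shows the running state of a Ceph cluster. It also provides a REST API for accessing the status data.

Pros:
    Easy to deploy
    Lightweight
    Flexible (custom features can be developed)

Cons:
    Relatively basic functionality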

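8.1 Enabling the dashboard module:

Here I use the dashboard module built into ceph-mgr, which runs on the mgr node.

https://docs.ceph.com/en/mimic/mgr/

https://docs.ceph.com/en/latest/mgr/dashboard/

https://packages.debian.org/unstable/ceph-mgr-dashboard  # version 15 has dependencies that must be resolved separately

Ceph mgr is a modular, plugin-based component whose modules can be enabled or disabled individually. The following commands are run on the ceph-deploy server:

# Enable the dashboard module
root@ceph-deploy:~# ceph mgr module enable dashboard

Note: the dashboard cannot be accessed right after the module is enabled. You must first either disable SSL or enable and configure it, and set the listen address.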

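8.1.2 Enabling the dashboard

The dashboard itself runs on the mgr node, but once the module is enabled its settings can be applied from the deploy node with ceph config set. SSL can be either enabled or disabled, as follows.

1. Disable SSL

# Disable SSL
root@ceph-deploy:~# ceph config set mgr mgr/dashboard/ssl false

2. Set the dashboard listen address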

root@ceph-deploy:~# ceph config set mgr mgr/dashboard/ceph-mgr1/server_addr 10.0.0.105

# mgr/dashboard/ceph-mgr1/server_addr: the listen address of the dashboard on the ceph-mgr1 node
# The IP of the mgr1 node is 10.0.0.105

3. Set the dashboard listen port; any free port can be chosen

root@ceph-deploy:~# ceph config set mgr mgr/dashboard/ceph-mgr1/server_port 9999

# mgr/dashboard/ceph-mgr1/server_port: the listen port of the dashboard on the ceph-mgr1 node
# Here the custom port for mgr1 is 9999

4. Restart the mgr service on the mgr1 node

root@ceph-mgr1:~# systemctl restart ceph-mgr@ceph-mgr1.service
# The port is now listening
root@ceph-mgr1:~# ss -ntl | grep 9999
LISTEN   0         5                10.0.0.105:9999             0.0.0.0:*

5. On the deploy node, check the Ceph cluster status for errors

root@ceph-deploy:~# ceph -s
  cluster:
    id:     14ce1d1a-3323-4337-963a-f96484ddd363
    health: HEALTH_OK   # status is OK

  services:
    mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 94m)
    mgr: ceph-mgr1(active, since 11m), standbys: ceph-mgr2
    mds: 2/2 daemons up, 2 standby
    osd: 16 osds: 16 up (since 93m), 16 in (since 3d)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 377 pgs
    objects: 354 objects, 67 MiB
    usage:   600 MiB used, 1.6 TiB / 1.6 TiB avail
    pgs:     377 active+clean


If the following error is reported:
Module 'dashboard' has failed: error('No socket could be created',)
check that the mgr service is running properly; restarting the mgr service may clear it.
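
To confirm which address the active mgr actually publishes for the dashboard, the JSON printed by ceph mgr services can be inspected. The sketch below assumes that output contains a "dashboard" key whose value is the URL; dashboard_url is a hypothetical helper name, and it reads from stdin so it can be tried without a cluster.

```shell
#!/usr/bin/env bash
# Sketch: extract the dashboard URL from the JSON printed by
# `ceph mgr services`. dashboard_url is an illustrative helper name.
# It reads JSON on stdin and prints the value of the "dashboard" key.
dashboard_url() {
    sed -n 's/.*"dashboard"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# On a node with an admin keyring (assumed: ceph-deploy in this setup):
#   ceph mgr services | dashboard_url
```

If nothing is printed, the dashboard module is probably not serving on the active mgr yet, which matches the "No socket could be created" case described above.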

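8.1.3 Verifying dashboard access

The dashboard can now be reached at the address below, but Ceph does not provide a default username and password.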

http://10.0.0.105:9999/

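8.1.4 Setting the dashboard account and password

Run the following on the deploy node.

1. Create the pass.txt file

root@ceph-deploy:~# touch pass.txt

# Set the password to 123456
root@ceph-deploy:~# echo "123456" >> pass.txt

2. Create the user

# Create the user zhang; -i reads the password from pass.txt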
root@ceph-deploy:~# ceph dashboard set-login-credentials zhang -i pass.txt
******************************************************************
***          WARNING: this command is deprecated.              ***
*** Please use the ac-user-* related commands to manage users. ***
******************************************************************
Username and password updated
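
The warning in the output above points to the ac-user-* commands. A minimal sketch of the non-deprecated form follows, assuming the ceph dashboard ac-user-create subcommand with a role name as its last argument; create_dashboard_user and the DRY_RUN switch are hypothetical conveniences, not part of Ceph.

```shell
#!/usr/bin/env bash
# Sketch: create a dashboard user with the ac-user-* commands instead of
# the deprecated set-login-credentials.
# create_dashboard_user and DRY_RUN are illustrative, not Ceph features.
create_dashboard_user() {
    local user=$1
    local pass_file=$2
    local role=${3:-administrator}   # built-in role with full access
    # With DRY_RUN set to a non-empty value the command is only printed.
    ${DRY_RUN:+echo} ceph dashboard ac-user-create "$user" -i "$pass_file" "$role"
}

# Example (run on the deploy node):
#   create_dashboard_user zhang pass.txt
```

Running DRY_RUN=1 create_dashboard_user zhang pass.txt prints the command without touching the cluster. Depending on the release, the dashboard password policy may reject a weak password like the one used here, in which case a stronger one is needed.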

8.1.5 Verifying dashboard login

Login succeeds.

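8.2 Monitoring Ceph nodes with Prometheus

https://prometheus.io/

Prometheus scrapes metrics from the Ceph hosts, and Grafana then renders them.

8.2.1 Deploying Prometheus

Any machine in the Ceph cluster will do, since Prometheus has no dependency on the Ceph services.

Here I deploy it on a mon node.

1. Download and install Prometheus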

root@ceph-mon1:~# mkdir /apps
root@ceph-mon1:~# cd /apps/

# Download Prometheus server
root@ceph-mon1:/apps# wget https://github.com/prometheus/prometheus/releases/download/v2.29.2/prometheus-2.29.2.linux-amd64.tar.gz

# Extract the archive
root@ceph-mon1:/apps# tar xf prometheus-2.29.2.linux-amd64.tar.gz

2. Write the service file

# First create a symlink for Prometheus
root@ceph-mon1:/apps# ln -sv /apps/prometheus-2.29.2.linux-amd64 /apps/prometheus
'/apps/prometheus' -> '/apps/prometheus-2.29.2.linux-amd64'

# Write the service file
root@ceph-mon1:/apps# vim /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target

[Service]
Restart=on-failure
WorkingDirectory=/apps/prometheus/
ExecStart=/apps/prometheus/prometheus --config.file=/apps/prometheus/prometheus.yml

[Install]
WantedBy=multi-user.target

3. Start Prometheus with systemctl

root@ceph-mon1:/apps# systemctl daemon-reload

# Start now and enable at boot
root@ceph-mon1:/apps# systemctl enable --now prometheus.service

8.2.2 Accessing Prometheus

http://10.0.0.102:9090

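8.2.3 Deploying node_exporter

https://prometheus.io/download/#node_exporter

Next, deploy node_exporter to collect metrics from each host. It must be installed on every node in the Ceph cluster, where it exposes an HTTP endpoint for Prometheus to scrape.

1. Install node_exporter on each node

# node1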
root@ceph-node1:~# mkdir /apps
root@ceph-node1:~# cd /apps/
root@ceph-node1:/apps# wget https://github.com/prometheus/node_exporter/releases/download/v1.2.2/node_exporter-1.2.2.linux-amd64.tar.gz
root@ceph-node1:/apps# tar xf node_exporter-1.2.2.linux-amd64.tar.gz
root@ceph-node1:/apps# ln -sv /apps/node_exporter-1.2.2.linux-amd64 /apps/node_exporter


# node2
root@ceph-node2:~# mkdir /apps
root@ceph-node2:~# cd /apps/
root@ceph-node2:/apps# wget https://github.com/prometheus/node_exporter/releases/download/v1.2.2/node_exporter-1.2.2.linux-amd64.tar.gz
root@ceph-node2:/apps# tar xf node_exporter-1.2.2.linux-amd64.tar.gz
root@ceph-node2:/apps# ln -sv /apps/node_exporter-1.2.2.linux-amd64 /apps/node_exporter

# node3
root@ceph-node3:~# mkdir /apps
root@ceph-node3:~# cd /apps/
root@ceph-node3:/apps# wget https://github.com/prometheus/node_exporter/releases/download/v1.2.2/node_exporter-1.2.2.linux-amd64.tar.gz
root@ceph-node3:/apps# tar xf node_exporter-1.2.2.linux-amd64.tar.gz
root@ceph-node3:/apps# ln -sv /apps/node_exporter-1.2.2.linux-amd64 /apps/node_exporter

# node4
root@ceph-node4:~# mkdir /apps
root@ceph-node4:~# cd /apps/
root@ceph-node4:/apps# wget https://github.com/prometheus/node_exporter/releases/download/v1.2.2/node_exporter-1.2.2.linux-amd64.tar.gz
root@ceph-node4:/apps# tar xf node_exporter-1.2.2.linux-amd64.tar.gz
root@ceph-node4:/apps# ln -sv /apps/node_exporter-1.2.2.linux-amd64 /apps/node_exporter

2. Write the service file

root@ceph-node1:/apps# vim /etc/systemd/system/node-exporter.service

[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
ExecStart=/apps/node_exporter/node_exporter

[Install]
WantedBy=multi-user.target

# Distribute it to the other nodes
root@ceph-node1:/apps# scp /etc/systemd/system/node-exporter.service 10.0.0.108:/etc/systemd/system/

root@ceph-node1:/apps# scp /etc/systemd/system/node-exporter.service 10.0.0.109:/etc/systemd/system/

root@ceph-node1:/apps# scp /etc/systemd/system/node-exporter.service 10.0.0.110:/etc/systemd/system/

3. Start the node_exporter service

By default, node_exporter listens on port 9100.

# node1
root@ceph-node1:/apps# systemctl daemon-reload
root@ceph-node1:/apps# systemctl enable --now node-exporter.service

# node2
root@ceph-node2:/apps# systemctl daemon-reload
root@ceph-node2:/apps# systemctl enable --now node-exporter.service

# node3
root@ceph-node3:/apps# systemctl daemon-reload
root@ceph-node3:/apps# systemctl enable --now node-exporter.service

# node4
root@ceph-node4:/apps# systemctl daemon-reload
root@ceph-node4:/apps# systemctl enable --now node-exporter.service
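
The install steps above are identical on every node, so they can be scripted over SSH. This is only a sketch: it assumes root SSH access from one machine to all nodes and that node1 is 10.0.0.107 (inferred from the scp targets and the scrape targets used in this section). NODES, VERSION, install_node_exporter and DRY_RUN are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch: repeat the node_exporter install steps on every node over SSH.
# Assumptions: root SSH access to each node; node1 is 10.0.0.107.
NODES="10.0.0.107 10.0.0.108 10.0.0.109 10.0.0.110"
VERSION="1.2.2"

install_node_exporter() {
    local host=$1
    local pkg="node_exporter-${VERSION}.linux-amd64"
    local url="https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/${pkg}.tar.gz"
    # mkdir -p and ln -sfn keep the steps safe to re-run on the same node.
    local cmd="mkdir -p /apps && cd /apps && wget -q ${url} && tar xf ${pkg}.tar.gz && ln -sfn /apps/${pkg} /apps/node_exporter"
    # With DRY_RUN set to a non-empty value the ssh command is only printed.
    ${DRY_RUN:+echo} ssh "root@${host}" "$cmd"
}

# Install on all nodes:
#   for host in $NODES; do install_node_exporter "$host"; done
```

The service file still has to be copied to each node and enabled as shown above; the loop only replaces the repeated download, extract, and symlink commands.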

8.2.4 Verifying each node

Verify the node_exporter data on each node:

Open the node IP with port 9100 in a browser to reach the node_exporter web page. I only show one node here; the others can be reached the same way.

8.2.5 Configuring the Prometheus server and verifying the data

On the mon1 node, edit the Prometheus server configuration file.

# Configure Prometheus: add the nodes to monitor as a new job under scrape_configs
root@ceph-mon1:/apps# vim prometheus/prometheus.yml
  - job_name: "ceph-node"
    static_configs:
      - targets: ["10.0.0.107:9100","10.0.0.108:9100","10.0.0.109:9100","10.0.0.110:9100"]

# Restart Prometheus
root@ceph-mon1:/apps# systemctl restart prometheus.service

Access the Prometheus server

http://10.0.0.102:9090/

The node data is now being collected.
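
The scrape job in prometheus.yml is indentation-sensitive: the job entry must sit at the same level as the other items under scrape_configs. The sketch below prints a correctly indented job for a list of hosts so it can be pasted into the file; render_node_job is a hypothetical helper, and port 9100 is the node_exporter default mentioned earlier.

```shell
#!/usr/bin/env bash
# Sketch: print a "ceph-node" scrape job with the indentation expected
# under scrape_configs. render_node_job is an illustrative helper name.
render_node_job() {
    local port=9100 targets="" host
    for host in "$@"; do
        # Join the quoted host:port pairs with commas.
        targets="${targets:+$targets,}\"${host}:${port}\""
    done
    printf '  - job_name: "ceph-node"\n    static_configs:\n      - targets: [%s]\n' "$targets"
}

# Example:
#   render_node_job 10.0.0.107 10.0.0.108 10.0.0.109 10.0.0.110
```

Before restarting the service, the edited file can be validated with the promtool binary shipped in the same tarball, for example /apps/prometheus/promtool check config /apps/prometheus/prometheus.yml.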

8.3 Monitoring Ceph services with Prometheus

Ceph manager ships with a built-in prometheus module that listens on port 9283 of each manager node and exposes the collected metrics to Prometheus over HTTP.

https://docs.ceph.com/en/mimic/mgr/prometheus/?highlight=prometheus

8.3.1 Enabling the prometheus module

# Enable the prometheus module from the deploy node
root@ceph-deploy:~# ceph mgr module enable prometheus

# Then verify on each mgr node that port 9283 is listening
root@ceph-mgr1:~# ss -ntl | grep 9283
LISTEN   0         5                10.0.0.105:9283             0.0.0.0:*

8.3.2 Verifying the manager data

Port 9283 can be accessed directly; the data it serves is what Prometheus scrapes for monitoring.

mgr1 node: http://10.0.0.105:9283
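
The endpoint can also be checked from the command line. The sketch below assumes the module serves Prometheus text format under /metrics and exposes a ceph_health_status sample whose value is 0 for HEALTH_OK, 1 for HEALTH_WARN and 2 for HEALTH_ERR; ceph_health_from_metrics is a hypothetical helper that reads the metrics text on stdin.

```shell
#!/usr/bin/env bash
# Sketch: read the cluster health value from the mgr prometheus endpoint.
# Assumption: the metrics text contains a line such as
#   ceph_health_status 0.0
# ceph_health_from_metrics is an illustrative helper name.
ceph_health_from_metrics() {
    # Print the integer value of the sample; exit non-zero if it is absent.
    awk '$1 == "ceph_health_status" { print int($2); found = 1 }
         END { exit !found }'
}

# Example against the mgr1 node used in this section:
#   curl -s http://10.0.0.105:9283/metrics | ceph_health_from_metrics
```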

Next, we configure the Prometheus server to monitor the Ceph cluster.

8.3.3 Configuring Prometheus to scrape the data

On the Prometheus server node, add a scrape job for Ceph.

root@ceph-mon1:/apps# vim prometheus/prometheus.yml
  - job_name: "ceph-cluster-data"
    static_configs:
    - targets: ["10.0.0.105:9283"]      # the Ceph mgr prometheus endpoint on port 9283

# Restart Prometheus
root@ceph-mon1:/apps# systemctl restart prometheus.service

8.3.4 Verifying the data in the Prometheus page

http://10.0.0.102:9090/

8.4 Installing Grafana to display the monitoring data

Grafana displays the Ceph cluster monitoring data and the node data.

8.4.1 Installing Grafana

https://grafana.com/grafana/download/7.3.7?platform=linux

1. Install Grafana on the deploy node

root@ceph-deploy:~# cd /usr/local/src/
root@ceph-deploy:/usr/local/src# sudo apt-get install -y adduser libfontconfig1
root@ceph-deploy:/usr/local/src# wget https://dl.grafana.com/oss/release/grafana_7.5.7_amd64.deb
root@ceph-deploy:/usr/local/src# sudo dpkg -i grafana_7.5.7_amd64.deb

2. Start Grafana

# Start now and enable at boot
root@ceph-deploy:/usr/local/src# systemctl enable --now grafana-server.service

# Grafana listens on port 3000
root@ceph-deploy:/usr/local/src# ss -ntl | grep  3000
LISTEN   0         128                       *:3000                   *:*