Doris BE Status Monitoring

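Doris BE monitoring works as follows. A scheduled Azkaban job checks each BE node, and if a BE has gone down, the job restarts the BE service. After the restart completes, the Azkaban job is deliberately marked as failed, which triggers Azkaban's failure alert.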


Azkaban Zip Package Scripts

azkaban.project

```yaml
azkaban-flow-version: 2.0
```

doris_check.flow

```yaml
nodes:
  # None of the nodes declares dependsOn, so each BE host is checked independently.
  # A node fails (and alerts) when its script exits non-zero, i.e. the BE was down and got restarted.
  - name: doris_node170_be_was_dead_and_restart_complited_now
    type: command
    config:
      command: sh /opt/sync/sync_script/sink_doris/az_doris_be_check.sh 10.0.14.170
  - name: doris_node171_be_was_dead_and_restart_complited_now
    type: command
    config:
      command: sh /opt/sync/sync_script/sink_doris/az_doris_be_check.sh 10.0.14.171
  - name: doris_node172_be_was_dead_and_restart_complited_now
    type: command
    config:
      command: sh /opt/sync/sync_script/sink_doris/az_doris_be_check.sh 10.0.14.172
```
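The flow file repeats the same node block once per BE host, so adding a node means copying it by hand. The sketch below generates the file instead. It assumes the same check-script path and node naming as above. `gen_flow` is a hypothetical helper, not part of Azkaban:

```shell
#!/bin/bash
# Hypothetical helper: print a doris_check.flow with one command node per BE host.
# Assumes the check-script path and node naming used in the flow above.
check_script=/opt/sync/sync_script/sink_doris/az_doris_be_check.sh

gen_flow() {
  echo "nodes:"
  local ip
  for ip in "$@"; do
    # Use the last octet of the IP (e.g. 170 for 10.0.14.170) in the node name
    local last_octet=${ip##*.}
    echo "  - name: doris_node${last_octet}_be_was_dead_and_restart_complited_now"
    echo "    type: command"
    echo "    config:"
    echo "      command: sh ${check_script} ${ip}"
  done
}
```

For example, `gen_flow 10.0.14.170 10.0.14.171 10.0.14.172 > doris_check.flow` produces the same three nodes. You can then package the project with `zip doris_check.zip azkaban.project doris_check.flow` and upload it through the Azkaban web UI.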

az_doris_be_check.sh

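```bash
#!/bin/bash
# Usage: az_doris_be_check.sh <be_host>
# Runs the BE health-check script on the given BE node over SSH.
host=$1
doris_check_script_path=/opt/doris/be/bin

# The heredoc delimiter is unquoted, so $doris_check_script_path is expanded
# locally before the commands are sent to the remote shell.
# ssh exits with the remote shell's exit status (or 255 on an SSH error),
# so a non-zero result from doris_be_check.sh fails the Azkaban job.
ssh root@"$host" << eeooff
$doris_check_script_path/doris_be_check.sh
eeooff
```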
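Two details of this heredoc matter:

- The `eeooff` delimiter is unquoted, so `$doris_check_script_path` is expanded on the Azkaban host before anything is sent.
- The exit status of the remote shell becomes the exit status of `ssh`, and therefore of the Azkaban job.

The sketch below demonstrates both locally. It uses `bash -s` as a stand-in for the remote shell, so it runs without SSH. The function names are hypothetical:

```shell
#!/bin/bash
# Local stand-in for the ssh call: `bash -s` reads commands from stdin,
# much like the remote shell does when ssh is fed a heredoc.
doris_check_script_path=/opt/doris/be/bin

# Unquoted delimiter: the variable is expanded here, before the child shell runs.
# The trailing `exit 1` simulates a BE that was down and had to be restarted.
run_like_ssh() {
  bash -s << eeooff
echo "remote would run: $doris_check_script_path/doris_be_check.sh"
exit 1
eeooff
}

# Quoted delimiter: the text reaches the child shell unexpanded, and the
# (unexported) variable is unset there.
run_like_ssh_quoted() {
  bash -s << 'eeooff'
echo "remote would run: $doris_check_script_path/doris_be_check.sh"
eeooff
}
```

Running `run_like_ssh; echo $?` prints the fully expanded path followed by `1`. Running `run_like_ssh_quoted` prints `remote would run: /doris_be_check.sh`. That is why the original script leaves the delimiter unquoted.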

BE Node Script

$DORIS_HOME/be/bin/doris_be_check.sh

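```bash
#!/bin/bash
# Health check for the Doris BE process on this node.
# Exit 0: BE is running.
# Exit 1: BE was down and a restart was attempted, so the calling Azkaban job fails and alerts.

doris_path="${DORIS_HOME:-/opt/doris}"
# BE process name; some newer Doris releases may name the binary doris_be instead.
be_name="palo_be"

# Succeeds only if a process whose name is exactly $be_name is running.
# Unlike `ps -ef | grep palo_be`, this does not match unrelated commands
# whose arguments merely contain the string palo_be.
is_be_running() {
  pgrep -x "$be_name" > /dev/null
}

start() {
  now=$(date "+%Y-%m-%d %H:%M:%S")
  echo "Restarting BE... restart time: $now"
  "$doris_path/be/bin/start_be.sh" --daemon
  # Give the BE some time to come up before checking again
  sleep 10s
  if is_be_running; then
    echo "BE restarted successfully"
  else
    echo "BE failed to start"
  fi
}

if is_be_running; then
  echo "BE is running normally"
  exit 0
else
  echo "BE is down"
  start
  exit 1
fi
```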
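The restart branch waits a fixed 10 seconds, so it can report a failure when the BE is simply slow to start. One alternative is a generic retry helper that polls once per second up to a configurable timeout, assuming that polling interval is acceptable. `wait_until` is a hypothetical name:

```shell
#!/bin/bash
# wait_until TIMEOUT_SECONDS CMD [ARGS...]
# Re-runs CMD once per second until it succeeds or TIMEOUT_SECONDS elapse.
# Returns 0 as soon as CMD succeeds, non-zero on timeout.
wait_until() {
  local timeout=$1 waited=0
  shift
  while (( waited < timeout )); do
    "$@" > /dev/null 2>&1 && return 0
    sleep 1
    waited=$((waited + 1))
  done
  # One final attempt after the last sleep
  "$@" > /dev/null 2>&1
}
```

Inside `start()`, the `sleep 10s` and the check after it could then become `if wait_until 60 pgrep -x "$be_name"; then ...`. The 60-second timeout is an arbitrary example value.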