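Background
We have 13 servers and one 24-port switch (no router), and we bought only two public IPs. In other words, only two servers can reach the public internet directly (both are configured as full forwarding routers), while every other machine has to go through NAT to reach the public internet.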
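Problem
The servers hosted in the data center are second-hand hardware, so every now and then one of them crashes outright and will not boot. Since we have no real router, if the gateway server goes down, how can the machines in the network that rely on NAT for internet access switch to the backup gateway automatically?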
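Approach
We do not have multiple physical network cables going into a single server, so we cannot use link up/down events on a server's own network interfaces as the trigger for switching.
So the solution is: each machine that needs NAT periodically checks whether it can reach the public internet through the gateway, and switches to the backup gateway if it cannot.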
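Prerequisites
- The two machines that have public IPs are configured as full forwarding routers
- The company's main router (running OpenWrt) and the hosted data center servers are networked through Zerotier, and the two public-facing servers are assigned the zerotier IPs 172.16.200.1 and 172.16.200.2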
Steps
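- Add a route for the ping targets and bind ping to a specific network interface; this makes it possible to test one particular path without disturbing the network that is currently in use
ping -I $NIC -c 2 -W $PING_TIMEOUT $REMOTE_IP > /dev/null
# Send 2 ping packets to $REMOTE_IP through network interface $NIC; if no reply arrives within $PING_TIMEOUT (seconds) the probe counts as failed; > /dev/null keeps ping's output off the console
PING_RESULT=$?
# Exit status of the previous command: 0 on success, 1 if no reply was received (ping can also return 2 for other errors)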
- Check the availability of the primary and backup gateways; whenever the primary gateway is usable, prefer it
- Add a crontab job that runs the check every minute, so that the gateway is switched dynamically
* * * * * xxx.sh > /tmp/route_failover.log
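The probe in the first step can be wrapped in a small helper. The following is a minimal sketch and not part of the original script: the function names `classify_ping_status` and `probe`, and the `PING_CMD` override used for dry runs, are hypothetical. It assumes the iputils `ping` convention of exit status 0 for a reply, 1 for no reply, and any other value for an error.

```shell
# Map a ping exit status to a word (assumes iputils ping semantics).
classify_ping_status() {
    case "$1" in
        0) echo "reachable" ;;    # at least one reply arrived
        1) echo "unreachable" ;;  # no reply within the timeout
        *) echo "error" ;;        # ping itself failed (bad interface, no route, ...)
    esac
}

# probe NIC REMOTE_IP [TIMEOUT]
# PING_CMD is a hypothetical override so the helper can be dry-run without network access.
probe() {
    probe_status=0
    ${PING_CMD:-ping} -I "$1" -c 2 -W "${3:-1}" "$2" > /dev/null 2>&1 || probe_status=$?
    classify_ping_status "$probe_status"
}
```

For example, `probe eth0 119.29.29.29 1` prints one of the three words; `eth0` here is only a placeholder interface name. Capturing the status with `|| probe_status=$?` also keeps a failed ping from aborting a script that runs under `set -e`.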
Reference failover script (for Linux servers)
#!/bin/bash
#
# Linux default gateway failover script [route via lan]
# @author CsHeng 2020.06.11
# ref: https://blog.rapellys.biz/2014/10/18/linux-default-gateway-failover-script/
#
#*********************************************************************
# Configuration
#*********************************************************************
DEF_GATEWAY="172.16.2.1" # Default Gateway
BCK_GATEWAY="172.16.2.2" # Backup Gateway
SUBNET="default" # ip route destination subnet
RMT_IP_1="119.29.29.29" # first remote ip
RMT_IP_2="223.5.5.5" # second remote ip
PING_TIMEOUT="1" # Ping timeout in seconds
#*********************************************************************
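# Do not use `set -e` here: a failing ping is an expected outcome, and with
# `set -e` it would abort the script before the exit status could be inspected
# check user
if [ "$(whoami)" != "root" ]
then
    echo "Failover script must be run as root!"
    exit 1
fi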
# Check GW
CURRENT_GW=$(ip route show | grep "$SUBNET" | head -n 1 | awk '{print $3}')
PING_NIC=$(ip route show | grep "$SUBNET" | head -n 1 | awk '{print $5}') # ping network interface
if [ "$CURRENT_GW" = "$DEF_GATEWAY" ]
then
    # "&& ... || ..." records ping's exit status without aborting if `set -e` is active
    ping -4 -I "$PING_NIC" -c 2 -W "$PING_TIMEOUT" "$RMT_IP_1" > /dev/null && PING_1=0 || PING_1=$?
    ping -4 -I "$PING_NIC" -c 2 -W "$PING_TIMEOUT" "$RMT_IP_2" > /dev/null && PING_2=0 || PING_2=$?
else
    # add static routes to the remote IPs via the default gateway
    # ("replace" also succeeds if a stale route from an earlier run is still present)
    ip route replace "$RMT_IP_1" via "$DEF_GATEWAY"
    ip route replace "$RMT_IP_2" via "$DEF_GATEWAY"
    ping -4 -I "$PING_NIC" -c 2 -W "$PING_TIMEOUT" "$RMT_IP_1" > /dev/null && PING_1=0 || PING_1=$?
    ping -4 -I "$PING_NIC" -c 2 -W "$PING_TIMEOUT" "$RMT_IP_2" > /dev/null && PING_2=0 || PING_2=$?
    # delete the static routes to the remote IPs
    ip route del "$RMT_IP_1"
    ip route del "$RMT_IP_2"
fi
LOG_TIME=$(date '+%b %d %T')
# both pings failed (any non-zero exit status counts as a failure)
if [ "$PING_1" != "0" ] && [ "$PING_2" != "0" ]
then
    if [ "$CURRENT_GW" = "$DEF_GATEWAY" ]
    then
        # switching to backup
        ip route del "$SUBNET"
        ip route add "$SUBNET" via "$BCK_GATEWAY"
        # flushing routing cache
        ip route flush cache
        echo "$LOG_TIME: $0 - Switched $SUBNET gateway to backup with IP $BCK_GATEWAY"
    fi
elif [ "$CURRENT_GW" != "$DEF_GATEWAY" ]
then
    # switching to default
    ip route del "$SUBNET"
    ip route add "$SUBNET" via "$DEF_GATEWAY"
    ip route flush cache
    echo "$LOG_TIME: $0 - Switched $SUBNET gateway to default with IP $DEF_GATEWAY"
fi
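The switching rule at the end of the script can be isolated into a pure function, which makes it possible to check the rule without touching the routing table. This is only a sketch: `decide_gateway` is a hypothetical name that does not appear in the original script, and it treats any non-zero ping status as a failure.

```shell
# decide_gateway CURRENT_GW DEF_GATEWAY BCK_GATEWAY PING_1 PING_2
# Prints the gateway that should be active after this check.
decide_gateway() {
    if [ "$4" != "0" ] && [ "$5" != "0" ]; then
        # Both probes through the default gateway failed.
        if [ "$1" = "$2" ]; then
            echo "$3"   # on the default gateway: fail over to the backup
        else
            echo "$1"   # already off the default gateway: stay where we are
        fi
    else
        # At least one probe succeeded: the default gateway is preferred.
        echo "$2"
    fi
}
```

For instance, `decide_gateway 172.16.2.1 172.16.2.1 172.16.2.2 1 1` prints `172.16.2.2`, while `decide_gateway 172.16.2.2 172.16.2.1 172.16.2.2 0 1` prints `172.16.2.1`, because one successful probe is enough to move back to the default gateway.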
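Further reading
Because we use zerotier to join the company network and the hosted data center into one network, the company's main router should also switch its routes dynamically.
Another shell implementation for reference (for OpenWrt; apart from a handful of compatibility changes, the main logic is the same as the Linux shell script)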
#!/bin/sh
#
# Linux dynamic route failover script [route via zerotier]
# @author CsHeng 2020.06.11
# ref: https://blog.rapellys.biz/2014/10/18/linux-default-gateway-failover-script/
#
#*********************************************************************
# Configuration
#*********************************************************************
DEF_GATEWAY="172.16.200.1" # Default Gateway
BCK_GATEWAY="172.16.200.2" # Backup Gateway
SUBNET="172.16.2.0/24" # ip route destination subnet
RMT_IP_1="172.16.2.3" # first remote ip
RMT_IP_2="172.16.2.4" # second remote ip
PING_TIMEOUT="1" # Ping timeout in seconds
#*********************************************************************
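# Do not use `set -e` here: a failing ping is an expected outcome, and with
# `set -e` it would abort the script before the exit status could be inspected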
# Check GW
CURRENT_GW=$(ip route show | grep "$SUBNET" | head -n 1 | awk '{print $3}')
PING_NIC=$(ip route show | grep "$SUBNET" | head -n 1 | awk '{print $5}') # ping network interface
# POSIX sh: use "=" rather than "==" inside [ ]
if [ "$CURRENT_GW" = "$DEF_GATEWAY" ]
then
    # "&& ... || ..." records ping's exit status without aborting if `set -e` is active
    ping -4 -I "$PING_NIC" -c 2 -W "$PING_TIMEOUT" "$RMT_IP_1" > /dev/null && PING_1=0 || PING_1=$?
    ping -4 -I "$PING_NIC" -c 2 -W "$PING_TIMEOUT" "$RMT_IP_2" > /dev/null && PING_2=0 || PING_2=$?
else
    # add static routes to the remote IPs via the default gateway
    # ("replace" also succeeds if a stale route from an earlier run is still present)
    ip route replace "$RMT_IP_1" via "$DEF_GATEWAY"
    ip route replace "$RMT_IP_2" via "$DEF_GATEWAY"
    ping -4 -I "$PING_NIC" -c 2 -W "$PING_TIMEOUT" "$RMT_IP_1" > /dev/null && PING_1=0 || PING_1=$?
    ping -4 -I "$PING_NIC" -c 2 -W "$PING_TIMEOUT" "$RMT_IP_2" > /dev/null && PING_2=0 || PING_2=$?
    # delete the static routes to the remote IPs
    ip route del "$RMT_IP_1"
    ip route del "$RMT_IP_2"
fi
LOG_TIME=$(date '+%b %d %T')
# both pings failed (any non-zero exit status counts as a failure)
if [ "$PING_1" != "0" ] && [ "$PING_2" != "0" ]
then
    if [ "$CURRENT_GW" = "$DEF_GATEWAY" ]
    then
        # switching to backup
        ip route del "$SUBNET"
        ip route add "$SUBNET" via "$BCK_GATEWAY"
        # flushing routing cache
        ip route flush cache
        echo "$LOG_TIME: $0 - Switched $SUBNET gateway to backup with IP $BCK_GATEWAY"
    fi
elif [ "$CURRENT_GW" != "$DEF_GATEWAY" ]
then
    # switching to default
    ip route del "$SUBNET"
    ip route add "$SUBNET" via "$DEF_GATEWAY"
    ip route flush cache
    echo "$LOG_TIME: $0 - Switched $SUBNET gateway to default with IP $DEF_GATEWAY"
fi
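Both scripts read the gateway and interface from fixed columns (`$3` and `$5`) of the first matching `ip route show` line, which only holds for lines shaped like `<dest> via <gw> dev <nic>`. Looking up the `via` and `dev` keywords instead is one possible alternative. The sketch below shows how that could look; `parse_route_line` is a hypothetical helper name, and the sample route lines are made up for illustration.

```shell
# parse_route_line "ROUTE LINE"
# Prints "<gateway> <interface>"; the gateway is empty for directly connected routes.
parse_route_line() {
    echo "$1" | awk '{
        gw = ""; dev = ""
        # Scan for the keywords instead of relying on column positions.
        for (i = 1; i < NF; i++) {
            if ($i == "via") gw = $(i + 1)
            if ($i == "dev") dev = $(i + 1)
        }
        print gw, dev
    }'
}
```

Called as `parse_route_line "default via 172.16.2.1 dev eth0 proto static"`, it prints `172.16.2.1 eth0`. For a directly connected route such as `172.16.2.0/24 dev eth0 scope link`, the gateway field comes back empty, so a caller can tell that no gateway is configured rather than mistaking `eth0` for one.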
PS: The approach above comes from (is copied from) linux-default-gateway-failover-script