Background

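We have 13 servers in total and one 24-port switch (no router), and we bought only two public IPs. That means only two servers can reach the Internet directly (both are configured as forwarding routers), while the other machines have to go through NAT to reach the Internet.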

Problem

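The servers hosted in the data center run on second-hand hardware, so every now and then one of them simply crashes and will not boot again. Since we have no actual router, if a gateway server dies, how can the machines that reach the Internet through NAT switch to the backup gateway automatically?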

Approach

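We do not have a single server with several physical uplinks, so there is no local link up/down event on a network interface that could trigger the switch. The solution is therefore to let each machine behind NAT periodically check whether it can reach the Internet through its current gateway, and switch to the backup gateway if it cannot.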

Prerequisites

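  • The two machines with public IPs run in full routing mode:
    • IP forwarding is enabled in the kernel
    • forwarding is allowed in iptables
  • The company's main router (OpenWRT) and the servers in the data center are joined into one network with ZeroTier, and the two public-IP servers are assigned the ZeroTier IPs 172.16.200.1 and 172.16.200.2.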
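A minimal sketch of the gateway-side prerequisites, assuming the internal subnet is 172.16.2.0/24 and the public-facing interface is called `eth0` (both are assumptions; adjust them to your hosts). The text above only says that forwarding is enabled in the kernel and in iptables; the `MASQUERADE` rule is the usual way to provide the NAT the other machines rely on. The hypothetical function `gateway_setup_cmds` only prints the commands, so they can be reviewed before being run as root.

```shell
#!/bin/bash
# Sketch: print the commands that turn a public-IP server into a NAT gateway.
# Assumptions (not from the original setup): LAN subnet and WAN interface name.
gateway_setup_cmds() {
    local lan_net="${1:-172.16.2.0/24}"  # assumed internal subnet
    local wan_if="${2:-eth0}"            # hypothetical public-facing interface
    # 1. let the kernel forward IPv4 packets between interfaces
    echo "sysctl -w net.ipv4.ip_forward=1"
    # 2. let iptables forward traffic (the FORWARD policy may default to DROP)
    echo "iptables -P FORWARD ACCEPT"
    # 3. rewrite the source address of LAN traffic leaving via the WAN interface
    echo "iptables -t nat -A POSTROUTING -s $lan_net -o $wan_if -j MASQUERADE"
}

# Review the output first, then for example:
#   gateway_setup_cmds 172.16.2.0/24 eth0 | sudo sh
```

`sysctl -w` only changes the running kernel; to keep forwarding enabled after a reboot, set `net.ipv4.ip_forward = 1` in `/etc/sysctl.conf` or a file under `/etc/sysctl.d/`, and save the iptables rules with whatever mechanism your distribution uses. `-P FORWARD ACCEPT` forwards everything; tighter `FORWARD` rules are possible if needed.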

Steps

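  1. Add a host route for the ping targets and bind ping to a specific network interface. This lets you test whether a particular interface or gateway works without disturbing the traffic that is currently flowing.

    ping -I $NIC -c 2 -W $PING_TIMEOUT $REMOTE_IP > /dev/null  # send 2 echo requests to $REMOTE_IP through interface $NIC; no reply within $PING_TIMEOUT seconds counts as a failure; > /dev/null hides ping's output
    PING_RESULT=$?  # exit status of the ping above: 0 if a reply arrived, 1 if no reply arrived, 2 on other errors

  2. Check whether the primary and backup gateways are usable; whenever the primary gateway works, prefer it.
  3. Add a crontab job that runs the check once a minute, so the gateway is switched automatically:
      • * * * * * xxx.sh >> /tmp/route_failover.log 2>&1   (use the script's absolute path; ">>" appends to the log instead of overwriting it every minute)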
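Steps 1 and 2 can be sketched as two small bash functions. `probe_ip` wraps the interface-bound ping and turns its exit status into a label (iputils `ping` exits 0 when a reply arrives, 1 when none arrives, 2 on other errors); `pick_gateway` encodes the "prefer the primary" rule. Both names, and the `PING_CMD` override that lets the probe run without network access, are hypothetical and not part of the original script. When the machine is currently on the backup gateway, the host routes from step 1 still have to be added around the probe, as the full script does.

```shell
#!/bin/bash
# Sketch of steps 1 and 2 (hypothetical helper names).

# probe_ip NIC IP [TIMEOUT]
# Sends 2 echo requests to IP through interface NIC and prints up/down/error.
# PING_CMD is an assumed override hook; it defaults to the real `ping`.
probe_ip() {
    local nic="$1" ip="$2" timeout="${3:-1}" status=0
    # `|| status=$?` keeps a failed ping from aborting scripts that use `set -e`
    "${PING_CMD:-ping}" -4 -I "$nic" -c 2 -W "$timeout" "$ip" > /dev/null 2>&1 || status=$?
    case "$status" in
        0) echo up ;;      # at least one reply received
        1) echo down ;;    # no reply within the timeout
        *) echo error ;;   # e.g. unknown interface or invalid address
    esac
}

# pick_gateway CURRENT DEFAULT BACKUP PRIMARY_STATE
# PRIMARY_STATE is "up" if the probes through DEFAULT succeeded.
# Prints the gateway the route should point to.
pick_gateway() {
    local current="$1" def="$2" bck="$3" primary_state="$4"
    if [ "$primary_state" = up ]; then
        echo "$def"        # primary works: always prefer it
    elif [ "$current" = "$def" ]; then
        echo "$bck"        # primary just failed: fail over
    else
        echo "$current"    # already on backup and primary still down: stay
    fi
}

# Hypothetical usage, mirroring the original "both probes failed" rule:
#   state=down
#   for ip in 119.29.29.29 223.5.5.5; do
#       [ "$(probe_ip "$PING_NIC" "$ip")" = up ] && state=up
#   done
#   NEW_GW=$(pick_gateway "$CURRENT_GW" 172.16.2.1 172.16.2.2 "$state")
```

Keeping the decision in a pure function makes the switching rule easy to check on its own, separately from the `ip route` commands that need root.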

Reference failover script (for Linux servers)

#!/bin/bash
#
#  Linux default gateway failover script [route via lan]
# @author CsHeng 2020.06.11
# ref: https://blog.rapellys.biz/2014/10/18/linux-default-gateway-failover-script/
#
#*********************************************************************
#       Configuration
#*********************************************************************
DEF_GATEWAY="172.16.2.1"      # Default Gateway
BCK_GATEWAY="172.16.2.2"      # Backup Gateway
SUBNET="default"              # ip route destination subnet
RMT_IP_1="119.29.29.29"       # first remote ip
RMT_IP_2="223.5.5.5"          # second remote ip
PING_TIMEOUT="1"              # Ping timeout in seconds
#*********************************************************************

# fail fast on unexpected errors; the pings below use `|| PING_x=$?`
# so that a failed probe does not abort the script under `set -e`
set -e

# check user
if [ "$(id -u)" -ne 0 ]
then
        echo "Failover script must be run as root!"
        exit 1
fi

# Check GW: first route whose destination is exactly $SUBNET
# (expected format: "<dest> via <gateway> dev <interface> ...")
CURRENT_GW=$(ip route show | awk -v s="$SUBNET" '$1 == s {print $3; exit}')
PING_NIC=$(ip route show | awk -v s="$SUBNET" '$1 == s {print $5; exit}')  # ping network interface
if [ -z "$CURRENT_GW" ] || [ -z "$PING_NIC" ]
then
        echo "No route for $SUBNET found, nothing to do"
        exit 1
fi

PING_1=0
PING_2=0
if [ "$CURRENT_GW" = "$DEF_GATEWAY" ]
then
        ping -4 -I "$PING_NIC" -c 2 -W "$PING_TIMEOUT" "$RMT_IP_1" > /dev/null || PING_1=$?
        ping -4 -I "$PING_NIC" -c 2 -W "$PING_TIMEOUT" "$RMT_IP_2" > /dev/null || PING_2=$?
else
        # temporarily route the remote IPs via the default gateway
        # ("replace" also succeeds if an earlier run left these routes behind)
        ip route replace "$RMT_IP_1" via "$DEF_GATEWAY"
        ip route replace "$RMT_IP_2" via "$DEF_GATEWAY"
        ping -4 -I "$PING_NIC" -c 2 -W "$PING_TIMEOUT" "$RMT_IP_1" > /dev/null || PING_1=$?
        ping -4 -I "$PING_NIC" -c 2 -W "$PING_TIMEOUT" "$RMT_IP_2" > /dev/null || PING_2=$?
        # del static route to remote ip's
        ip route del "$RMT_IP_1"
        ip route del "$RMT_IP_2"
fi

LOG_TIME=$(date '+%b %d %T')

# both pings failed (ping exits 1 on no reply, 2 on other errors)
if [ "$PING_1" -ne 0 ] && [ "$PING_2" -ne 0 ]
then
        if [ "$CURRENT_GW" = "$DEF_GATEWAY" ]
        then
                ip route del "$SUBNET"
                ip route add "$SUBNET" via "$BCK_GATEWAY"
                # flushing routing cache
                ip route flush cache
                echo "$LOG_TIME: $0 - Switched $SUBNET gateway to backup with IP $BCK_GATEWAY"
        fi

elif [ "$CURRENT_GW" != "$DEF_GATEWAY" ]
then
        # default gateway works again: switch back to it
        ip route del "$SUBNET"
        ip route add "$SUBNET" via "$DEF_GATEWAY"
        ip route flush cache
        echo "$LOG_TIME: $0 - Switched $SUBNET gateway to default with IP $DEF_GATEWAY"
fi
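The "Check GW" lines read the gateway and interface from fixed columns (`$3` and `$5`) of `ip route show`, which holds for the common `default via GW dev IFACE ...` form. A sketch of a slightly more defensive parser, assuming the same `ip route` text format, that takes the word after the `via` and `dev` keywords instead of relying on positions; `route_field` is a hypothetical helper name.

```shell
#!/bin/bash
# route_field DEST KEYWORD
# Reads `ip route show` output on stdin, finds the first route whose
# destination is exactly DEST, and prints the word that follows KEYWORD
# ("via" -> gateway IP, "dev" -> interface name). Prints nothing if the
# route or the keyword is missing.
route_field() {
    awk -v dest="$1" -v key="$2" '
        $1 == dest {
            for (i = 2; i < NF; i++)
                if ($i == key) { print $(i + 1); exit }
            exit
        }'
}

# Hypothetical usage inside the failover script:
#   ROUTES=$(ip route show)
#   CURRENT_GW=$(printf '%s\n' "$ROUTES" | route_field "$SUBNET" via)
#   PING_NIC=$(printf '%s\n' "$ROUTES" | route_field "$SUBNET" dev)
```

A route without a gateway (for example `default dev ppp0 scope link`) then yields an empty `CURRENT_GW` instead of the interface name landing in the gateway variable.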

Further reading

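Because we use ZeroTier to connect the office network with the data center, the company's main router should also switch its routes dynamically. Below is another shell implementation for OpenWRT; apart from a few compatibility changes, its main logic is the same as the Linux script.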

#!/bin/sh
#
#  Linux dynamic route failover script [route via zerotier]
# @author CsHeng 2020.06.11
# ref: https://blog.rapellys.biz/2014/10/18/linux-default-gateway-failover-script/
#
#*********************************************************************
#       Configuration
#*********************************************************************
DEF_GATEWAY="172.16.200.1"    # Default Gateway
BCK_GATEWAY="172.16.200.2"    # Backup Gateway
SUBNET="172.16.2.0/24"        # ip route destination subnet
RMT_IP_1="172.16.2.3"         # first remote ip
RMT_IP_2="172.16.2.4"         # second remote ip
PING_TIMEOUT="1"              # Ping timeout in seconds
#*********************************************************************

# fail fast on unexpected errors; the pings below use `|| PING_x=$?`
# so that a failed probe does not abort the script under `set -e`
set -e

# Check GW: first route whose destination is exactly $SUBNET
# (expected format: "<dest> via <gateway> dev <interface> ...")
CURRENT_GW=$(ip route show | awk -v s="$SUBNET" '$1 == s {print $3; exit}')
PING_NIC=$(ip route show | awk -v s="$SUBNET" '$1 == s {print $5; exit}')  # ping network interface (ZeroTier)
if [ -z "$CURRENT_GW" ] || [ -z "$PING_NIC" ]
then
        echo "No route for $SUBNET found, nothing to do"
        exit 1
fi

PING_1=0
PING_2=0
if [ "$CURRENT_GW" = "$DEF_GATEWAY" ]
then
        ping -4 -I "$PING_NIC" -c 2 -W "$PING_TIMEOUT" "$RMT_IP_1" > /dev/null || PING_1=$?
        ping -4 -I "$PING_NIC" -c 2 -W "$PING_TIMEOUT" "$RMT_IP_2" > /dev/null || PING_2=$?
else
        # temporarily route the remote IPs via the default gateway
        # ("replace" also succeeds if an earlier run left these routes behind)
        ip route replace "$RMT_IP_1" via "$DEF_GATEWAY"
        ip route replace "$RMT_IP_2" via "$DEF_GATEWAY"
        ping -4 -I "$PING_NIC" -c 2 -W "$PING_TIMEOUT" "$RMT_IP_1" > /dev/null || PING_1=$?
        ping -4 -I "$PING_NIC" -c 2 -W "$PING_TIMEOUT" "$RMT_IP_2" > /dev/null || PING_2=$?
        # del static route to remote ip's
        ip route del "$RMT_IP_1"
        ip route del "$RMT_IP_2"
fi

LOG_TIME=$(date '+%b %d %T')

# both pings failed (non-zero exit status)
if [ "$PING_1" -ne 0 ] && [ "$PING_2" -ne 0 ]
then
        if [ "$CURRENT_GW" = "$DEF_GATEWAY" ]
        then
                ip route del "$SUBNET"
                ip route add "$SUBNET" via "$BCK_GATEWAY"
                # flushing routing cache
                ip route flush cache
                echo "$LOG_TIME: $0 - Switched $SUBNET gateway to backup with IP $BCK_GATEWAY"
        fi

elif [ "$CURRENT_GW" != "$DEF_GATEWAY" ]
then
        # default gateway works again: switch back to it
        ip route del "$SUBNET"
        ip route add "$SUBNET" via "$DEF_GATEWAY"
        ip route flush cache
        echo "$LOG_TIME: $0 - Switched $SUBNET gateway to default with IP $DEF_GATEWAY"
fi
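On OpenWRT the once-a-minute schedule from step 3 normally lives in `/etc/crontabs/root` (the file `crontab -e` edits there), and the `cron` service has to be enabled. A minimal sketch of an idempotent installer that also runs under BusyBox ash, assuming the router script is saved at the hypothetical path `/root/route_failover.sh`; `ensure_cron_line` is a hypothetical helper, not part of the original setup.

```shell
#!/bin/sh
# ensure_cron_line FILE LINE
# Appends LINE to the crontab FILE unless an identical line is already there,
# so running it repeatedly does not create duplicate jobs.
ensure_cron_line() {
    local file="$1" line="$2"
    touch "$file"
    # -x: whole-line match, -F: fixed string (the "*" fields are not regex)
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Hypothetical usage on the router:
#   ensure_cron_line /etc/crontabs/root \
#       '* * * * * /root/route_failover.sh >> /tmp/route_failover.log 2>&1'
#   /etc/init.d/cron enable && /etc/init.d/cron restart
```

Note that `/tmp` on OpenWRT is usually RAM-backed, so the log is lost on reboot; that is often acceptable for a switch log.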

PS: The idea above is borrowed (copied) from linux-default-gateway-failover-script.