Use the ip monitor tool to observe and analyze routing changes, and use OpenWrt hotplug events to solve the problem of adding a custom route when ZeroTier starts.

Background

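  1. The company has several office locations, networked together with ZeroTier.
  2. Not every office location is open to access from every other one.
  3. Each office's public IP changes regularly: ifup wan runs every day to redial the China Telecom connection, and this wipes any manually added ip route entries.
  4. The problem this article addresses is therefore how to tell when the node has really joined the ZeroTier network and the network is usable.
    1. ZeroTier relies on UDP hole punching. It is a P2P VPN with no clear central node, so there is no predictable moment at which the network can be said to be connected.
    2. ZeroTier supports custom routes for all nodes, which can be added on the controller. Here, however, the route should exist on a single node only and must not appear on the other nodes in the network, so it has to be set up on the node itself.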

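Solutions

There are broadly two options:

  • Option 1: do everything with ZeroTier custom routes, forwarding all routes, and use iptables ACCEPT rules for access control.
  • Option 2: add the custom route locally, only on the router nodes that need to reach other subnets.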
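
As a rough illustration of Option 2, the sketch below checks a routing table for a prefix before adding it, so the add can be repeated safely. The helper name has_route is hypothetical; the prefix and gateway are the ones used later in this article.

```shell
#!/bin/sh
# has_route PREFIX: read `ip route` output on stdin and succeed only if
# some line starts with exactly PREFIX (hypothetical helper name).
has_route() {
	awk -v p="$1" '$1 == p { found = 1 } END { exit !found }'
}

# Intended use on the router (commented out so the block has no side effects):
#   ip route | has_route 192.168.3.0/24 ||
#       ip route add 192.168.3.0/24 via 172.16.200.101
```

Comparing the first field exactly avoids false matches on addresses that merely contain the prefix as a substring.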

Analysis steps

  1. Start monitoring with ip monitor all, then run service zerotier stop
[NEIGH]172.16.200.1 dev ztjqgyage2 lladdr d2:10:71:16:e9:16 STALE
[ROUTE]Deleted 172.16.2.0/24 via 172.16.200.1 dev ztjqgyage2 proto static
[NEIGH]172.16.200.1 dev ztjqgyage2 lladdr d2:10:71:16:e9:16 PROBE
[LINK]79: ztjqgyage2: <BROADCAST,MULTICAST> mtu 2800 qdisc fq_codel state DOWN group default
    link/ether d2:73:08:7f:bf:e7 brd ff:ff:ff:ff:ff:ff
[NEIGH]Deleted 172.16.200.1 dev ztjqgyage2 lladdr d2:10:71:16:e9:16 PROBE
[ROUTE]Deleted fe80::/64 dev ztjqgyage2 proto kernel metric 256 pref medium
[ROUTE]Deleted anycast fe80:: dev ztjqgyage2 table local proto kernel metric 0 pref medium
[ROUTE]Deleted local fe80::b8ba:12ff:fedd:2fc0 dev ztjqgyage2 table local proto kernel metric 0 pref medium
[ROUTE]Deleted multicast ff00::/8 dev ztjqgyage2 table local proto kernel metric 256 pref medium
[NEIGH]Deleted ff02::1:ffdd:2fc0 dev ztjqgyage2 lladdr 33:33:ff:dd:2f:c0 NOARP
[NEIGH]Deleted ff02::16 dev ztjqgyage2 lladdr 33:33:00:00:00:16 NOARP
[ADDR]Deleted 79: ztjqgyage2    inet6 fe80::b8ba:12ff:fedd:2fc0/64 scope link
       valid_lft forever preferred_lft forever
[ADDR]Deleted 79: ztjqgyage2    inet 172.16.200.13/24 brd 172.16.200.255 scope global ztjqgyage2
       valid_lft forever preferred_lft forever
[ROUTE]Deleted local 172.16.200.13 dev ztjqgyage2 table local proto kernel scope host src 172.16.200.13
[NETCONF]Deleted inet ztjqgyage2
[NETCONF]Deleted inet6 ztjqgyage2
[LINK]Deleted 79: ztjqgyage2: <BROADCAST,MULTICAST> mtu 2800 qdisc noop state DOWN group default
    link/ether d2:73:08:7f:bf:e7 brd ff:ff:ff:ff:ff:ff
  2. Now look at what service zerotier start does. Start ip monitor all first, then run service zerotier start
[NETCONF]inet ztjqgyage2 forwarding on rp_filter off mc_forwarding off proxy_neigh off ignore_routes_with_linkdown off
[NETCONF]inet6 ztjqgyage2 forwarding on mc_forwarding off proxy_neigh off ignore_routes_with_linkdown off
[LINK]80: ztjqgyage2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether e2:e8:19:70:ce:6a brd ff:ff:ff:ff:ff:ff
[LINK]80: ztjqgyage2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default
    link/ether e2:e8:19:70:ce:6a brd ff:ff:ff:ff:ff:ff
[ROUTE]multicast ff00::/8 dev ztjqgyage2 table local proto kernel metric 256 pref medium
[ROUTE]fe80::/64 dev ztjqgyage2 proto kernel metric 256 pref medium
[LINK]80: ztjqgyage2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default
    link/ether d2:73:08:7f:bf:e7 brd ff:ff:ff:ff:ff:ff
[NEIGH]Deleted ff02::16 dev ztjqgyage2 lladdr 33:33:00:00:00:16 NOARP
[NEIGH]Deleted ff02::1:ff70:ce6a dev ztjqgyage2 lladdr 33:33:ff:70:ce:6a NOARP
[LINK]80: ztjqgyage2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc fq_codel state UNKNOWN group default
    link/ether d2:73:08:7f:bf:e7 brd ff:ff:ff:ff:ff:ff
[ADDR]80: ztjqgyage2    inet 172.16.200.13/24 brd 172.16.200.255 scope global ztjqgyage2
       valid_lft forever preferred_lft forever
[ROUTE]local 172.16.200.13 dev ztjqgyage2 table local proto kernel scope host src 172.16.200.13
[ROUTE]broadcast 172.16.200.255 dev ztjqgyage2 table local proto kernel scope link src 172.16.200.13
[ROUTE]172.16.200.0/24 dev ztjqgyage2 proto kernel scope link src 172.16.200.13
[ROUTE]broadcast 172.16.200.0 dev ztjqgyage2 table local proto kernel scope link src 172.16.200.13
[ROUTE]172.16.2.0/24 via 172.16.200.1 dev ztjqgyage2 proto static
[ROUTE]192.168.3.0/24 via 172.16.200.101 dev ztjqgyage2
[ADDR]80: ztjqgyage2    inet6 fe80::e0e8:19ff:fe70:ce6a/64 scope link
       valid_lft forever preferred_lft forever
[ROUTE]local fe80::e0e8:19ff:fe70:ce6a dev ztjqgyage2 table local proto kernel metric 0 pref medium
[ROUTE]anycast fe80:: dev ztjqgyage2 table local proto kernel metric 0 pref medium
[NEIGH]172.16.200.1 dev ztjqgyage2 lladdr d2:10:71:16:e9:16 REACHABLE
[NEIGH]172.16.200.101 dev ztjqgyage2 lladdr d2:23:9e:74:63:f2 REACHABLE
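  3. The steps above show roughly which [EVENT] types are generated over the lifetime of a ZeroTier network connection. The main router runs OpenWrt, which provides a hotplug mechanism, so the next step is to see which hook points allow custom logic to be inserted.
  • Run ls /etc/hotplug.d/
  • Output: block dhcp iface ipsec neigh net ntp tftp
  • Conclusion: neigh and net look like the two places where a hook could go.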
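
The full ip monitor all output is noisy, so it can help to keep only the event types of interest for one interface. The following is a minimal sketch, assuming the output was captured to a file first; the function name zt_events and the file path in the usage comment are hypothetical.

```shell
#!/bin/sh
# zt_events IFACE [TYPES]: filter labelled `ip monitor all` output on stdin.
# Keeps lines whose [LABEL] is one of TYPES (an alternation such as
# "ROUTE|NEIGH", which is the default) and which mention IFACE as a whole word.
zt_events() {
	awk -v dev="$1" -v types="${2:-ROUTE|NEIGH}" '
		$0 ~ ("^\\[(" types ")\\]") && $0 ~ ("(^|[ :])" dev "([ :]|$)")
	'
}

# Example (path is an assumption):
#   ip monitor all > /tmp/zt-monitor.log    # stop with Ctrl-C after the test
#   zt_events ztjqgyage2 < /tmp/zt-monitor.log
```

Indented continuation lines such as link/ether carry no label and are dropped, which leaves one line per event.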

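Implementation

  • Preferred approach: [NEIGH]172.16.200.1 dev ztjqgyage2 lladdr d2:10:71:16:e9:16 REACHABLE
    • Act when the gateway IP 172.16.200.1 on the ZeroTier virtual interface ztjqgyage2 becomes REACHABLE
    • Nice in theory, but in practice service zerotier start/stop does not trigger any hotplug neigh event
  • Fallback approach (the one actually used)
    • Testing shows that service zerotier start/stop does produce hotplug.d net/iface change notifications
    • So the final code is written against that notification: vim /etc/hotplug.d/net/99-zerotier

      The goal is to add the custom route ip route add 192.168.3.0/24 via 172.16.200.101 on this node once ZeroTier has connected to its other P2P peers (so that the 方圆E时光 office can reach the LAN at the 建中路 headquarters, letting operations staff access the shares on 192.168.3.x). Because there is no telling how long ZeroTier needs to establish connectivity, the code retries in a loop.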

      
      #!/bin/sh
      # ref: https://openwrt.org/docs/guide-user/base-system/hotplug
      # @author: CsHeng
      # @date: 2021-11-08
      
      LOG=/var/log/hotplug_net.log
      
      # add route to gw-office once the zerotier interface has been added
      [ "$ACTION" = "add" ] && [ "$INTERFACE" = "ztjqgyage2" ] && {
      	echo "$(date) net dev ztjqgyage2 add detected..." >> "$LOG"
      	r=0
      	# pppoe dial-up needs some time, and zerotier reconnection even more,
      	# so keep retrying until the route shows up in the routing table
      	while [ "$r" -lt 1 ]
      	do
      		echo "$(date) sleep 5s, waiting for zt connection" >> "$LOG"
      		sleep 5
      		ip route add 192.168.3.0/24 via 172.16.200.101
      		# anchor the match so only the exact prefix is counted
      		r=$(ip route | grep -c '^192\.168\.3\.0/24 ')
      	done
      	echo "$(date) ip route add 192.168.3.0/24 via 172.16.200.101" >> "$LOG"
      }
      
      # orbit the custom moon (runs on every net hotplug event)
      zerotier-cli orbit 1e4735524a 1e4735524a
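
The test of which hotplug directories actually fire during service zerotier start/stop can be done with a small logging hook. The sketch below is one possible way to do it, not the script used in this article: the function name, the file name 00-debug and the log path are hypothetical, and ACTION, INTERFACE and DEVICENAME are the variables documented for net events (other subsystems may export different ones).

```shell
#!/bin/sh
# log_hotplug_event SUBSYSTEM LOGFILE: append one line describing the
# current hotplug event, using the variables exported by the hotplug caller.
# Unset variables are logged as empty strings.
log_hotplug_event() {
	echo "$(date) $1: ACTION=${ACTION:-} INTERFACE=${INTERFACE:-} DEVICENAME=${DEVICENAME:-}" >> "$2"
}

# In a hypothetical /etc/hotplug.d/net/00-debug, after the definition above:
#   log_hotplug_event net /var/log/hotplug_debug.log
# and likewise with "neigh" in /etc/hotplug.d/neigh/00-debug.
```

If a directory's log stays empty across a start/stop cycle, that hook point is not usable for this purpose.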
      

Verifying the result

> tail -n 100 /var/log/hotplug_net.log

Tue Nov  9 00:56:03 CST 2021 net dev ztjqgyage2 add detected...
Tue Nov  9 00:56:03 CST 2021 sleep 5s, waiting for zt connection
... # N lines omitted
Tue Nov  9 00:59:38 CST 2021 sleep 5s, waiting for zt connection
Tue Nov  9 00:59:43 CST 2021 sleep 5s, waiting for zt connection
Tue Nov  9 00:59:48 CST 2021 ip route add 192.168.3.0/24 via 172.16.200.101
Tue Nov  9 00:59:49 CST 2021 net dev ztjqgyage2 add detected...
Tue Nov  9 00:59:49 CST 2021 sleep 5s, waiting for zt connection
Tue Nov  9 00:59:54 CST 2021 ip route add 192.168.3.0/24 via 172.16.200.101

>  ip route
default via 116.22.200.1 dev pppoe-wan proto static
116.22.200.1 dev pppoe-wan proto kernel scope link src 116.22.200.28
172.16.2.0/24 via 172.16.200.1 dev ztjqgyage2 proto static
172.16.200.0/24 dev ztjqgyage2 proto kernel scope link src 172.16.200.13
192.168.3.0/24 via 172.16.200.101 dev ztjqgyage2
192.168.13.0/24 dev br-lan proto kernel scope link src 192.168.13.1
  • This confirms that the daily PPPoE redial at 0:56 triggered the event notification, the script ran, and the route 192.168.3.0/24 via 172.16.200.101 dev ztjqgyage2 was eventually added.
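
In the log above, the first run looped from 00:56:03 to 00:59:48 before the route was accepted. If ZeroTier never came up, a loop with no upper bound would keep the hotplug script running indefinitely. A bounded variant could look like the sketch below; the helper name retry_until and the limits in the usage comment are assumptions, not part of the original script.

```shell
#!/bin/sh
# retry_until MAX DELAY CMD...: run CMD up to MAX times, sleeping DELAY
# seconds after each failed attempt. Returns 0 as soon as CMD succeeds,
# or 1 once all attempts have failed.
retry_until() {
	max="$1"; delay="$2"; shift 2
	n=0
	while [ "$n" -lt "$max" ]; do
		"$@" && return 0
		n=$((n + 1))
		sleep "$delay"
	done
	return 1
}

# Example (limits are assumptions): try for up to 10 minutes.
#   retry_until 120 5 ip route replace 192.168.3.0/24 via 172.16.200.101
```

ip route replace is used in the example because it succeeds whether or not the route already exists, so the command's exit status alone tells the loop when to stop.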

Ref Docs