This was more annoying to track down than it should have been, because some nodes were experiencing genuine networking issues at the same time (although it's possible that networking issues on one node triggered this problem in the first place).

If you are seeing packet loss over your ZeroTier network while traffic over the public network is fine, this may be useful to you.

Setup

  • A Kubernetes cluster using flannel for pod networking, where node-to-node private networking runs over ZeroTier so that the public interfaces can be firewalled

Symptoms

  • Worker nodes stop reporting kubelet status
  • On an affected node:
      • Connections from kube-proxy to the apiserver at 127.0.0.1:6443 time out
      • ping -i 0.1 <master's ZeroTier IP> shows 10-20% packet loss, whereas
      • ping -i 0.1 <master's public IP> shows no packet loss

Diagnosis

Run zerotier-cli -j peers and find the entry for the node you're having connectivity issues with.

In my case, ZeroTier was trying to reach the peer via its flannel.1 address (likely because of internal heuristics that prefer private IPs), and only falling back to the public IP when that failed.

An example of a problematic entry:

 {
  "address": "<affected node addr>",
  "latency": 92,
  "paths": [
   {
    "active": false,                # This should be true!
    "address": "<public IP>/9993",
    "expired": false,
    "lastReceive": 1589329373917,
    "lastSend": 1589329373918,
    "preferred": true,
    "trustedPathId": 0
   },
   {
    "active": true,
    "address": "10.42.x.x/9993",    # flannel.1
    "expired": false,               # Turns out it was trying to route through flannel's interface
    "lastReceive": 1589329373917,
    "lastSend": 1589329373918,
    "preferred": true,
    "trustedPathId": 0
   }
  ],
  "role": "LEAF",
  "version": "1.4.6",
  "versionMajor": 1,
  "versionMinor": 4,
  "versionRev": 6
 },
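With many nodes, scanning the peers output by hand gets tedious. Below is a minimal Python sketch that flags any peer whose *active* path points into flannel's network. It assumes flannel's pod network is 10.42.0.0/16 (matching the 10.42.x.x address above; change SUSPECT_NET for your cluster), and that path addresses are written as "<ip>/<port>" as in the example. The function names are hypothetical, not part of any ZeroTier tooling.

```python
import ipaddress
import json
import subprocess

# Assumption: flannel's pod network is 10.42.0.0/16 (matches the 10.42.x.x
# path above). Adjust this to your cluster's pod CIDR.
SUSPECT_NET = ipaddress.ip_network("10.42.0.0/16")


def load_peers():
    """Run `zerotier-cli -j peers` (usually needs root) and parse the JSON."""
    out = subprocess.run(["zerotier-cli", "-j", "peers"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def suspect_peers(peers, net=SUSPECT_NET):
    """Return (peer address, path address) pairs where an active path
    lies inside `net`, i.e. ZeroTier is talking to that peer over flannel."""
    hits = []
    for peer in peers:
        for path in peer.get("paths", []):
            if not path.get("active"):
                continue
            # Path addresses look like "<ip>/<port>"; split off the port.
            ip_str, _, _port = path.get("address", "").rpartition("/")
            try:
                ip = ipaddress.ip_address(ip_str)
            except ValueError:
                continue  # skip anything that doesn't parse as an IP
            # IPv6 addresses never match an IPv4 network, so they are skipped.
            if ip in net:
                hits.append((peer.get("address"), path["address"]))
    return hits
```

On a node, something like `for peer, path in suspect_peers(load_peers()): print(peer, path)` lists the peers to look at; an empty result means no active path goes through flannel.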

Solution

Prevent ZeroTier from using the flannel.* interfaces by adding an interface prefix blacklist to local.conf:

# /var/lib/zerotier-one/local.conf
{
  "settings": {
    "interfacePrefixBlacklist": [ "flannel" ]
  }
}
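If you already have a local.conf with other settings, you'll want to merge the blacklist entry in rather than overwrite the file. A minimal Python sketch of that, assuming local.conf is plain JSON (no comments); add_blacklisted_prefix is a hypothetical helper name, and writing under /var/lib/zerotier-one usually needs root:

```python
import json
import os


def add_blacklisted_prefix(conf_path, prefix="flannel"):
    """Add `prefix` to settings.interfacePrefixBlacklist in a ZeroTier
    local.conf, keeping any existing settings. Returns True if the file
    was changed, False if the prefix was already present."""
    conf = {}
    if os.path.exists(conf_path):
        with open(conf_path) as f:
            text = f.read()
        if text.strip():  # treat an empty file like a missing one
            conf = json.loads(text)
    settings = conf.setdefault("settings", {})
    blacklist = settings.setdefault("interfacePrefixBlacklist", [])
    if prefix in blacklist:
        return False  # already blacklisted; leave the file untouched
    blacklist.append(prefix)
    # Write to a temporary file and rename it over the original, so an
    # interrupted write can't leave a half-written local.conf behind.
    tmp_path = conf_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(conf, f, indent=2)
        f.write("\n")
    os.replace(tmp_path, conf_path)
    return True
```

For example, `add_blacklisted_prefix("/var/lib/zerotier-one/local.conf")`, followed by the restart below.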

Then restart ZeroTier for the change to take effect: /etc/init.d/zerotier-one restart

I hope this helps!