Hyper-V, Docker, and Networking Drama

22 Mar 2019

A few hours before giving my "Nomad: Kubernetes Without the Complexity" talk this morning, I hit a major problem: the demo stopped working.

Now, the first thing to note is the entire setup of the demo is scripted, and the scripts hadn’t changed. The only thing I had done was restart the machine, and now things were breaking.

The Symptoms

A Docker container started inside a guest VM, with a port mapped to the machine’s public IP, wasn’t reachable from the host.

For example, using a machine based off the bento/ubuntu-16.04 base box, provisioned with docker, running this from inside an SSH connection to the machine would work:

vagrant ssh

# launch a container which can respond to a http get
docker run -d --rm -p 172.127.48.105:5000:5000 registry:latest

# curl by public ip
curl http://172.127.48.105:5000 --silent -w "%{http_code}"   # 200

But running the same curl command on the host would fail:

# container is still running
curl http://172.127.48.105:5000 --silent -w "%{http_code}"   # timeout
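When repeating this check many times, it can help to cap how long curl waits, so a dead route fails in seconds instead of hanging. The following is a minimal sketch, not part of the original demo scripts: the function name probe and the 5-second default are assumptions made for illustration. curl reports a status of 000 when it never receives an HTTP response.

```shell
# probe URL [SECONDS]: print only the HTTP status code for URL.
# Prints 000 if no HTTP response arrives (connection refused, timeout, ...).
# "probe" and the 5 second default are illustrative choices, not part of the demo.
probe() {
  curl --silent --output /dev/null \
       --max-time "${2:-5}" \
       --write-out '%{http_code}' \
       "$1"
}

# Example usage, from the host and from inside the guest:
#   probe http://172.127.48.105:5000
```

Running the same probe from both sides makes the symptom easy to compare: a real status code inside the guest, 000 on the host.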

Investigation

So it’s 5 hours before the demo (thankfully it’s not 10 minutes before), so let’s start digging into what could be causing this.

Docker Networking

I was also searching for Nomad and Docker networking issues, as I figured I could change the Nomad job to bind the container to all interfaces (e.g. -p 5000:5000) instead of just the one IP. This reply mentioned the docker0 network, and when I checked the guest machines, I saw that this network is also in the 172.* range.

So my guest machines had public addresses which happened to fall in the same range as a separate network adaptor (docker0) on those same machines.
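To check whether a given address really lands inside the Docker bridge subnet, a small shell helper can do the comparison. This is only a sketch under stated assumptions: the helper names ip_to_int and in_cidr are made up for illustration, and the addresses in the usage comment are hypothetical rather than taken from this setup. Docker’s default bridge commonly uses 172.17.0.0/16, but the actual subnet is best read from `docker network inspect bridge` on the guest.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# Runs inside $(...) below, so the IFS change does not leak to the caller.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split "a.b.c.d" into $1 $2 $3 $4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr ADDRESS NETWORK/BITS: exit 0 if ADDRESS is inside the CIDR block.
in_cidr() {
  net=${2%/*}        # part before the slash
  bits=${2#*/}       # prefix length after the slash
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}

# Example usage (hypothetical guest address, Docker's usual default subnet):
#   in_cidr 172.17.48.105 172.17.0.0/16 && echo "clashes with docker0"
```

If the guest’s public address passes this check against the docker0 subnet, traffic for that address can be routed to the wrong adaptor, which fits the symptoms above.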

Hyper-V IP Addresses

While I was checking the Windows Firewall to see if anything was weird in there, I stumbled across a rule I’d added to allow exposure of a Node.js service from my host to Hyper-V guests (but not anywhere else). I noticed that the IP range it defined was 192.168.*, and I now had machines with 172.* addresses.

So the IP address range for guest machines had changed.

The Solution

Luckily, there is a straightforward solution to this:

Reboot until you get the range you want

Really.

The other solution is to use an External Switch in Hyper-V and bridge it with your host’s internet connection. That doesn’t really help me, as I am on a laptop, on different WiFi networks, and sometimes I use a Thunderbolt-based network adaptor too, so having to update or rebuild machines on every network change would be an absolute pain.

So I rebooted — a lot.

So if anyone from Microsoft is reading this: Please let us configure the Default Switch. Or have a way to recreate it without rebooting at least.

vagrant, docker, hyperv, networking
