When running an Alpine-based virtual machine with Consul DNS forwarding set up, I noticed that the machine sometimes couldn’t resolve *.consul domains, and not in any consistent pattern. The logs suggested that the request was being made and answered successfully, but that the result was being ignored.
After a lot of googling and frustration, I tracked the problem down to a difference (or optimisation) in how musl libc behaves compared to glibc. musl libc’s “Functional differences from glibc” page has the relevant information under its Name Resolver/DNS section:
Traditional resolvers, including glibc’s, make use of multiple nameserver lines in resolv.conf by trying each one in sequence and falling to the next after one times out. musl’s resolver queries them all in parallel and accepts whichever response arrives first.
The machine’s /etc/resolv.conf file has two nameservers specified:
nameserver 127.0.0.1
nameserver 192.168.121.1
The first is our Unbound instance, which handles the forwarding to Consul, and the second is the DNS server set by DHCP: in this case, libvirt/qemu’s dnsmasq instance.
On a glibc-based system, queries go to the first nameserver, and only if that fails to respond are they sent to the next nameserver, and so forth. As Alpine Linux uses musl libc, it makes the requests in parallel and uses whichever response comes back first.
When the DHCP DNS server is a network hop away, the latency involved means our resolution usually works, as the queries will hit the local DNS and get a response first. However, when the DHCP DNS is not that far away, for example when it is the DNS server that libvirt runs in the virtual network the machine is attached to, it becomes much more likely to get a response from that DNS server first, causing the failures I was seeing.
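One way to see this for yourself is to ask each nameserver in /etc/resolv.conf the same question directly, which is roughly the set of queries musl sends out in parallel. The following is a minimal sketch, assuming dig is installed; the function names list_nameservers and ask_each are my own, made up for illustration.

```shell
#!/bin/sh
# Print the address from every "nameserver" line in a resolv.conf-style file.
# Defaults to /etc/resolv.conf when no file is given.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Ask every configured nameserver for the same name, one at a time, so the
# answers (or lack of them) can be compared side by side.
# $1 = name to look up, $2 = optional resolv.conf-style file.
ask_each() {
    name="$1"
    for ns in $(list_nameservers "${2:-/etc/resolv.conf}"); do
        echo "== ${ns}"
        dig +short +time=2 +tries=1 "@${ns}" "${name}"
    done
}
```

Running ask_each consul.service.consul prints a header per nameserver followed by whatever that server answered. If one server prints an address and another prints nothing, then which result musl ends up with depends on whose reply arrives first.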
The solution is to change the setup so that all requests go to Unbound, which can then decide where to send them on to. This has the additional benefit of making DNS requests work the same way on all systems, regardless of whether glibc or musl libc is in use.
Rebuilding DNS Resolution
You can follow the same instructions in my previous Consul DNS forwarding post to set up Consul, as that is already in the right state for us.
Once Consul is up and running, it’s time to fix the rest of our pipeline.
Unbound
First, install unbound and configure it to start on boot:
apk add unbound
rc-update add unbound
The unbound config file (/etc/unbound/unbound.conf) is almost the same as the previous version, except we also have an include statement, pointing to a second config file, which we will generate shortly:
server:
  verbosity: 1
  do-not-query-localhost: no
  domain-insecure: "consul"
stub-zone:
  name: "consul"
  stub-addr: 127.0.0.1@8600
include: "/etc/unbound/forward.conf"
Dhclient
Next, we install dhclient
so that we can make use of it’s hooks feature to generate our additional unbound config file.
apk add dhclient
Create a config file for dhclient (/etc/dhcp/dhclient.conf), which again is almost the same as in the previous post, but this time doesn’t specify prepend domain-name-servers:
option rfc3442-classless-static-routes code 121 = array of unsigned integer 8;
send host-name = gethostname();
request subnet-mask, broadcast-address, time-offset, routers,
domain-name, domain-name-servers, domain-search, host-name,
dhcp6.name-servers, dhcp6.domain-search, dhcp6.fqdn, dhcp6.sntp-servers,
netbios-name-servers, netbios-scope, interface-mtu,
rfc3442-classless-static-routes, ntp-servers;
Now we can write two hooks. The first is an enter hook, which we can use to write out the forward.conf file.
touch /etc/dhclient-enter-hooks
chmod +x /etc/dhclient-enter-hooks
The content is a single statement to write the new_domain_name_servers value into a forward-zone for unbound:
#!/bin/sh
(
cat <<-EOF
forward-zone:
name: "."
forward-addr: ${new_domain_name_servers}
EOF
) | sudo tee /etc/unbound/forward.conf
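DHCP can hand out more than one DNS server, in which case dhclient sets new_domain_name_servers to a space-separated list, while unbound expects each forward-addr on its own line. Here is a hedged sketch of a variant that copes with that. The helper name write_forward_zone is hypothetical, and the redirect in the comment assumes the hook already runs as root, so it skips sudo and tee.

```shell
#!/bin/sh
# Emit an unbound forward-zone with one forward-addr line per DNS server.
# Arguments: the servers to forward to, one per argument.
write_forward_zone() {
    echo 'forward-zone:'
    echo '  name: "."'
    for server in "$@"; do
        echo "  forward-addr: ${server}"
    done
}

# In the enter hook this would be called roughly as follows; the variable is
# deliberately left unquoted so that it splits into one argument per server:
#   write_forward_zone ${new_domain_name_servers} > /etc/unbound/forward.conf
```

With a single DNS server this produces the same three settings as the heredoc version, so it can be swapped in without changing anything else.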
The second hook is an exit hook, which runs after dhclient has finished writing out all the files it controls (such as /etc/resolv.conf):
touch /etc/dhclient-exit-hooks
chmod +x /etc/dhclient-exit-hooks
The content is a single sed statement to replace the address of the nameserver directives written to /etc/resolv.conf with the unbound address:
#!/bin/sh
sudo sed -i 's/nameserver.*/nameserver 127.0.0.1/g' /etc/resolv.conf
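To see what that substitution does without touching the real file, the same expression can be run as a filter over some sample input. This is a small sketch: rewrite_nameservers is a name I made up, and the awk step that drops the repeated lines is an optional extra which is not part of the hook above.

```shell
#!/bin/sh
# Rewrite every nameserver line read from stdin to point at the local
# unbound instance, then drop the duplicate lines that this leaves behind.
rewrite_nameservers() {
    sed 's/nameserver.*/nameserver 127.0.0.1/g' | awk '!seen[$0]++'
}
```

For example, feeding it a made-up resolv.conf containing a search line and two nameserver lines leaves the search line alone and collapses the two nameservers into a single nameserver 127.0.0.1 entry. Without the awk step you get the same entry twice, which is harmless but untidy.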
It’s worth noting that you could put the content of the enter hook into the exit hook if you would rather.
Finally, we can delete our current resolv.conf and restart the networking service:
rm /etc/resolv.conf # hack due to dhclient making an invalid `chown` call.
rc-service networking restart
Testing
We can now test that we can resolve the three kinds of address we care about:
dig consul.service.consul - should return the eth0 IP of the machine
dig alpinetest.karhu.xyz - should be resolved by libvirt’s dnsmasq instance
dig example.com - should be resolved by an upstream DNS server
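These three checks can also be scripted, which is handy when rebuilding the machine repeatedly. A rough sketch, assuming dig is available; resolves and check_names are hypothetical helper names, and the check only confirms that some answer came back, not which server produced it.

```shell
#!/bin/sh
# Succeed when dig returns at least one answer line for the given name.
resolves() {
    [ -n "$(dig +short "$1")" ]
}

# Print OK or FAIL for each name passed in.
# Returns non-zero if any of the names failed to resolve.
check_names() {
    status=0
    for name in "$@"; do
        if resolves "${name}"; then
            echo "OK   ${name}"
        else
            echo "FAIL ${name}"
            status=1
        fi
    done
    return "${status}"
}
```

It would be called as check_names consul.service.consul alpinetest.karhu.xyz example.com, and because the lookups go through the system resolver path that dig reads from /etc/resolv.conf, running it a few times in a row is a reasonable way to spot intermittent failures.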
Conclusion
This was an interesting and somewhat annoying problem to solve, but it means I have a more robust setup in my virtual machines now. It’s interesting to note that if the DNS server from DHCP were not a local instance, the added network latency would make the whole system function properly most of the time, as the local instance would answer before the remote instance could.