For an internal DNS resolver/cacher, I decided to use Unbound. However after the installation of unbound 1.5.8-1ubuntu1 and an initial configuration, DNS lookups didn't work.
The config:
server:
interface: 0.0.0.0
access-control: 10.0.0.0/16 allow
access-control: 127.0.0.0/8 allow
access-control: 172.16.0.0/12 allow
access-control: 192.168.0.0/16 allow
verbosity: 1
root-hints: /var/lib/unbound/root.hints
forward-zone:
name: "."
forward-addr: domaincontroller1.example.com
forward-addr: domaincontroller2.example.com
forward-addr: domaincontroller3.example.com
#forward-addr: 8.8.4.4 # Google
#forward-addr: 8.8.8.8 # Google
DNS resolving didn't work. Unbound returned a SERVFAIL error.
# dig claudiokuenzler.com A @localhost
; <<>> DiG 9.10.3-P4-Ubuntu <<>> claudiokuenzler.com A @localhost
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 8470
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;claudiokuenzler.com. IN A
;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Feb 16 07:04:55 CET 2017
;; MSG SIZE rcvd: 37
When I restarted Unbound, I was puzzled how slow the restart was and eventually some errors showed up in syslog:
Feb 16 06:46:52 dnscacher systemd[1]: Stopping unbound.service...
Feb 16 06:46:52 dnscacher unbound[20218]: * Stopping DNS server unbound
Feb 16 06:46:52 dnscacher unbound: [19904:0] info: service stopped (unbound 1.5.8).
Feb 16 06:46:52 dnscacher unbound: [19904:0] info: server stats for thread 0: 8 queries, 4 answers from cache, 4 recursions, 0 prefetch
Feb 16 06:46:52 dnscacher unbound: [19904:0] info: server stats for thread 0: requestlist max 1 avg 0.5 exceeded 0 jostled 0
Feb 16 06:46:52 dnscacher unbound: [19904:0] info: average recursion processing time 0.183547 sec
Feb 16 06:46:52 dnscacher unbound: [19904:0] info: histogram of recursion processing times
Feb 16 06:46:52 dnscacher unbound: [19904:0] info: [25%]=0.000384 median[50%]=0.000512 [75%]=0.393216
Feb 16 06:46:52 dnscacher unbound: [19904:0] info: lower(secs) upper(secs) recursions
Feb 16 06:46:52 dnscacher unbound: [19904:0] info: 0.000256 0.000512 2
Feb 16 06:46:52 dnscacher unbound: [19904:0] info: 0.262144 0.524288 2
Feb 16 06:46:52 dnscacher unbound[20218]: ...done.
Feb 16 06:46:52 dnscacher systemd[1]: Stopped unbound.service.
Feb 16 06:46:52 dnscacher systemd[1]: Starting unbound.service...
Feb 16 06:46:52 dnscacher unbound[20247]: * Starting DNS server unbound
Feb 16 06:47:42 dnscacher unbound-anchor: /var/lib/unbound/root.key has content
Feb 16 06:47:42 dnscacher unbound-anchor: fail: the anchor is NOT ok and could not be fixed
Feb 16 06:47:42 dnscacher unbound: [20339:0] notice: init module 0: validator
Feb 16 06:47:42 dnscacher unbound: [20339:0] notice: init module 1: iterator
Feb 16 06:47:42 dnscacher unbound: [20339:0] info: start of service (unbound 1.5.8).
Feb 16 06:47:42 dnscacher unbound: [20339:0] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN
Feb 16 06:47:42 dnscacher unbound: message repeated 5 times: [ [20339:0] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN]
Feb 16 06:47:42 dnscacher unbound[20247]: ...done.
Feb 16 06:47:42 dnscacher systemd[1]: Started unbound.service.
Feb 16 06:47:42 dnscacher systemd[1]: Reached target Host and Network Name Lookups.
It turned out that the slow start is caused by the unbound-achor verification, which can be manually triggered using the following command:
# time unbound-anchor -a /var/lib/unbound/root.key -v
/var/lib/unbound/root.key has content
fail: the anchor is NOT ok and could not be fixed
real 0m50.453s
user 0m0.004s
sys 0m0.012s
Here we have the 50 seconds which slowed down the start (see times in the syslog log above). By using tcpdump, I was able to determine what happens during the start (and what causes this slow startup):
# service unbound restart & 2>&1; tcpdump port 53
[1] 11942
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens160, link-type EN10MB (Ethernet), capture size 262144 bytes
10:58:07.788300 IP dnscacher.23444 > 192.5.5.241.53: 1053% [1au] NS? . (28)
10:58:08.164795 IP dnscacher.58715 > 192.112.36.4.53: 41519% [1au] NS? . (28)
10:58:08.541290 IP dnscacher.26205 > 192.112.36.4.53: 52241% [1au] NS? . (28)
10:58:08.917835 IP dnscacher.18709 > 199.7.91.13.53: 42065% [1au] NS? . (28)
10:58:09.294371 IP dnscacher.42646 > 199.7.91.13.53: 21034% [1au] NS? . (28)
10:58:09.670891 IP dnscacher.56481 > 198.97.190.53.53: 9754% [1au] NS? . (28)
10:58:10.047466 IP dnscacher.61621 > 198.97.190.53.53: 51384% [1au] NS? . (28)
10:58:10.424028 IP dnscacher.37874 > 192.5.5.241.53: 14983% [1au] NS? . (28)
10:58:11.176957 IP dnscacher.43299 > 192.5.5.241.53: 31294% [1au] NS? . (28)
[...] # shortened output
Unbound tried to connect to these and some more IP addresses. They look kinda familiar, don't they? Ah, of course, the DNS root servers, the spine of the Internet!
# grep 192.5.5.241 /var/lib/unbound/root.hints
F.ROOT-SERVERS.NET. 3600000 A 192.5.5.241
As this internal DNS resolver/cacher doesn't have Internet access (and therefore no DNS access to the root servers), this explains the slow start. But why wouldn't Unbound show any results at all? The connectivity to the internal Active Directory controllers (domaincontroller1-3) works.
Here I stumbled across Debian bug report #699641:
"you can set the "domain-insecure" option on your local domains in order to prevent DNSSEC failures from impacting the resolution of those domains"
So this is the actual workaround (disable DNSSEC) but doesn't explain the reason. There are two reasons for Unbound unable to show results for the resolved domains:
1.Missing Internet connection
The error in syslog gives a good hint:
Feb 16 06:47:42 dnscacher unbound: [20339:0] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN
Feb 16 06:47:42 dnscacher unbound: message repeated 5 times: [ [20339:0] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN]
As tcpdump revealed, Unbound tried to connect to all DNS Root servers. This happened not just for fun, Unbound tried to verify the DNSKEY in order to enable DNSSEC. As the Internet connection is non-existant on this host, the verification didn't work.
2. Unbound runs in DNSSEC mode
Unbound runs by default in DNSSEC mode, at least with the base installation on Ubuntu. The failed DNSKEY validation causes Unbound to stop working for DNSSEC requests - which is enabled. All non-DNSSEC requests return a SERVFAIL status.
Solution: Either make sure the cacher host has Internet access (at least to the DNS Root servers) or disable DNSSEC completely.
As this is a DNS resolver and cacher for internal networks only, I didn't see the necessity for DNSSEC and disabled it. In the Ubuntu setup two changes are necessary in order to disable DNSSEC:
Added domain-insecure option in unbound.conf:
server:
interface: 0.0.0.0
access-control: 10.0.0.0/16 allow
access-control: 127.0.0.0/8 allow
access-control: 172.16.0.0/12 allow
access-control: 192.168.0.0/16 allow
verbosity: 1
root-hints: /var/lib/unbound/root.hints
domain-insecure: *
But that's not enough. There's also a special config file active (/etc/unbound/unbound.conf.d/root-auto-trust-anchor-file.conf). In this file I commented/disabled the auto-trust-anchor-file option:
# cat /etc/unbound/unbound.conf.d/root-auto-trust-anchor-file.conf
server:
# The following line will configure unbound to perform cryptographic
# DNSSEC validation using the root trust anchor.
#auto-trust-anchor-file: "/var/lib/unbound/root.key"
Then another restart of Unbound:
# service unbound restart & 2>&1; tail -f /var/log/syslog
Feb 16 07:11:11 dnscacher unbound[23185]: * Starting DNS server unbound
Feb 16 07:12:01 dnscacher unbound-anchor: /var/lib/unbound/root.key has content
Feb 16 07:12:01 dnscacher unbound-anchor: fail: the anchor is NOT ok and could not be fixed
Feb 16 07:12:01 dnscacher unbound: [23275:0] warning: did not exit gracefully last time (22816)
Feb 16 07:12:01 dnscacher unbound: [23276:0] notice: init module 0: validator
Feb 16 07:12:01 dnscacher unbound: [23276:0] notice: init module 1: iterator
Feb 16 07:12:01 dnscacher unbound: [23276:0] info: start of service (unbound 1.5.8).
Feb 16 07:12:01 dnscacher unbound[23185]: ...done.
Feb 16 07:12:01 dnscacher systemd[1]: Started unbound.service.
Feb 16 07:12:01 dnscacher systemd[1]: Reached target Host and Network Name Lookups.
Yes, it still takes 50 seconds to start due to the missing Internet connection, but DNS resolution now works:
# dig -t A www.claudiokuenzler.com @localhost
; <<>> DiG 9.10.3-P4-Ubuntu <<>> -t A www.claudiokuenzler.com @localhost
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15474
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.claudiokuenzler.com. IN A
;; ANSWER SECTION:
www.claudiokuenzler.com. 38400 IN A 144.76.83.23
;; Query time: 47 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Feb 16 07:12:33 CET 2017
;; MSG SIZE rcvd: 68
And the already queried DNS record can now be found in Unbound's cache:
# unbound-control dump_cache | grep claudiokuenzler
www.claudiokuenzler.com. 23458 IN A 144.76.83.23
msg www.claudiokuenzler.com. IN A 33152 1 23458 2 1 0 0
www.claudiokuenzler.com. IN A 0
Note: Even if I disabled DNSSEC in this case, it's still wise to open the firewall for this host to access the DNS Root servers. Not only to avoid the slow start or to keep DNSSEC as a possibility, but rather because this is the design of DNS. The Root servers should be queried if forwarders return no results.
Update February 17th 2017: As I mentioned at the end, a DNS connection to the Root servers should work if possible. So the missing firewall rules are now created and a restart is now as quick as expected:
# time service unbound restart & 2>&1; tail -f /var/log/syslog
Feb 17 11:37:15 dnscacher systemd[1]: Stopping unbound.service...
Feb 17 11:37:15 dnscacher unbound[12066]: * Stopping DNS server unbound
Feb 17 11:37:15 dnscacher unbound: [4569:0] info: service stopped (unbound 1.5.8).
Feb 17 11:37:15 dnscacher unbound: [4569:0] info: server stats for thread 0: 766 queries, 763 answers from cache, 3 recursions, 0 prefetch
Feb 17 11:37:15 dnscacher unbound: [4569:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
Feb 17 11:37:15 dnscacher unbound: [4569:0] info: average recursion processing time 0.000464 sec
Feb 17 11:37:15 dnscacher unbound: [4569:0] info: histogram of recursion processing times
Feb 17 11:37:15 dnscacher unbound: [4569:0] info: [25%]=0 median[50%]=0 [75%]=0
Feb 17 11:37:15 dnscacher unbound: [4569:0] info: lower(secs) upper(secs) recursions
Feb 17 11:37:15 dnscacher unbound: [4569:0] info: 0.000256 0.000512 3
Feb 17 11:37:15 dnscacher unbound[12066]: ...done.
Feb 17 11:37:15 dnscacher systemd[1]: Stopped unbound.service.
Feb 17 11:37:15 dnscacher systemd[1]: Starting unbound.service...
Feb 17 11:37:15 dnscacher unbound[12096]: * Starting DNS server unbound
Feb 17 11:37:16 dnscacher unbound-anchor: /var/lib/unbound/root.key has content
Feb 17 11:37:16 dnscacher unbound-anchor: success: the anchor is ok
Feb 17 11:37:16 dnscacher unbound: [12112:0] notice: init module 0: validator
Feb 17 11:37:16 dnscacher unbound: [12112:0] notice: init module 1: iterator
Feb 17 11:37:16 dnscacher unbound: [12112:0] info: start of service (unbound 1.5.8).
Feb 17 11:37:16 dnscacher unbound[12096]: ...done.
Feb 17 11:37:16 dnscacher systemd[1]: Started unbound.service.
real 0m0.152s
user 0m0.000s
sys 0m0.000s
Feb 17 11:37:16 dnscacher systemd[1]: Reached target Host and Network Name Lookups.
And as expected, the DNSKEY validation is now working, too:
Feb 17 11:37:16 dnscacher unbound-anchor: /var/lib/unbound/root.key has content
Feb 17 11:37:16 dnscacher unbound-anchor: success: the anchor is ok
ck from Switzerland wrote on Apr 10th, 2017:
The firewall rules allowed the outgoing connections to the DNS Root Servers (see https://www.iana.org/domains/root/servers) using tcp and udp 53.
brianx from usa wrote on Apr 8th, 2017:
The February 17th 2017 says that firewall rules fixed the fail: the anchor is NOT ok and could not be fixed error, but it's unclear what rules were added.
Does this require forwarding in specific udp or tcp ports or will the existing NAT forwards of my generic firewall be enough?
Dominik from wrote on Feb 16th, 2017:
I like it :)
AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Containers CouchDB DB DNS Database Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera Git GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Influx Internet Java KVM Kibana Kodi Kubernetes LVM LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS Office PGSQL PHP Perl Personal PostgreSQL Postgres PowerDNS Proxmox Proxy Python Rancher Rant Redis Roundcube SSL Samba Seafile Security Shell SmartOS Solaris Surveillance Systemd TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder