When managing DNS zone files, the most common convention for SOA serial numbers (which indicate the "version" of the zone file) is a date-based format, YYYYMMDDRR (where RR is an incrementing revision counter).
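To avoid typos when bumping the serial, the next value can be computed instead of typed. The following is a minimal bash sketch; `next_serial` is a hypothetical helper (not part of BIND or PowerDNS), and it assumes the revision counter starts at 01 each day.

```shell
#!/usr/bin/env bash
# Sketch: compute the next YYYYMMDDRR serial from the current one.
# next_serial is a hypothetical helper; RR is assumed to start at 01.
# Usage: next_serial CURRENT_SERIAL [TODAY_YYYYMMDD]
next_serial() {
  local current=$1
  local today=${2:-$(date +%Y%m%d)}
  local base=$(( today * 100 + 1 ))   # today's date with revision 01
  if (( current < base )); then
    echo "$base"                      # first change of the day
  else
    echo $(( current + 1 ))           # further change today: bump RR
  fi
}
```

It never returns a lower serial: if the current serial is already dated in the future, it simply keeps incrementing. That keeps the zone transferable, but it does not repair a future-dated serial.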
I've been working with BIND (and more recently PowerDNS) for many years and have always made extra sure not to make a typo in the serial number. But after more than 20 years in the IT field, it finally happened: I mistakenly set the serial number to a future date.
Of course, when the mistake happened, I didn't even realize it. Only two days later, when a Let's Encrypt wildcard certificate needed to be renewed, did I finally become aware of the error.
When renewing (or creating) a Let's Encrypt certificate with the DNS challenge, the certbot command tells you which DNS record to create for validation. After I created that entry in the zone file, certbot still failed with an error:
Please deploy a DNS TXT record under the name
_acme-challenge.test.customer.dev with the following value:
6kBtCcs55-kEiOZDNBkvSDiGuH_ZcK_2igPhLSZgo98
Before continuing, verify the record is deployed.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Press Enter to Continue
Waiting for verification...
Cleaning up challenges
Failed authorization procedure. test.customer.dev (dns-01): urn:ietf:params:acme:error:unauthorized :: The client lacks sufficient authorization :: Incorrect TXT record "j0CHCTOepylAyYfCcfAnMeYS-6B7V1GkaSkW-PTCIII" (and 2 more) found at _acme-challenge.test.customer.dev
A manual check of the TXT records confirmed it: the TXT entries returned were different from the one I had just deployed. Even after I removed all relevant TXT records for _acme-challenge.test.customer.dev, dig still showed the previous entries:
ck@mint:~$ dig -t TXT _acme-challenge.test.customer.dev
; <<>> DiG 9.18.18-0ubuntu0.22.04.1-Ubuntu <<>> -t TXT _acme-challenge.test.customer.dev
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23362
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;_acme-challenge.test.customer.dev. IN TXT
;; ANSWER SECTION:
_acme-challenge.test.customer.dev. 166 IN CNAME _acme-test-customer-dev.example.com.
_acme-test-customer-dev.example.com. 166 IN TXT "j0CHCTOepylAyYfCcfAnMeYS-6B7V1GkaSkW-PTCIII"
_acme-test-customer-dev.example.com. 166 IN TXT "5nAAB6MijpqG6IJsfpHKvgINklgDB7oBfufnT4TcKHk"
_acme-test-customer-dev.example.com. 166 IN TXT "Baqg7Wk5_zQk0Cu7Aotg0dF3p1SEAQvvBTSblmLstSk"
;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Mon Feb 12 08:20:44 CET 2024
;; MSG SIZE rcvd: 284
What's going on?
To make sure my zone file changes were actually deployed, I verified the SOA Serial of the domain:
ck@mint:~$ dig -t SOA example.com +short
ns1.example.com. dnsadmin.example.com. 2024091001 10800 3600 604800 38400
Crap! The serial was set to a date in September 2024, but we're in February. My own zone updates with the current date (e.g. 2024021201) would never reach the slave servers, because they already hold a "newer" serial number starting with 202409.
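SOA serials are compared with the serial number arithmetic from RFC 1982 rather than as plain integers. A minimal bash sketch of that comparison (`serial_gt` is a hypothetical helper name) shows why today's serial does not count as newer than the September one:

```shell
#!/usr/bin/env bash
# Sketch of RFC 1982 serial number comparison for 32-bit SOA serials.
# serial_gt NEW OLD succeeds (exit 0) if NEW counts as newer than OLD,
# i.e. if a secondary holding OLD would transfer a zone with serial NEW.
serial_gt() {
  local new=$1 old=$2
  # Forward distance from OLD to NEW, modulo 2^32
  local diff=$(( (new - old) & 0xFFFFFFFF ))
  # NEW is newer if it lies 1 .. 2^31-1 steps ahead of OLD
  # (a distance of exactly 2^31 is undefined in RFC 1982; treated as not newer)
  (( diff > 0 && diff < 2147483648 ))
}

if serial_gt 2024021201 2024091001; then
  echo "newer: secondaries would transfer"
else
  echo "not newer: secondaries keep their copy"
fi
```

Run as-is, it prints "not newer: secondaries keep their copy".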
From conversations I've had with other IT folks, I know this has happened to many before and they were all able to fix it. But the question is: How do I fix the serial number set to a future date?
The official BIND troubleshooting documentation mentions the following:
Zone serial numbers are just numbers — they are not date-related. However, many people set them to a number that represents a date, usually of the form YYYYMMDDRR. Occasionally they make a mistake and set the serial number to a date in the future, then try to correct it by setting it to the current date. This causes problems because serial numbers are used to indicate that a zone has been updated. If the serial number on the secondary server is lower than the serial number on the primary, the secondary server attempts to update its copy of the zone.
Setting the serial number to a lower number on the primary server than the one on the secondary server means that the secondary will not perform updates to its copy of the zone.
The solution to this is to add 2147483647 (2^31-1) to the number, reload the zone and make sure all secondaries have updated to the new zone serial number, then reset it to the desired number and reload the zone again.
So let's try this and add the mentioned 2147483647 to the serial I actually wanted (2024021206):
root@ns1:~# echo $(( 2024021206 + 2147483647))
4171504853
I set the result as the serial number in the zone file and reloaded BIND (rndc reload):
root@ns1:~# vi /etc/bind/example.com.hosts
root@ns1:~# head /etc/bind/example.com.hosts
$ttl 86400
example.com. IN SOA ns1.example.com. dnsadmin.example.com. (
4171504853
10800
3600
604800
38400 )
example.com. IN NS ns1.example.com.
example.com. IN NS ns2.example.com.
example.com. IN NS ns3.example.com.
root@ns1:~# rndc reload
server reload successful
Then the SOA was checked again on all name servers:
ck@mint:~$ dig -t SOA example.com @ns1.example.com +short
ns1.example.com. dnsadmin.example.com. 4171504853 10800 3600 604800 38400
ck@mint:~$ dig -t SOA example.com @ns2.example.com +short
ns1.example.com. dnsadmin.example.com. 4171504853 10800 3600 604800 38400
ck@mint:~$ dig -t SOA example.com @ns3.example.com +short
ns1.example.com. dnsadmin.example.com. 4171504853 10800 3600 604800 38400
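Checking every name server by hand gets tedious. A small bash sketch that loops over the servers could look like this; the function names are hypothetical, and the server list is the one used in this post:

```shell
#!/usr/bin/env bash
# Sketch: print the SOA serial reported by each name server.
soa_serial_of() {
  # The serial is the 3rd field of "dig +short SOA" output:
  # mname rname serial refresh retry expire minimum
  awk '{ print $3 }'
}

check_all_ns() {
  local zone=$1; shift
  local ns
  for ns in "$@"; do
    printf '%s\t%s\n' "$ns" "$(dig +short -t SOA "$zone" @"$ns" | soa_serial_of)"
  done
}

# Example (requires dig and network access):
# check_all_ns example.com ns1.example.com ns2.example.com ns3.example.com
```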
All name servers now show the new SOA Serial. Time to reset the serial on ns1 (setting it to 2024021206):
root@ns1:~# vi /etc/bind/example.com.hosts
root@ns1:~# head /etc/bind/example.com.hosts
$ttl 86400
example.com. IN SOA ns1.example.com. dnsadmin.example.com. (
2024021206
10800
3600
604800
38400 )
example.com. IN NS ns1.example.com.
example.com. IN NS ns2.example.com.
example.com. IN NS ns3.example.com.
root@ns1:~# rndc reload
server reload successful
SOA verification on all name servers:
ck@mint:~$ dig -t SOA example.com @ns1.example.com +short
ns1.example.com. dnsadmin.example.com. 2024021206 10800 3600 604800 38400
ck@mint:~$ dig -t SOA example.com @ns2.example.com +short
ns1.example.com. dnsadmin.example.com. 4171504853 10800 3600 604800 38400
ck@mint:~$ dig -t SOA example.com @ns3.example.com +short
ns1.example.com. dnsadmin.example.com. 4171504853 10800 3600 604800 38400
Argh! That didn't work: neither slave server updated its copy of the zone.
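Serial number arithmetic (RFC 1982) offers a plausible explanation: the intermediate serial 4171504853 is exactly 2147483647 (2^31-1) ahead of 2024021206, so going back to 2024021206 looks like a decrement to the secondaries. A minimal bash sketch of that calculation:

```shell
#!/usr/bin/env bash
# Sketch: forward distance (mod 2^32) from the secondaries' serial to the
# new primary serial. RFC 1982 treats NEW as newer only if this distance
# is between 1 and 2^31-1 (2147483647).
old=4171504853   # serial still held by ns2/ns3
new=2024021206   # serial now served by ns1
dist=$(( (new - old) & 0xFFFFFFFF ))
echo "distance: $dist"
if (( dist > 0 && dist < 2147483648 )); then
  echo "counts as an increment"
else
  echo "counts as a decrement: no zone transfer"
fi
```

It prints a distance of 2147483649, just past 2^31, so the change counts as a decrement.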
Let's try an alternative: Set the Serial number to the highest possible number.
Note: After writing this post, I realized that I may have misunderstood the BIND documentation. It probably meant to add 2147483647 to the CURRENT (wrong) Serial, not the wanted serial number.
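Following that reading, the intermediate serial is derived from the wrong serial 2024091001 instead. This bash sketch (values taken from this post) checks that both steps would then count as increments under RFC 1982:

```shell
#!/usr/bin/env bash
# Sketch of the two-step reset, adding 2^31-1 to the WRONG serial.
wrong=2024091001
wanted=2024021206

# Step 1: intermediate serial (mod 2^32, since serials are 32-bit)
step1=$(( (wrong + 2147483647) & 0xFFFFFFFF ))

# Forward distances; RFC 1982 requires each to be 1 .. 2^31-1
d1=$(( (step1 - wrong) & 0xFFFFFFFF ))
d2=$(( (wanted - step1) & 0xFFFFFFFF ))

echo "intermediate serial: $step1"
echo "wrong -> intermediate: $d1"
echo "intermediate -> wanted: $d2"
```

The intermediate serial comes out as 4171574648, and the second distance, 2147413854, is below 2^31, so in theory the reset to 2024021206 would have been accepted by the secondaries.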
According to Christopher Paquin's blog post, a DNS zone serial number reset can also be done by simply using the highest possible number for the Serial field. This field is a 32-bit counter and the value size is therefore limited (see also related post mentioning the Y2K38 problem).
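Why would the maximum work? Under RFC 1982 arithmetic the serial wraps around after 4294967295, so stepping from the maximum to a smaller value is a forward step of (target + 1). A bash sketch:

```shell
#!/usr/bin/env bash
# Sketch: the largest 32-bit unsigned serial, and the forward distance
# (mod 2^32) from it to the wanted serial.
max=$(( (1 << 32) - 1 ))
target=2024021206
dist=$(( (target - max) & 0xFFFFFFFF ))
echo "max serial: $max"
echo "max -> target distance: $dist"
```

The distance 2024021207 is below 2^31, so the reset counts as an increment, as long as the target serial is below 2147483647.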
Let's try it with this approach.
root@ns1:~# vi /etc/bind/example.com.hosts
root@ns1:~# head /etc/bind/example.com.hosts
$ttl 86400
example.com. IN SOA ns1.example.com. dnsadmin.example.com. (
4294967295
10800
3600
604800
38400 )
example.com. IN NS ns1.example.com.
example.com. IN NS ns2.example.com.
example.com. IN NS ns3.example.com.
root@ns1:~# rndc reload
server reload successful
The SOA serial was now set to 4294967295, the highest possible serial. SOA verification on all name servers:
ck@mint:~$ dig -t SOA example.com @ns1.example.com +short
ns1.example.com. dnsadmin.example.com. 4294967295 10800 3600 604800 38400
ck@mint:~$ dig -t SOA example.com @ns2.example.com +short
ns1.example.com. dnsadmin.example.com. 4294967295 10800 3600 604800 38400
ck@mint:~$ dig -t SOA example.com @ns3.example.com +short
ns1.example.com. dnsadmin.example.com. 4294967295 10800 3600 604800 38400
So far this has worked. But what about the reset back to 2024021206?
root@ns1:~# vi /etc/bind/example.com.hosts
root@ns1:~# head /etc/bind/example.com.hosts
$ttl 86400
example.com. IN SOA ns1.example.com. dnsadmin.example.com. (
2024021206
10800
3600
604800
38400 )
example.com. IN NS ns1.example.com.
example.com. IN NS ns2.example.com.
example.com. IN NS ns3.example.com.
root@ns1:~# rndc reload
server reload successful
SOA verification on all DNS servers, again:
ck@mint:~$ dig -t SOA example.com @ns1.example.com +short
ns1.example.com. dnsadmin.example.com. 2024021206 10800 3600 604800 38400
ck@mint:~$ dig -t SOA example.com @ns2.example.com +short
ns1.example.com. dnsadmin.example.com. 2024021206 10800 3600 604800 38400
ck@mint:~$ dig -t SOA example.com @ns3.example.com +short
ns1.example.com. dnsadmin.example.com. 2024021206 10800 3600 604800 38400
YES! Finally the SOA Serial has been reset to the correct date!
Once the zone file's serial number was fixed, the certbot DNS renewal worked, too (of course).
Hopefully this was a reminder to my brain to verify the serial before reloading the zone. Let's use this quote from Brazilian lyricist Paulo Coelho:
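One way to make that reminder stick is a check before running rndc reload. A minimal bash sketch, assuming YYYYMMDDRR serials; `serial_date_ok` is a hypothetical helper, and extracting the serial from the zone file is left out:

```shell
#!/usr/bin/env bash
# Sketch of a pre-reload sanity check: reject a YYYYMMDDRR serial whose
# date part lies in the future. serial_date_ok is a hypothetical helper.
# Usage: serial_date_ok SERIAL [TODAY_YYYYMMDD]
serial_date_ok() {
  local serial=$1
  local today=${2:-$(date +%Y%m%d)}
  local serial_date=$(( serial / 100 ))   # strip the RR revision counter
  if (( serial_date > today )); then
    echo "serial $serial is dated in the future (today is $today)" >&2
    return 1
  fi
  return 0
}
```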
Everything that happens once can never happen again. But everything that happens twice will surely happen a third time.
While this SOA calculation workaround works fine with BIND DNS servers, results may be mixed with PowerDNS or other DNS servers.
In one situation the SOA serial on the primary DNS server (running on BIND 9) was correctly reset, but the two slaves (running on PowerDNS) did not reset the SOA serial:
root@linux:~# dig -t SOA example.com @ns1.example.com +short
ns1.example.com. dnsadmin.example.com. 2024021207 10800 3600 604800 38400
root@linux:~# dig -t SOA example.com @ns2.example.com +short
ns1.example.com. dnsadmin.example.com. 4294967295 10800 3600 604800 38400
In this case, the domain/zone needs to be deleted on the slave and re-created. The slave should then launch an AXFR (transfer) request and pull the zone from the primary DNS server again.
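On a PowerDNS secondary, the delete-and-recreate step might look like the following sketch. It assumes pdnsutil manages the slave's backend (zones managed through the HTTP API or another tool would be handled there instead); `resync_secondary_zone` and PRIMARY_IP are placeholders, and the `create-secondary-zone` subcommand is called `create-slave-zone` on older releases.

```shell
#!/usr/bin/env bash
# Sketch: drop and re-create a secondary zone on a PowerDNS slave, then
# ask it to fetch the zone from the primary (AXFR).
# Assumptions: run on the secondary with pdnsutil access to its backend;
# older PowerDNS releases use "create-slave-zone" instead.
resync_secondary_zone() {
  local zone=$1 primary_ip=$2
  pdnsutil delete-zone "$zone" &&
  pdnsutil create-secondary-zone "$zone" "$primary_ip" &&
  pdns_control retrieve "$zone"
}

# Example (destructive: drops the local copy of the zone first):
# resync_secondary_zone example.com PRIMARY_IP
```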
After doing this, the SOA serial on the secondary/slave servers matched the primary again:
root@linux:~# dig -t SOA example.com @ns3.example.com +short
ns1.example.com. dnsadmin.example.com. 2024021207 10800 3600 604800 38400
root@linux:~# dig -t SOA example.com @ns2.example.com +short
ns1.example.com. dnsadmin.example.com. 2024021207 10800 3600 604800 38400