After rolling out global mail queue monitoring, one Ubuntu 20.04 server returned a cryptic error message, instead of the size of the mail queue:
postqueue: fatal: Queue report unavailable - mail system is down
CRITICAL: Error code 69 returned from /usr/bin/mailq
After logging into that server it became clear that Postfix is not running and no SMTP listener was up:
root@ubuntu:~# netstat -lntup|grep 25
# no output
Restart Postfix service. Easy, right? That's what I thought, too.
But a restart did not give me the results I was looking for, as a quick look at the mail error log showed:
root@ubuntu:~# systemctl restart postfix
root@ubuntu:~# tail /var/log/mail.err
Dec 6 18:35:02 ubuntu postfix/postqueue[738031]: fatal: Queue report unavailable - mail system is down
Dec 6 18:36:00 ubuntu postfix/postqueue[738087]: fatal: Queue report unavailable - mail system is down
Dec 6 18:36:53 ubuntu postfix/postqueue[738218]: fatal: Queue report unavailable - mail system is down
Dec 6 18:36:58 ubuntu postfix/postqueue[738225]: fatal: Queue report unavailable - mail system is down
Dec 6 18:37:28 ubuntu postfix/postfix-script[738316]: fatal: the Postfix mail system is already running
At short moment after the Postfix service was restarted, the postfix-script logged that Postfix is already running. But no process nor a listener was running. There was a PID file from Postfix's master process though:
root@ubuntu:~# ll /var/spool/postfix/pid/master.pid
-rw------- 2 postfix postfix 33 Apr 3 2019 /var/spool/postfix/pid/master.pid
Curious is certainly the old age of that pid file. Could it be that Postfix has been dead on that server since April 2019?
But even if a pid file exists, a restart of the service should be possible. Let's take a closer look what is causing the error logged in mail.err.
By looking for that logged string, we can identify the postfix-script command as source of the log entries (besides that it also logged itself in the log event):
root@ubuntu:~# grep -rni "the Postfix mail system is already running" /usr/lib/postfix/*
/usr/lib/postfix/sbin/postfix-script:135: $FATAL the Postfix mail system is already running
/usr/lib/postfix/sbin/postfix-script:149: $FATAL the Postfix mail system is already running
Note: The same file (postfix-script) can also be found in /etc/postfix/.
By looking at one of the mentioned lines (135) shows the following code snippet:
quick-start)
$daemon_directory/master -t 2>/dev/null || {
$FATAL the Postfix mail system is already running
exit 1
}
Here we are inside the "quick-start" action. It only does one thing: Launch $daemon_directory/master -t and if the return code is not true (or 0 for that matter), return "$FATAL the Postfix mail system is already running".
To run this command manually, we first need to find where this master command is located. For this we need to find the definition of the $daemon_directory variable. Luckily a grep through /etc/postfix/ quickly reveals it:
root@ubuntu:~# grep daemon_directory /etc/postfix/* -rni
/etc/postfix/main.cf.proto:57:# The daemon_directory parameter specifies the location of all Postfix
/etc/postfix/main.cf.proto:61:daemon_directory = /usr/lib/postfix/sbin
[...]
Alright! Now we know daemon_directory is /usr/lib/postfix/sbin. Let's launch the master command from there:
root@ubuntu:/usr/lib/postfix/sbin# ./master -t 2
root@ubuntu:/usr/lib/postfix/sbin# echo $?
1
The return code of that command is definitely not 0. But what does that master command actually do? We can find this information from Postfix's man page on master:
-t Test mode. Return a zero exit status when the master.pid lock
file does not exist or when that file is not locked. This is
evidence that the master(8) daemon is not running.
The master -t command is therefore used to check whether or not a lock file exists. Only if no lock file is detected, the command will return 0 - which we need in order to complete the Postfix restart through Systemd.
Some additional research led to a blog entry from Theresa Arzadon-Labajo revealing:
Another log showed:
postfix/master[1490]: fatal: open lock file /var/lib/postfix/master.lock: unable to set exclusive lock: Resource temporarily unavailable
Let's take a look into /var/lib/postfix:
root@ubuntu:~# ls -la /var/lib/postfix/
total 32
drwxr-xr-x 2 postfix postfix 4096 Mar 14 2019 ./
drwxr-xr-x 50 root root 4096 Nov 23 12:03 ../
-rw------- 2 postfix postfix 33 Apr 3 2019 master.lock
-rw------- 1 postfix postfix 1024 Apr 25 2019 prng_exch
-rw------- 1 postfix postfix 8192 Apr 3 2019 smtp_tls_session_cache.db
-rw------- 1 postfix postfix 8192 Apr 3 2019 smtpd_scache.db
And there it is: The master.lock file. Conveniently with the exact same date as the previously discovered PID file.
Let's remove the master.lock file and restart Postfix once more:
root@ubuntu:~# rm /var/lib/postfix/master.lock
root@ubuntu:~# ls -la /var/lib/postfix/master.lock
ls: cannot access '/var/lib/postfix/master.lock': No such file or directory
root@ubuntu:~# systemctl restart postfix && tail /var/log/mail.log
Dec 6 19:28:33 ubuntu postfix/postfix-script[743306]: warning: /var/spool/postfix/etc/services and /etc/services differ
Dec 6 19:28:33 ubuntu postfix/postfix-script[743310]: warning: /var/spool/postfix/etc/resolv.conf and /etc/resolv.conf differ
Dec 6 19:28:33 ubuntu postfix/postfix-script[743316]: warning: /var/spool/postfix/lib/x86_64-linux-gnu/libgcc_s.so.1 and /lib/x86_64-linux-gnu/libgcc_s.so.1 differ
Dec 6 19:28:33 ubuntu postfix/postfix-script[743318]: warning: /var/spool/postfix/lib/x86_64-linux-gnu/libnss_systemd.so.2 and /lib/x86_64-linux-gnu/libnss_systemd.so.2 differ
Dec 6 19:28:33 ubuntu postfix/postfix-script[743324]: starting the Postfix mail system
The Postfix process was (finally) started this time and the SMTP listener on port 25 started up:
root@ubuntu:~# netstat -lntup|grep 25
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 743326/master
If you're unable to restart Postfix and no master processe or smtp listener show up (even though Systemd's Postfix service says active), make sure there is no master.lock file blocking a start of a new Postfix master process.
No comments yet.
AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Containers CouchDB DB DNS Database Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera Git GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Influx Internet Java KVM Kibana Kodi Kubernetes LVM LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS Office OpenSearch PGSQL PHP Perl Personal PostgreSQL Postgres PowerDNS Proxmox Proxy Python Rancher Rant Redis Roundcube SSL Samba Seafile Security Shell SmartOS Solaris Surveillance Systemd TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder