It's been a couple of weeks since I set up a PostgreSQL replication and added it to our monitoring system (see How to monitor a PostgreSQL replication) and it has been running smoothly so far. But in the past few days a disk usage warning popped up.
Although the databases themselves only use around 10GB of disk space, the WAL files (especially the archived WAL files) eat 63GB!
This is because by default the archived WAL files are kept forever if "archive_mode" is set to on in the PostgreSQL config:
archive_mode = on # enables archiving; off, on, or always
archive_command = 'cp %p /var/lib/postgresql/9.6/main/archive/%f'
I thought the solution is easy: I just disable the archiv_mode on the master and enable it on the replica (a hot standby). NOPE! I was following the replica as the WAL files were rotating through (I have wal_keep_segments = 32) but no files in the archive directory were created.
A look at an older mail from February 2014 in the PostgreSQL mailing list reveals:
"It works fine, only the server will not generate WAL while it is in recovery. As soon as you promote the standby, it will archive ist WALs."
A hot_standby replica server is basically ALWAYS running in recovery; means that the "archive_command" will never run on it. Lesson 1 learned: Cleaning up must be done on the master server.
Note: This is only true for hot_standby, it may be different for other kinds of replication modes.
To clean up the archived WAL files, there's a special command pg_archivecleanup. The program can be added into the recovery.conf on a standby server (not hot_standby!) or used as standalone command:
pg_archivecleanup [option...] archivelocation oldestkeptwalfile
I decided to go with the standalone command and build a wrapper around the command. This resulted in a shell script walarchivecleanup.sh. The script allows different options and is able to dynamically looking up a the "oldestkeptwalfile" given by the a max age parameter (-a). A specific "oldestkeptwalfile" can also be given (-f).
Example:
# ./walarchivecleanup.sh -p /var/lib/postgresql/9.6/main/archive -a 14 -d
pg_archivecleanup: keep WAL file "/var/lib/postgresql/9.6/main/archive/0000000100000002000000B6" and later
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000E6"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000002000000B1"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000B0"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/000000010000000200000056"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000008F"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000006F"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000BC"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000A2"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000B6"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000A4"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000004F"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000D0"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000004E"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000F1"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000002000000B5"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/000000010000000200000070"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000001C"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000002000000B4"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/000000010000000200000039"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000E0"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000FD"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000003E"
[...]
General information and usage:
$ ./walarchivecleanup.sh
./walarchivecleanup.sh (c) 2017 Claudio Kuenzler
This script helps to clean up archived WAL logs on a PostgreSQL master server using the pg_archivecleanup command.
Please note that WAL archiving currently only works on a master server (as of 9.6).
---------------------
Options:
-p Path to the archived WAL logs (e.g. /var/lib/postgresql/9.6/main/archive)
-a Age of archived logs to keep (days), anything older will be deleted
-f Specify a certain archived WAL file, anything older than this file will be deleted
Note: If you use -f, it will override -a parameter
-c Full path to pg_archivecleanup command (if not found in $PATH)
-d Show debug information
-n Dry run (simulation only)
---------------------
Usage: ./walarchivecleanup.sh -p archivepath -a age (days) [-d debug] [-f archivefile] [-c path_to_pg_archivecleanup]
Example 1: ./walarchivecleanup.sh -p /var/lib/postgresql/9.6/main/archive -a 10
Example 2: ./walarchivecleanup.sh -p /var/lib/postgresql/9.6/main/archive -f 00000001000000010000001E
---------------------
Cronjob example: 00 03 * * * /root/scripts/walarchivecleanup.sh -p /var/lib/postgresql/9.6/main/archive -a 14
The script is now published on Github and can be found here: https://github.com/Napsty/scripts/blob/master/pgsql/walarchivecleanup.sh. Enjoy!
ck from Switzerland wrote on Aug 31st, 2023:
Hi stoffel, thank you for the comment. I have not yet tested this on PG 15, still on older versions here ;-). But yes, it is possible that some commands have changed in the newer version.
stoffel from wrote on Aug 31st, 2023:
hi,
does this still work for Postgres 15? got strange output when I uncomment the execute:
/usr/pgsql-15/bin/pg_archivecleanup /data/postgres/archive cleanup.log
pg_archivecleanup: error: invalid file name argument
pg_archivecleanup: hint: Try "pg_archivecleanup --help" for more information.
this will clearly not work off course with these kind of arguments, don't know from where the cleanup.log comes from?
zzz from wrote on Dec 15th, 2019:
My mistake! please disregard my previous email
I just added in your script
if [ "$cmd_file" != "" ]; then
execute="$cmd_command $cmd_debug $cmd_dry $cmd_path $cmd_file"
fi
in case of $cmd_file is empty, so pg_archivecleanup won't complain.
thanks
ZZZ from wrote on Dec 15th, 2019:
With postgresql 12.1 walarchivecleanup seems broken, it says
"pg_archivecleanup: error: must specify oldest kept WAL file
Try "pg_archivecleanup --help" for more information."
any clue? Thanks
ck from Switzerland wrote on Jul 10th, 2019:
zzzHH, correct. See the comment inside the script: Please note that WAL archiving currently only works on a master server (as of 9.6).
zzzHH from wrote on Jul 10th, 2019:
Thanks!
is this command/script can be used only on the master?
maria from wrote on Feb 25th, 2018:
Find it very useful. Thanks!
AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Containers CouchDB DB DNS Database Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera Git GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Influx Internet Java KVM Kibana Kodi Kubernetes LVM LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS Observability Office OpenSearch PGSQL PHP Perl Personal PostgreSQL Postgres PowerDNS Proxmox Proxy Python Rancher Rant Redis Roundcube SSL Samba Seafile Security Shell SmartOS Solaris Surveillance Systemd TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder