Automatically cleaning up archived WAL files on a PostgreSQL server

Written by - 7 comments

Published on - Listed in DB Postgres PGSQL Database PostgreSQL


It's been a couple of weeks since I set up a PostgreSQL replication and added it to our monitoring system (see How to monitor a PostgreSQL replication) and it has been running smoothly so far. But in the past few days a disk usage warning popped up.

Although the databases themselves only use around 10GB of disk space, the WAL files (especially the archived WAL files) eat 63GB!

This is because by default the archived WAL files are kept forever if "archive_mode" is set to on in the PostgreSQL config:

archive_mode = on        # enables archiving; off, on, or always
archive_command = 'cp %p /var/lib/postgresql/9.6/main/archive/%f'

I thought the solution is easy: I just disable the archiv_mode on the master and enable it on the replica (a hot standby). NOPE! I was following the replica as the WAL files were rotating through (I have wal_keep_segments = 32) but no files in the archive directory were created.

A look at an older mail from February 2014 in the PostgreSQL mailing list reveals:

"It works fine, only the server will not generate WAL while it is in recovery.  As soon as you promote the standby, it will archive ist WALs."

A hot_standby replica server is basically ALWAYS running in recovery; means that the "archive_command" will never run on it. Lesson 1 learned: Cleaning up must be done on the master server.
Note: This is only true for hot_standby, it may be different for other kinds of replication modes.

To clean up the archived WAL files, there's a special command pg_archivecleanup. The program can be added into the recovery.conf on a standby server (not hot_standby!) or used as standalone command:

pg_archivecleanup [option...] archivelocation oldestkeptwalfile

I decided to go with the standalone command and build a wrapper around the command. This resulted in a shell script walarchivecleanup.sh. The script allows different options and is able to dynamically looking up a the "oldestkeptwalfile" given by the a max age parameter (-a). A specific "oldestkeptwalfile" can also be given (-f).

Example:

# ./walarchivecleanup.sh -p /var/lib/postgresql/9.6/main/archive -a 14 -d
pg_archivecleanup: keep WAL file "/var/lib/postgresql/9.6/main/archive/0000000100000002000000B6" and later
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000E6"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000002000000B1"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000B0"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/000000010000000200000056"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000008F"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000006F"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000BC"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000A2"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000B6"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000A4"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000004F"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000D0"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000004E"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000F1"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000002000000B5"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/000000010000000200000070"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000001C"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000002000000B4"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/000000010000000200000039"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000E0"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000FD"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000003E"
[...]

General information and usage:

$ ./walarchivecleanup.sh
./walarchivecleanup.sh (c) 2017 Claudio Kuenzler
This script helps to clean up archived WAL logs on a PostgreSQL master server using the pg_archivecleanup command.
Please note that WAL archiving currently only works on a master server (as of 9.6).
---------------------
Options:
  -p         Path to the archived WAL logs (e.g. /var/lib/postgresql/9.6/main/archive)
  -a         Age of archived logs to keep (days), anything older will be deleted
  -f         Specify a certain archived WAL file, anything older than this file will be deleted
             Note: If you use -f, it will override -a parameter
  -c         Full path to pg_archivecleanup command (if not found in $PATH)
  -d         Show debug information
  -n         Dry run (simulation only)
---------------------
Usage: ./walarchivecleanup.sh -p archivepath -a age (days) [-d debug] [-f archivefile] [-c path_to_pg_archivecleanup]
Example 1: ./walarchivecleanup.sh -p /var/lib/postgresql/9.6/main/archive -a 10
Example 2: ./walarchivecleanup.sh -p /var/lib/postgresql/9.6/main/archive -f 00000001000000010000001E
---------------------
Cronjob example: 00 03 * * * /root/scripts/walarchivecleanup.sh -p /var/lib/postgresql/9.6/main/archive -a 14

The script is now published on Github and can be found here: https://github.com/Napsty/scripts/blob/master/pgsql/walarchivecleanup.sh. Enjoy!


Add a comment

Show form to leave a comment

Comments (newest first)

ck from Switzerland wrote on Aug 31st, 2023:

Hi stoffel, thank you for the comment. I have not yet tested this on PG 15, still on older versions here ;-). But yes, it is possible that some commands have changed in the newer version.


stoffel from wrote on Aug 31st, 2023:

hi,

does this still work for Postgres 15? got strange output when I uncomment the execute:
/usr/pgsql-15/bin/pg_archivecleanup /data/postgres/archive cleanup.log
pg_archivecleanup: error: invalid file name argument
pg_archivecleanup: hint: Try "pg_archivecleanup --help" for more information.

this will clearly not work off course with these kind of arguments, don't know from where the cleanup.log comes from?


zzz from wrote on Dec 15th, 2019:

My mistake! please disregard my previous email
I just added in your script
if [ "$cmd_file" != "" ]; then
execute="$cmd_command $cmd_debug $cmd_dry $cmd_path $cmd_file"
fi
in case of $cmd_file is empty, so pg_archivecleanup won't complain.
thanks


ZZZ from wrote on Dec 15th, 2019:

With postgresql 12.1 walarchivecleanup seems broken, it says
"pg_archivecleanup: error: must specify oldest kept WAL file
Try "pg_archivecleanup --help" for more information."
any clue? Thanks


ck from Switzerland wrote on Jul 10th, 2019:

zzzHH, correct. See the comment inside the script: Please note that WAL archiving currently only works on a master server (as of 9.6).


zzzHH from wrote on Jul 10th, 2019:

Thanks!
is this command/script can be used only on the master?


maria from wrote on Feb 25th, 2018:

Find it very useful. Thanks!


RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Office   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder