Filesystem full because of Filebeat still hanging on to rotated log

Written by - 0 comments

Published on - Listed in Linux


In the past weeks I've seen recurring file system warnings on certain servers. After some investigation it turned out to be Filebeat still hanging on to a already rotated log file and therefore not releasing the inode, ergo not giving back the available disk space to the file system.

Let's start off at the begin. Icinga sent the disk warning for /var:

root@server:~# df -h
Filesystem            Type        Size  Used Avail Use% Mounted on
sysfs                 sysfs          0     0     0    - /sys
proc                  proc           0     0     0    - /proc
udev                  devtmpfs    2.0G   12K  2.0G   1% /dev
devpts                devpts         0     0     0    - /dev/pts
tmpfs                 tmpfs       396M  736K  395M   1% /run
/dev/sda1             ext4        4.5G  2.0G  2.3G  48% /
none                  tmpfs       4.0K     0  4.0K   0% /sys/fs/cgroup
none                  fusectl        0     0     0    - /sys/fs/fuse/connections
none                  debugfs        0     0     0    - /sys/kernel/debug
none                  securityfs     0     0     0    - /sys/kernel/security
none                  tmpfs       5.0M     0  5.0M   0% /run/lock
none                  tmpfs       2.0G     0  2.0G   0% /run/shm
none                  tmpfs       100M     0  100M   0% /run/user
none                  pstore         0     0     0    - /sys/fs/pstore
/dev/mapper/vg0-lvvar ext4         37G   33G  3.0G  92% /var
/dev/mapper/vg0-lvtmp ext4        922M  1.2M  857M   1% /tmp
systemd               cgroup         0     0     0    - /sys/fs/cgroup/systemd

As you can see, /var is at 92% full.

With lsof we checked for open yet deleted files:

root@server:~# lsof +L1
COMMAND    PID     USER   FD   TYPE DEVICE   SIZE/OFF NLINK   NODE NAME
init         1     root   11w   REG  252,1        309     0     79 /var/log/upstart/systemd-logind.log.1 (deleted)
filebeat 30336     root    3r   REG  252,1 1538307897     0    326 /var/log/haproxy.log.1 (deleted)
filebeat 30336     root    4r   REG  252,1 1474951702     0   2809 /var/log/haproxy.log.1 (deleted)
filebeat 30336     root    6r   REG  252,1 1513061121     0   1907 /var/log/haproxy.log.1 (deleted)
filebeat 30336     root    7r   REG  252,1 1566966965     0     72 /var/log/haproxy.log.1 (deleted)
filebeat 30336     root    8r   REG  252,1 1830485663     0   2558 /var/log/haproxy.log.1 (deleted)
filebeat 30336     root    9r   REG  252,1 1426600050     0    163 /var/log/haproxy.log.1 (deleted)
nginx    31673 www-data   64u   REG  252,1     204800     0   2978 /var/lib/nginx/proxy/9/53/0008826539 (deleted)
nginx    31674 www-data  154u   REG  252,1     204800     0 131334 /var/lib/nginx/proxy/2/95/0008825952 (deleted)
nginx    31676 www-data  111u   REG  252,1     241664     0   4064 /var/lib/nginx/proxy/0/54/0008826540 (deleted)

There it is; Filebeat still hanging on to a (meanwhile) rotated HAProxy log file (which is quite big as you can see in the column SIZE/OFF).

To release the inode, Filebeat can either restart or force reloaded.

root@server:~# /etc/init.d/filebeat force-reload
 * Restarting Filebeat sends log files to Logstash or directly to Elasticsearch. filebeat
2018/05/23 13:39:20.004534 beat.go:297: INFO Home path: [/usr/share/filebeat] Config path: [/etc/filebeat] Data path: [/var/lib/filebeat] Logs path: [/var/log/filebeat]
2018/05/23 13:39:20.004580 beat.go:192: INFO Setup Beat: filebeat; Version: 5.6.9
2018/05/23 13:39:20.004623 logstash.go:91: INFO Max Retries set to: 3
2018/05/23 13:39:20.004663 outputs.go:108: INFO Activated logstash as output plugin.
2018/05/23 13:39:20.004681 metrics.go:23: INFO Metrics logging every 30s
2018/05/23 13:39:20.004727 publish.go:300: INFO Publisher name: server
2018/05/23 13:39:20.004850 async.go:63: INFO Flush Interval set to: 1s
2018/05/23 13:39:20.006018 async.go:64: INFO Max Bulk Size set to: 2048
Config OK

Verification with lsof again:

root@server:~# lsof +L1
COMMAND   PID     USER   FD   TYPE DEVICE SIZE/OFF NLINK   NODE NAME
init        1     root   11w   REG  252,1      309     0     79 /var/log/upstart/systemd-logind.log.1 (deleted)
nginx   31674 www-data  154u   REG  252,1   204800     0 131334 /var/lib/nginx/proxy/2/95/0008825952 (deleted)

Looks better! What about the file system size?

root@server:~# df -h
Filesystem            Type        Size  Used Avail Use% Mounted on
sysfs                 sysfs          0     0     0    - /sys
proc                  proc           0     0     0    - /proc
udev                  devtmpfs    2.0G   12K  2.0G   1% /dev
devpts                devpts         0     0     0    - /dev/pts
tmpfs                 tmpfs       396M  736K  395M   1% /run
/dev/sda1             ext4        4.5G  2.0G  2.3G  48% /
none                  tmpfs       4.0K     0  4.0K   0% /sys/fs/cgroup
none                  fusectl        0     0     0    - /sys/fs/fuse/connections
none                  debugfs        0     0     0    - /sys/kernel/debug
none                  securityfs     0     0     0    - /sys/kernel/security
none                  tmpfs       5.0M     0  5.0M   0% /run/lock
none                  tmpfs       2.0G     0  2.0G   0% /run/shm
none                  tmpfs       100M     0  100M   0% /run/user
none                  pstore         0     0     0    - /sys/fs/pstore
/dev/mapper/vg0-lvvar ext4         37G   24G   12G  68% /var
/dev/mapper/vg0-lvtmp ext4        922M  1.2M  857M   1% /tmp
systemd               cgroup         0     0     0    - /sys/fs/cgroup/systemd

The /var partition is now back to 68% used. Good!

But how does one prevent this from happening again? In the logrotate config for HAProxy (/etc/logrotate.d/haproxy) I added a postrotate command to reload Filebeat:

root@server:~# cat /etc/logrotate.d/haproxy
/var/log/haproxy.log {
    daily
    rotate 52
    missingok
    notifempty
    compress
    delaycompress
    postrotate
        invoke-rc.d rsyslog rotate >/dev/null 2>&1 || true
        service filebeat force-reload >/dev/null 2>&1
    endscript
}

For the last couple of weeks I've been watching this on that particular server and it turns out this definitely solved the problem that Filebeat was still hanging on to inodes not in use anymore. The following graphs shows the disk usage of /var:

Filebeat still hanging on to logfiles after rotation

The red circles show when I manually forced a reload of Filebeat.
The blue circle notes the day when I added the "service filebeat force-reload >/dev/null 2>&1" line into postrotate in the logrotate file - and when it first executed (note the significant fall compared to the days before).

I also had to add the reload line to the logrotate config of Nginx, as I'm using Filebeat to go through HAProxy and Nginx logs.

Note: This happened under both Ubuntu 14.04 and 16.04 with Filebeat 5.6.9. Meanwhile Filebeat 6.x is out, maybe this fixes the rotated log file issue but I didn't have the time to upgrade yet. 


Add a comment

Show form to leave a comment

Comments (newest first)

No comments yet.

RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Observability   Office   OpenSearch   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder