Analysis with dtrace: Filesystem is full although files were deleted

Written by - 0 comments

Published on - Listed in Linux Shell


Today I had a case, where a zfs file system ran full. This wouldn't normally be very interesting, but in this case, the output of df and du were completely showing different values. Why?

Besides the information that the file system is full, I was also told from the user that "some files were already cleaned" but of course without details. Although the user deleted some big files, the filesystem is still shown as full (99%). The following troubleshooting commands were launched on a Solaris OS.

On my own checks with df, zfs and du I noticed a big difference on how the three commands calculated the used disk space:

# df -h | grep polo
datapool/myapp      5.9G   5.8G    75M    99%    /opt/myapp

# zfs list -t filesystem | grep myapp
datapool/myapp       5.80G  75.5M  5.80G  /opt/myapp

# du -ksh /opt/myapp
 715M   /opt/myapp

The 'du' command shows it clearly: There are actually only 715MB used. No, there were no zfs snapshots on this filesystem.

After I compared the values a couple of times, I even noticed a slight change in df, where the file system still grew. This fact told me that something must still be written somewhere in /opt/myapp.

Let's use the powerful dtrace command to see what is currently written on this server:

# dtrace -n 'syscall::write:entry { @[fds[arg0].fi_pathname] = count(); }'
dtrace: description 'syscall::write:entry ' matched 1 probe
^C
  /devices/pseudo/clone@0:ptm                                       1
  /devices/pseudo/pts@0:1                                           1
  /var/spool/postfix/public/pickup                                  1
  /opt/myapp/logs/myapp.example.com-access_log                    6
  /opt/myapp/logs/testapp.example.com-access_log                6
  /opt/myapp/logs/testapp.example.com-error_log                6
  /opt/myapp/logs/alpha.example.com-access_log                6
  /opt/myapp/logs/prod.example.com-access_log                6
  /opt/myapp/logs/prod.example.com-error_log                6
                                                          16

The above output shows me all the files that were written during I let dtrace run. And here we got some log files in our famous /opt/myapp path.
Let's check, how big the log files are:

# ls -la /opt/myapp/logs
total 6
drwxr-xr-x   2 myapp myapp       2 Oct 17 12:03 .
drwxr-xr-x  11 myapp myapp      15 Oct 18 08:01 ..

Ehm... They're... gone? So that must have been the files that were already 'cleaned up' by the user. But why and who is still writing to non-existant files, therefore blocking the disk inodes to be freed?
Let's again use dtrace to see which process (execname) writes to which file:

# dtrace -n 'syscall::write:entry { printf("%s writes in %s", execname, fds[arg0].fi_pathname);}'
dtrace: description 'syscall::write:entry ' matched 1 probe

CPU     ID       FUNCTION:NAME
  0  44639       write:entry httpd.worker writes in
  0  44639       write:entry httpd.worker writes in /opt/myapp/logs/testapp.example.com-error_log
  0  44639       write:entry httpd.worker writes in /opt/myapp/logs/testapp.example.com-access_log
  0  44639       write:entry httpd.worker writes in
  0  44639       write:entry httpd.worker writes in /opt/myapp/logs/prod.example.com-error_log
  0  44639       write:entry httpd.worker writes in /opt/myapp/logs/prod.example.com-access_log
  0  44639       write:entry httpd.worker writes in /opt/myapp/logs/alpha.example.com-access_log
  0  44639       write:entry httpd.worker writes in /opt/myapp/logs/prod.example.com-access_log
  0  44639       write:entry dtrace writes in /devices/pseudo/pts@0:1
  0  44639       write:entry dtrace writes in /devices/pseudo/pts@0:1
  0  44639       write:entry dtrace writes in /devices/pseudo/pts@0:1
  0  44639       write:entry dtrace writes in /devices/pseudo/pts@0:1

So finally I see that the process 'httpd.worker' is still writing to the filesystem, although the files are not there anymore. The continuous write on the same files block the inodes on the disk, having the same disk space effect as if the files were still there -> using disk space.

As a solution, the responsible process must be reloaded:

# svcadm refresh svc:/network/http:sun-apache2

And, no surprise, df and zfs now show correct values:

# df -h | grep myapp
datapool/myapp      5.9G   766M   5.1G    13%    /opt/myapp

# zfs list -t filesystem | grep myapp
datapool/myapp        766M  5.13G   766M  /opt/myapp

Note for Linux: As written above, these dtrace commands were launched in Solaris. As of today (18.10.2012) the file descriptor (fds) is not yet available nor working in dtrace4linux. But I wrote to Paul Fox, the responsible guy for dtrace4linux, and he responded me that this will most likely be released in the next ~2 weeks.


Add a comment

Show form to leave a comment

Comments (newest first)

No comments yet.

RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Observability   Office   OpenSearch   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder