User quota disk space monitoring with check_disk



The official Nagios plugin 'check_disk' is one of the oldest and probably one of the most powerful plugins. In most setups, however, disk checks use only a fraction of what check_disk can do.

A very typical disk check looks like this:

# check free disk space on / partition
/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
DISK OK - free space: / 516 MB (56% inode=98%);| /=393MB;791;890;0;989

There's also the possibility to check all available partitions with one command:

# check all free disk space
/usr/lib/nagios/plugins/check_disk -w 20% -c 10%
DISK CRITICAL - free space: / 516 MB (56% inode=98%); /dev 0 MB (0% inode=-); /tmp 910 MB (99% inode=99%); /usr 72906 MB (66% inode=97%); /var 5210 MB (57% inode=95%); .................

But with this command the plugin returns a HUGE list of file systems. Why? Because this particular server (a shared hosting server running FreeBSD) has all its customer folders chrooted with ZFS quotas. So each customer shows up like this in df:

df -h | grep webcustomer1
datapool/home/webcustomer1   15G    7.5G    7.5G    50%    /home/webcustomer1/
/bin                         989M    393M    516M    43%    /home/webcustomer1/bin
/lib                         989M    393M    516M    43%    /home/webcustomer1/lib
/libexec                     989M    393M    516M    43%    /home/webcustomer1/libexec
/usr/bin                     116G     36G     71G    34%    /home/webcustomer1/usr/bin
/usr/lib                     116G     36G     71G    34%    /home/webcustomer1/usr/lib
/usr/lib32                   116G     36G     71G    34%    /home/webcustomer1/usr/lib32
/usr/local/bin               116G     36G     71G    34%    /home/webcustomer1/usr/local/bin
/usr/local/lib               116G     36G     71G    34%    /home/webcustomer1/usr/local/lib
/usr/local/share             116G     36G     71G    34%    /home/webcustomer1/usr/local/share
/usr/share/locale            116G     36G     71G    34%    /home/webcustomer1/usr/share/locale
/usr/share/misc              116G     36G     71G    34%    /home/webcustomer1/usr/share/misc
devfs                        1.0k    1.0k      0B   100%    /home/webcustomer1/dev

So each customer has several mount points, and they are the same mount points for every customer. check_disk doesn't care about that - it will measure bin, lib, dev, etc. for each customer. With 300 customers at 13 mount points each (as in the df output above), check_disk has to check at least 3900 file systems. Overkill doesn't even begin to describe that.

So I dug into the manpage of check_disk to see what is actually possible. I found two ways to use check_disk to check the user quotas quickly and reliably.

1) By using a regular expression and an ignore list

# check the home file systems but ignore bin, lib, libexec, usr and dev. Do not list each file system which is OK (-e).
/usr/lib/nagios/plugins/check_disk -w 100 -c 50 -e -r home -i "(bin|lib|libexec|usr|dev)"
DISK OK| /home=0MB;329293;329343;0;329393 /home/webcustomer1/=35MB;5020;5070;0;5120 /home/webcustomer2/=24MB;5020;5070;0;5120 /home/webcustomer3/=897MB;5020;5070;0;5120 .......

Time to explain. I set the warning threshold to 100 (MB) and the critical threshold to 50 (MB) of free space. I told check_disk to look for "home" in the file systems (-r home) but to ignore file systems containing bin, lib, libexec, usr or dev (-i).
The -e parameter suppresses the per-file-system listing for file systems which are OK; the status text only lists file systems in a warning or critical state. The performance data after the | still contains every matched file system.
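One thing to watch with the ignore list: the pattern is a regular expression, not a list of directory names. Assuming check_disk applies it unanchored (which is how the manpage wording reads, but I have not verified it against the plugin source), a customer called "webdev" or "library" would be ignored along with the nullfs mounts. The sketch below uses grep -E as a stand-in for check_disk's matching to show the difference between the loose pattern and a stricter, anchored one. The helper is_ignored, the strict pattern and the customer names webdev and library are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if the path matches the pattern as an
# unanchored extended regular expression (grep -E stands in for check_disk).
is_ignored() {
    printf '%s\n' "$1" | grep -Eq "$2"
}

# Pattern from the command above.
loose='(bin|lib|libexec|usr|dev)'
# Assumed stricter variant: the name must be the first path component
# below the customer directory.
strict='^/home/[^/]+/(bin|lib|libexec|usr|dev)(/|$)'

for path in /home/webcustomer1/ /home/webcustomer1/usr/bin /home/webdev/ /home/library/; do
    if is_ignored "$path" "$loose"; then l=ignored; else l=checked; fi
    if is_ignored "$path" "$strict"; then s=ignored; else s=checked; fi
    printf '%s loose=%s strict=%s\n' "$path" "$l" "$s"
done
```

If the stricter pattern behaves the same way inside check_disk, it could be passed to -i in place of the loose one. Test it on your own system first.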

2) By ignoring certain file system types (preferred)

As mentioned above, the bin, lib, libexec, etc. file systems under each customer's chroot all come from the same source locations on the system. They are mounted as "nullfs" file systems pointing to the original mount points. As an example: /home/webcustomer1/bin is mounted from the original /bin (as can be seen in the df output). This can be verified with df -T, which shows the file system type:

df -T /home/webcustomer1/bin
Filesystem  Type   1K-blocks   Used  Avail Capacity  Mounted on
/bin        nullfs   1012974 402712 529226    43%    /home/webcustomer1/bin
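To get a feeling for how many mounts of each type exist on the whole server, the df -T output can be tallied by its Type column. This is a minimal sketch, assuming the type is the second column and device names contain no spaces. The function name count_fs_types and the sample values are only illustrative.

```sh
#!/bin/sh
# Tally file systems by type from `df -T` style output on stdin.
# Assumes: line 1 is the header, column 2 is the type, no spaces in device names.
count_fs_types() {
    awk 'NR > 1 { count[$2]++ }
         END { for (type in count) print count[type], type }' | sort -k1,1nr -k2,2
}

# Sample input modelled on the df output shown earlier (values are illustrative).
count_fs_types <<'EOF'
Filesystem                  Type    1K-blocks    Used    Avail Capacity  Mounted on
datapool/home/webcustomer1  zfs      15728640 7864320  7864320    50%    /home/webcustomer1
/bin                        nullfs    1012974  402712   529226    43%    /home/webcustomer1/bin
/lib                        nullfs    1012974  402712   529226    43%    /home/webcustomer1/lib
devfs                       devfs           1       1        0   100%    /home/webcustomer1/dev
EOF
```

On the live system this would be run as df -T | count_fs_types. The types with the highest counts are the obvious candidates to exclude.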

So instead of manually ignoring all the sub-mount points, it is easier to tell check_disk to ignore certain file system types, in this case nullfs and devfs.

# check the home file systems but ignore nullfs and devfs file system types. Do not output each file system which is ok (-e)
./check_disk -w 100 -c 50 -e -r home -X nullfs -X devfs
DISK OK| /home=0MB;329291;329341;0;329391 /home/webcustomer1/=35MB;5020;5070;0;5120 /home/webcustomer2/=24MB;5020;5070;0;5120 ........

If you wonder about -X being used twice (as seen above), do not worry. The option can be given several times (this is also written in the manpage of check_disk).
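Since the status text is just "DISK OK" when everything is fine, the per-customer numbers are only visible in the performance data after the |. Each entry has the form label=used;warn;crit;min;max, where max is the size of the file system (here: the quota). The following is a rough sketch of how that string could be turned into a readable usage list. It assumes values in MB (as in the output above) and labels without spaces. The function name perfdata_usage is my own.

```sh
#!/bin/sh
# Turn check_disk performance data into "mountpoint used/size MB (percent)" lines.
# Assumes MB units and labels without spaces (quoted labels are not handled).
perfdata_usage() {
    # Drop everything up to the first '|', then put one perfdata entry per line.
    sed 's/^[^|]*|//' | tr ' ' '\n' | awk -F'[=;]' '
        NF >= 6 {
            used = $2
            sub(/[A-Za-z]+$/, "", used)   # strip the unit suffix, e.g. MB
            used += 0
            max = $6 + 0
            if (max > 0)
                printf "%s %d/%d MB (%.1f%%)\n", $1, used, max, used * 100 / max
        }'
}

# Output line taken from the check above; the truncated tail is skipped by the parser.
printf '%s\n' 'DISK OK| /home=0MB;329291;329341;0;329391 /home/webcustomer1/=35MB;5020;5070;0;5120 /home/webcustomer2/=24MB;5020;5070;0;5120 ........' | perfdata_usage
```

Against the real plugin this would be used as ./check_disk -w 100 -c 50 -e -r home -X nullfs -X devfs | perfdata_usage.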

