On a Windows server, a service was hanging and nobody noticed it. The application team found out that this service, when working correctly, always creates certain temporary folders which disappear after a few minutes. This can be monitored, of course!
As the Windows servers have NSClient installed, I can use check_nrpe from the Icinga server to check for the folders. To test this, I created a folder "claudiotest" in the temp folder of the application.
Note that I used an asterisk wildcard in the path to simulate the temporary folders of the application: they all start with the same name but have different endings.
$ /usr/lib/nagios/plugins/check_nrpe -H windowsserver -c check_files -a "file=C:\Program Files\Application\tmp\claudio*"
OK: All 1 files are ok|
Indeed, one entry was found: my folder "claudiotest", which the check reports as a file.
What if I search for another name?
$ /usr/lib/nagios/plugins/check_nrpe -H windowsserver -c check_files -a "file=C:\Program Files\Application\tmp\claudiooo*"
No files found|
No surprise, nothing was found with that name.
Next, I had to add a filter to narrow the results. I only wanted entries that match the filename (C:\Program Files\Application\tmp\claudio*) and are older than 15 minutes (900 seconds):
$ /usr/lib/nagios/plugins/check_nrpe -H windowsserver -c check_files -a "file=C:\Program Files\Application\tmp\claudio*" "filter=age>900"
OK: All 1 files are ok|
So far so good, but the result should not be OK: a matching folder older than 15 minutes means the application is probably hanging, so the check should return WARNING. For this, the "warn" argument must be used:
$ /usr/lib/nagios/plugins/check_nrpe -H windowsserver -c check_files -a "file=C:\Program Files\Application\tmp\claudio*" "filter=age>900" "warn=count>0"
WARNING: 1/1 files (claudiotest)|'count'=1;0;0
This means: as soon as the check finds at least one file that matches the filename and is older than 15 minutes, it returns a warning.
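The part of the output after the pipe character is performance data in the usual Nagios plugin form 'label'=value;warn;crit. As a minimal sketch of how such a line can be taken apart in a script, the following helpers split it with plain shell parameter expansion. The function names are made up for this example and are not part of check_nrpe or NSClient.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of check_nrpe or NSClient.

# plugin_message OUTPUT: print the human-readable text before the first "|".
plugin_message() {
    printf '%s\n' "${1%%"|"*}"
}

# plugin_perfdata OUTPUT: print the text after the first "|"
# (an empty line if the output contains no "|").
plugin_perfdata() {
    case "$1" in
        *"|"*) printf '%s\n' "${1#*"|"}" ;;
        *)     printf '\n' ;;
    esac
}

# perf_value ITEM: print the value of one 'label'=value;warn;crit item.
perf_value() {
    item=${1#*=}                  # drop the label and the "="
    printf '%s\n' "${item%%;*}"   # keep everything before the first ";"
}
```

Fed with the WARNING line from above, plugin_message returns the text up to the pipe, plugin_perfdata returns 'count'=1;0;0, and perf_value applied to that item returns the count of 1.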
But I faced one more issue. When no such directories exist (which can happen), I got an UNKNOWN return code (3):
$ /usr/lib/nagios/plugins/check_nrpe -H windowsserver -c check_files -a "file=C:\Program Files\Application\tmp\claudiooo*" "filter=age>900" "warn=count>0"; echo $?
No files found|'count'=0;0;0
3
This means that Icinga would show an UNKNOWN alert, which should not be the case. This can be solved with the "empty-state" parameter, which sets the return code to use when nothing matches the filter (no result):
$ /usr/lib/nagios/plugins/check_nrpe -H windowsserver -c check_files -a "file=C:\Program Files\Application\tmp\claudiooo*" "filter=age>900" "empty-state=ok" "warn=count>0"; echo $?
No files found|'count'=0;0;0
0
This time, the return code was OK (0).
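The return codes printed by echo $? follow the common Nagios plugin convention: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. As a small sketch for manual testing, the two helpers below (hypothetical names, not part of any plugin package) run a check and print the state name next to the numeric code.

```shell
#!/bin/sh
# Hypothetical helpers for manual testing; the names are made up for this example.

# state_name CODE: translate a plugin exit code into its state name.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "UNKNOWN (unexpected exit code $1)" ;;
    esac
}

# run_and_report COMMAND [ARGS...]: run the check, then print its exit code
# together with the state name, and pass the exit code on to the caller.
run_and_report() {
    "$@"
    rc=$?
    echo "exit code $rc = $(state_name "$rc")"
    return "$rc"
}
```

To use it, prefix the check with the helper, for example run_and_report /usr/lib/nagios/plugins/check_nrpe -H windowsserver -c check_files followed by the same arguments as above.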
And the final check:
$ /usr/lib/nagios/plugins/check_nrpe -H windowsserver -c check_files -a "file=C:\Program Files\Application\tmp\claudio*" "filter=age>900" "empty-state=ok" "warn=count>0"
WARNING: 1/1 files (claudiotest)|'count'=1;0;0
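If the same check is needed for several paths or thresholds, the four arguments can be generated instead of typed. The following is only a sketch: build_check_args is a hypothetical helper that takes the file pattern and the maximum age in minutes and prints the arguments in the same order as in the final command.

```shell
#!/bin/sh
# Hypothetical helper: print the check_files arguments used above, one per line.
#   $1 = file pattern on the Windows host
#   $2 = maximum age in minutes before a folder counts as stale
build_check_args() {
    # The filter in this post uses seconds (900 = 15 minutes).
    max_age_seconds=$(( $2 * 60 ))
    # printf instead of echo, so the backslashes in the Windows path are not
    # interpreted as escape sequences by some shells.
    printf '%s\n' \
        "file=$1" \
        "filter=age>${max_age_seconds}" \
        "empty-state=ok" \
        "warn=count>0"
}
```

Calling build_check_args 'C:\Program Files\Application\tmp\claudio*' 15 prints the file, filter, empty-state and warn arguments, each on its own line, ready to be passed to check_nrpe after -a.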
Solved!