Today I had to solve a special case where an Icinga 2 satellite server ran out of disk space in /var. After I increased the disk size, I noticed that almost all network switches checked via this satellite with check_nwc_health returned an UNKNOWN status. The service output was: rumms.
I manually verified this on the command line:
# /usr/lib/nagios/plugins/check_nwc_health --hostname aswitch --community public --mode interface-usage --name Ethernet1/1
rumms
UNKNOWN - no interfaces
I manually re-listed all interfaces:
# /usr/lib/nagios/plugins/check_nwc_health --hostname aswitch --community public --mode list-interfaces
83886080 mgmt0
151060482 Vlan2
[...]
526649088 Ethernet101/1/29
526649152 Ethernet101/1/30
526649216 Ethernet101/1/31
526649280 Ethernet101/1/32
OK - have fun
And then the check worked again:
# /usr/lib/nagios/plugins/check_nwc_health --hostname aswitch --community public --mode interface-usage --name Ethernet1/1
OK - interface Ethernet1/1 (alias UCS-FI-A) usage is in:0.82% (82014272.36bit/s) out:3.21% (320758024.71bit/s) | 'Ethernet1/1_usage_in'=0.82%;80;90;0;100 'Ethernet1/1_usage_out'=3.21%;80;90;0;100 'Ethernet1/1_traffic_in'=82014272.36;8000000000;9000000000;0;10000000000 'Ethernet1/1_traffic_out'=320758024.71;8000000000;9000000000;0;10000000000
The reason for this is that by default check_nwc_health creates a "cached" list of interfaces per checked device. This cached list is a file in /var/tmp/check_nwc_health:
# ls -l /var/tmp/check_nwc_health | grep cache
-rw-r--r-- 1 nagios nagios 8192 Jul 20 08:03 01switch_interface_cache_d2e08e73bba4b976b8b4dcdcf66e3c7d
-rw-r--r-- 1 nagios nagios 8577 Jul 20 08:17 02switch_interface_cache_d2e08e73bba4b976b8b4dcdcf66e3c7d
-rw-r--r-- 1 nagios nagios 8192 Jul 20 08:04 aswitch_interface_cache_81b3d521b731e73215515a4f1f4a3ccf
-rw-r--r-- 1 nagios nagios 0 Jul 20 07:32 bswitch_interface_cache_81b3d521b731e73215515a4f1f4a3ccf
-rw-r--r-- 1 nagios nagios 8192 Jul 20 08:06 cswitch_interface_cache_81b3d521b731e73215515a4f1f4a3ccf
-rw-r--r-- 1 nagios nagios 7017 Jul 20 08:18 dswitch_interface_cache_81b3d521b731e73215515a4f1f4a3ccf
-rw-r--r-- 1 nagios nagios 7013 Jul 20 08:19 eswitch_interface_cache_81b3d521b731e73215515a4f1f4a3ccf
-rw-r--r-- 1 nagios nagios 0 Jul 20 07:31 fswitch_interface_cache_81b3d521b731e73215515a4f1f4a3ccf
-rw-rw-r-- 1 nagios nagios 9291 Jul 20 08:16 gswitch_interface_cache_81b3d521b731e73215515a4f1f4a3ccf
-rw-r--r-- 1 nagios nagios 6245 Jul 20 07:44 hswitch_interface_cache_d2e08e73bba4b976b8b4dcdcf66e3c7d
-rw-r--r-- 1 nagios nagios 0 Jul 20 07:46 iswitch_interface_cache_d2e08e73bba4b976b8b4dcdcf66e3c7d
-rw-r--r-- 1 nagios nagios 4096 Jul 20 08:12 jswitch_interface_cache_d2e08e73bba4b976b8b4dcdcf66e3c7d
-rw-r--r-- 1 nagios nagios 4096 Jul 20 07:46 kswitch_interface_cache_d2e08e73bba4b976b8b4dcdcf66e3c7d
[...]
Note the cache files with a size of 0 bytes. Each of them is an empty list of interfaces for that specific device, so any interface requested by name is reported as unknown.
Because /var was full when the interface cache file was last written, the file ended up with 0 bytes, causing check_nwc_health to think that there are no interfaces at all on the network device.
After removing the cache files, the checks worked again (if there is no interface cache file, it will be re-created).
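With many switches affected, the cleanup can be scripted instead of fixing each device by hand. The following is only a minimal sketch: the function name purge_empty_caches is made up for this example, the default directory is the one from the listing above, and the file name pattern is inferred from that listing, so adjust both if your setup differs. It only touches interface cache files that are exactly 0 bytes and prints each file it removes.

```shell
#!/bin/sh
# Hypothetical helper (not part of check_nwc_health): remove 0-byte
# interface cache files so that the plugin rebuilds them on its next run.
# $1 is the cache directory; it defaults to the path shown in this post.
purge_empty_caches() {
    cache_dir="${1:-/var/tmp/check_nwc_health}"

    # Nothing to do if the directory does not exist.
    [ -d "$cache_dir" ] || return 0

    # -size 0 matches empty files only; non-empty caches are left alone.
    # The name pattern is an assumption based on the file names listed above.
    find "$cache_dir" -type f -name '*_interface_cache_*' -size 0 \
        -print -exec rm -f {} +
}
```

Run it as a user that is allowed to delete the files (for example root or nagios) by calling purge_empty_caches without arguments, or pass a different directory as the first argument. The next check run for each affected device should then re-create its cache file.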