On a Nagios 4.x installation, a particular host was marked as DOWN.
Yet the remote host was perfectly up and working. Why would Nagios think that the host is down though?
Taking a closer look at the host definition showed the following configuration:
define host{
use remote-host
hostgroups group1,group2
host_name www.example.com
}
Note: www.example.com is obviously an anonymized place holder.
If you've been a long time Nagios user, the first thing which catches your eye is the missing "address" in this host definition. The official Nagios object definition documentation clearly marks the address field as required keyword:
However, there's a catch with the address field: It can actually be omitted. In this situation, the host_name will be used as address. The same documentation mentions:
Note: If you do not specify an address directive in a host definition, the name of the host will be used as its address. A word of caution about doing this, however - if DNS fails, most of your service checks will fail because the plugins will be unable to resolve the host name.
That means that www.example.com will be used as the host's address. To manually check what happens in the background of a host check, we need to look at the "remote-host" template:
# remote-host template
define host{
name remote-host ; The name of this host template
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 3 ; Actively check the host every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 2 ; Check each Linux host 10 times (max)
check_command check-host-alive ; Default command to check Linux hosts
notification_period 24x7 ; Linux admins hate to be woken up, so we only notify during the day
[...]
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
The check_command of this host (template) object shows the check-host-alive. Now let's find out what this command is doing by taking a look at the command definition:
# 'check-host-alive' command definition
define command{
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
The check-host-alive uses the check_ping monitoring plugin in the background to determine whether the remote host is up or down. By executing the plugin manually, we should obtain the same result as Nagios shows in the user interface:
root@nagios:~# /usr/lib/nagios/plugins/check_ping -H www.example.com -w 3000.0,80% -c 5000.0,100% -p 5
CRITICAL - Network Unreachable (www.example.com)
Indeed, the same "Network Unreachable" error is shown. But why does this happen? A DNS resolving of the target name shows why: The target (www.example.com) can be resolved to both IPv4 and IPv6 addresses. Today's servers are often using IPv6 by default, unless otherwise configured or completely disabled.
To enforce a communication with IPv4, the check_ping plugin can be told to use IPv4 with the -4 parameter:
root@nagios:~# /usr/lib/nagios/plugins/check_ping -H www.example.com -w 3000.0,80% -c 5000.0,100% -p 5 -4
PING OK - Packet loss = 0%, RTA = 1.16 ms|rta=1.160000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0
The ping now responds enforcing IPv4.
The solution for Nagios in this situation? Either add the IPv4 address in the host definition or adjust the check-host-alive command (append -4 to the command_line).
No comments yet.
AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Containers CouchDB DB DNS Database Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera Git GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Influx Internet Java KVM Kibana Kodi Kubernetes LVM LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS Observability Office OpenSearch PGSQL PHP Perl Personal PostgreSQL Postgres PowerDNS Proxmox Proxy Python Rancher Rant Redis Roundcube SSL Samba Seafile Security Shell SmartOS Solaris Surveillance Systemd TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder