Last update: December 08, 2020
This is a plugin to monitor a Dell Equallogic with Nagios. It is written in bash, so it should run on almost all Linux/Unix based systems. It uses SNMP (v2) to query information from the Equallogic device. Before using the script, please also check the requirements.
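Because every check depends on SNMP, it can help to confirm that the member answers SNMP v2c queries before configuring any service. The following is only a sketch and not part of the plugin: the function name snmp_reachable is made up for illustration, and it assumes the Net-SNMP command line tools (snmpget) are installed. It queries the standard sysDescr.0 OID (1.3.6.1.2.1.1.1.0).

```shell
#!/bin/bash
# Hypothetical helper, NOT part of check_equallogic.sh.
# Assumes the Net-SNMP tools (snmpget) are installed.
snmp_reachable() {
  local host="$1" community="$2"

  if ! command -v snmpget >/dev/null 2>&1; then
    echo "UNKNOWN: snmpget not found (install the Net-SNMP tools)"
    return 3
  fi

  # Query sysDescr.0 with SNMP v2c, 2 second timeout, 1 retry
  if snmpget -v 2c -c "$community" -t 2 -r 1 "$host" 1.3.6.1.2.1.1.1.0 >/dev/null 2>&1; then
    echo "OK: $host answers SNMP v2c queries"
    return 0
  fi

  echo "CRITICAL: no SNMP answer from $host"
  return 2
}
```

Usage would look like: snmp_reachable 10.0.0.200 public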
If you are looking for commercial support for this monitoring plugin, need customized modifications or in general customized monitoring plugins, contact us at Infiniroot.com.
Download the plugin and save it in your Nagios/monitoring plugin folder (usually /usr/lib/nagios/plugins, depending on your distribution). Afterwards adjust the permissions (usually chmod 755).
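The installation step can be sketched as a small shell function. This is an illustration only: the function name install_plugin is hypothetical, the default target folder is the common path mentioned above and may differ on your distribution, and the script is assumed to be already downloaded as check_equallogic.sh.

```shell
#!/bin/bash
# Hypothetical helper for illustration; adjust the target folder to your distribution.
install_plugin() {
  local src="$1"
  local dest_dir="${2:-/usr/lib/nagios/plugins}"

  # install copies the file and sets the permissions (755) in one step
  install -m 755 "$src" "$dest_dir/"
}
```

Usage would look like: install_plugin ./check_equallogic.sh (run as a user with write access to the plugin folder).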
Community contributions are welcome on the GitHub repo.
The plugin has been successfully tested on the following Dell Equallogic devices:
Please let me know if you have another Equallogic model and/or another firmware running.
20091109 Started Script programming checks: health, disk, raid, uptime, ps, info
20091112 Added ethif, conn
20091118 Added diskusage
20091119 Bugfix on Outputs (removed Pipes)
20091121 Public Release
20091204 Bugfix (removed IP addresses)
20091206 Bugfix (removed SNMP community names)
20091222 Fixed raid, ps, health and diskusage checks when multiple member devices exists. By Mathias Sundman.
20100112 Successful tests on PS5000XV - thanks to Scott Sawin
20100209 Compatibility matrix now on website (see Tested on above)
20100416 Beta Testing for rewritten ethif check (allows more than 3 interfaces)
20100420 Corrected ethif output, finished new ethif check
20100526 Using proper order of snmpwalk command, thanks Roland Ripoll
20100531 Added performance data for diskusage and connections - thanks to Benoit Poulet.
20100622 Corrected perfdata output (+added thresholds), thanks to Christian Lauf
20100809 Fixed conn type -> Now the total number of connections of all members in a group is used
20101026 Using /bin/bash instead of /bin/sh again (Ubuntu users had problems due to /bin/sh symlink to /bin/dash)
20101026 Bugfix in snmpwalk usage (using vqe instead of vq), thanks to Fabio Panigatti
20101102 Added fan
20101202 Added volumes (checks utilization of all volumes)
20110315 Bugfix in Fan Warning check and changed output in diskusage check
20110323 Mysteriously disappeared 'temp' type added again. Thanks to Peter Wirdemo
20110328 Beta Testing for etherrors check by Martin Conzelmann
20110404 Added thresholds to etherrors check by Martin Conzelmann
20110404 Bugfix in volumes check
20110407 New temp check by Martin Conzelmann: Rewritten and more information in output
20110725 New disk check by Amir Shakoor (~6x faster). Some bugfixes then added.
20110804 New poolusage check by Chris Funderburg and Markus Becker (perfdata)
20110808 New vol check - checks single volume for utilization
20111013 Bugfix in vol check for similar vol names by Matt White
20111031 Bugfix in ethif check for int response by Francois Borlet-Hote
20120104 Bugfix in temp check if only one controller available
20120104 Bugfix in info check if only one controller available
20120123 Bugfix in volumes check
20120125 Added perfdata in volumes check, volume names now w/o quotes
20120319 Added poolconn check by Erwin Bleeker
20120330 Rewrite of poolusage (original poolusage is now called memberusage) by Erwin Bleeker
20120405 Bugfix in poolusage to show result without thresholds
20120430 Added snapshots type by Roland Penner
20120503 Rewrite of info check (Fix for multiple members, added firmware check)
20120815 Added percentage of raid rebuild when raid reconstructing
20120821 Minor bugfix in vol/volumes check (added space in perfdata)
20120911 Added percentage of raid rebuild when raid expanding
20120913 Bugfix in percentage output in raid check
20121204 Added percentage of raid rebuild when raid verifying
20121204 Changed raid percentage output when multiple members around
20121228 ps type now also checks for failed power supply fans
20130728 Added copy to spare raid status by Peter Lieven
20131024 Bugfix in temp check (Backplane_sensor_0 was not shown)
20131025 Optical cleanup
20131122 Bugfix in vol check when volumes spread across members
20131219 Bugfix in poolusage check when a pool was not used (0 size)
20140626 Bugfix in etherrors check
20140711 Added snmp connection check function
20150203 Bugfix in vol check in percentage calculation
20151006 Bugfix in vol check if volume not found by Stephane Loeuillet
20151126 Bugfix in memberusage and poolusage checks (missing newline)
Parameter | Description |
-H* | Hostname or IP address of Equallogic member |
-C* | SNMP community name (needs at least read access) |
-t* | Type of check you want to do (see the definition of types further down) |
-v | Name of single volume to check |
-w | Warning threshold (optional and only in combination with certain types) |
-c | Critical threshold (optional and only in combination with certain types) |
--help | Help text for correct usage of this script |
* mandatory parameters
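The parameter rules above (-H, -C and -t are mandatory, and the vol type needs -v) can be expressed as a pre-flight check. This is a sketch under assumptions: the function validate_args is hypothetical and is not the plugin's own argument parsing, which may behave differently. The long option --help is not handled here.

```shell
#!/bin/bash
# Hypothetical pre-flight validation; NOT the plugin's own argument parsing.
validate_args() {
  local OPTIND=1 opt host="" community="" type="" volume=""

  while getopts ":H:C:t:v:w:c:" opt; do
    case "$opt" in
      H) host="$OPTARG" ;;
      C) community="$OPTARG" ;;
      t) type="$OPTARG" ;;
      v) volume="$OPTARG" ;;
      w|c) ;;  # thresholds are optional, nothing to validate here
      *) echo "UNKNOWN: invalid or incomplete option -$OPTARG"; return 3 ;;
    esac
  done

  # -H, -C and -t are marked as mandatory in the parameter table
  if [ -z "$host" ] || [ -z "$community" ] || [ -z "$type" ]; then
    echo "UNKNOWN: -H, -C and -t are mandatory"
    return 3
  fi

  # The vol check type must be combined with -v
  if [ "$type" = "vol" ] && [ -z "$volume" ]; then
    echo "UNKNOWN: check type 'vol' requires -v"
    return 3
  fi

  echo "OK: arguments look complete"
}
```

Usage would look like: validate_args -H 10.0.0.200 -C public -t vol -v Volume1 -w 90 -c 95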
Check Type | Description |
conn | Checks number of current iSCSI connections (thresholds possible) |
disk | Checks status of all disks |
diskusage | Checks how much RAID space is already used (thresholds possible) |
etherrors | Checks ethernet interfaces for ethernet packet errors |
ethif | Checks status of ethernet interfaces (thresholds possible) |
fan | Checks status of fans |
health | Checks overall health of Equallogic device |
info | Shows information about all Equallogic members of the same group |
memberusage | Shows disk utilization of all members of the same group (thresholds possible) |
poolconn | Checks highest number of iSCSI connections per pool (thresholds possible) |
poolusage | Checks utilization of pools (thresholds possible) |
ps | Checks status of power supply(ies) |
raid | Checks RAID status |
snapshots | Checks Snapshot Reserve status (warning level is taken from the Equallogic volume config, critical level can be set with -c) |
temp | Checks temperature sensors |
uptime | Shows uptime of Equallogic device |
vol | Checks a single volume, must be used with -v option (thresholds possible) |
volumes | Checks utilization of all created iSCSI volumes (thresholds possible) |
Usage:
./check_equallogic.sh -H host -C community -t checktype [-v volume] [-w warning] [-c critical]
Examples:
./check_equallogic.sh -H 10.0.0.200 -C public -t disk
./check_equallogic.sh -H 10.0.0.200 -C public -t vol -v Volume1 -w 90 -c 95
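When running the plugin manually, the result is reported through the exit code, following the usual Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The small helper below translates an exit code into its state name. The function name nagios_state is made up for this example and is not part of the plugin.

```shell
#!/bin/bash
# Hypothetical helper: maps a Nagios plugin exit code to its state name.
nagios_state() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    *) echo "UNKNOWN" ;;  # 3 and anything unexpected
  esac
}

# Example usage (commented out, needs a reachable Equallogic member):
# ./check_equallogic.sh -H 10.0.0.200 -C public -t disk
# nagios_state $?
```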
# 'check_equallogic' command definition
define command{
command_name check_equallogic
command_line $USER1$/check_equallogic.sh -H $HOSTADDRESS$ -C $ARG1$ -t $ARG2$ $ARG3$
}
Note: The -C parameter (SNMP community name) is passed as a variable ($ARG1$). You can set it to a static value if you use the same SNMP community for all your EQL hosts.
object CheckCommand "check_equallogic" {
import "plugin-check-command"
command = [ PluginContribDir + "/check_equallogic.sh" ]
arguments = {
"-H" = {
value = "$equallogic_host$"
description = "DNS hostname or IP address of the Equallogic member"
}
"-C" = {
value = "$equallogic_community$"
description = "SNMP community"
}
"-t" = {
value = "$equallogic_checktype$"
description = "Check Type"
}
"-v" = {
value = "$equallogic_volume$"
description = "Volume name for single volume check"
}
"-w" = {
value = "$equallogic_warning$"
description = "Warning threshold"
}
"-c" = {
value = "$equallogic_critical$"
description = "Critical threshold"
}
}
vars.equallogic_host = "$address$"
vars.equallogic_community = "public"
}
Show information about all EQL members of the same group:
# Check Equallogic Information
define service{
use generic-service
host_name eql1
service_description General Information
check_command check_equallogic!public!info
}
Health check:
# Check Equallogic Health
define service{
use generic-service
host_name eql1
service_description General Health
check_command check_equallogic!public!health
}
Physical drives check:
# Check Equallogic Disk Status
define service{
use generic-service
host_name eql1
service_description Disk Status
check_command check_equallogic!public!disk
}
Volumes check:
# Check Equallogic Volumes
define service{
use generic-service
host_name eql1
service_description Volumes
check_command check_equallogic!public!volumes!-w 90 -c 95
}
The thresholds for the type 'volumes' are plain integers, interpreted as percentages. The 'volumes' check measures the utilization of all created iSCSI volumes. In this example the plugin returns WARNING when at least one volume uses more than 90% of its capacity, and CRITICAL when at least one volume uses more than 95% of its capacity. Without thresholds the plugin outputs the current utilization of all volumes and returns status OK.
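The threshold rule described above (the worst volume decides the overall status) can be sketched as follows. This is not the plugin's actual code: the function evaluate_usage and the sample percentages are assumptions for illustration, and the real script may compare with "greater or equal" instead of "greater than".

```shell
#!/bin/bash
# Hypothetical sketch of the threshold logic; NOT taken from check_equallogic.sh.
# Arguments: warning threshold, critical threshold, then one usage percentage per volume.
# Returns 0 (OK), 1 (WARNING) or 2 (CRITICAL).
evaluate_usage() {
  local warn="$1" crit="$2" state=0 pct
  shift 2

  for pct in "$@"; do
    if [ "$pct" -gt "$crit" ]; then
      state=2                       # one volume above critical is enough
    elif [ "$pct" -gt "$warn" ] && [ "$state" -lt 1 ]; then
      state=1                       # raise to WARNING unless already worse
    fi
  done

  return "$state"
}
```

Usage would look like: evaluate_usage 90 95 40 92 88 (returns 1, because one volume is above 90% but none is above 95%).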
The following example is a standard Icinga 2 service object checking a single volume:
# Check Equallogic Volume MyVol1
object Service "Hardware" {
import "generic-service"
host_name = "eql1"
check_command = "check_equallogic"
vars.equallogic_checktype = "vol"
vars.equallogic_volume = "MyVol1"
vars.equallogic_warning = "75"
vars.equallogic_critical = "90"
}