For a couple of years now I have successfully used the Nagios plugin check_smart (https://www.monitoringexchange.org/inventory/Check-Plugins/Hardware/Storage/Check-SMART-status) by Kurt Yoder to monitor the health of hard disks via their S.M.A.R.T. values.
It has always worked like a charm - as long as the OS was seeing the drives directly. In most cases I used the plugin in environments with software raid (mdadm), where the disks were still seen as /dev/sda and /dev/sdb.
However, I became aware that the plugin does not work with disks behind a hardware raid controller, for example MegaRAID, although the smartctl command (part of smartmontools) is able to read the SMART values through a hardware raid controller.
This happened:
./check_smart -d /dev/sda -i megaraid,8
invalid interface megaraid,8 for /dev/sda!
check_smart uses smartctl in the background, and smartctl itself works fine with megaraid (see http://sourceforge.net/apps/trac/smartmontools/wiki/Supported_RAID-Controllers):
smartctl -d megaraid,8 -H /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
/dev/sda [megaraid_disk_09] [SAT]: Device open changed type from 'megaraid' to 'sat'
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
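To illustrate what a wrapper around smartctl has to do with output like the above, here is a minimal Python sketch that extracts the health verdict from the text printed by smartctl -H. It is not the code of check_smart (which is a separate plugin); the function names parse_health and is_healthy are made up for this example, and the sketch assumes the verdict appears on one of the two result lines smartctl prints for ATA/SATA and SCSI/SAS devices.

```python
import re

# Assumption: the verdict is on one of these two lines, as printed by
# smartctl -H for ATA/SATA and for SCSI/SAS devices respectively.
HEALTH_PATTERNS = (
    re.compile(r"^SMART overall-health self-assessment test result:\s*(\S+)", re.M),
    re.compile(r"^SMART Health Status:\s*(\S+)", re.M),
)


def parse_health(output):
    """Return the verdict word found in smartctl -H output, or None."""
    for pattern in HEALTH_PATTERNS:
        match = pattern.search(output)
        if match:
            return match.group(1)
    return None


def is_healthy(output):
    """True only if a verdict was found and it is PASSED (ATA) or OK (SCSI)."""
    return parse_health(output) in ("PASSED", "OK")
```

Feeding it the output shown above yields the verdict PASSED; output without a recognizable result line yields None, which a monitoring check would typically treat as UNKNOWN rather than OK.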
The issue lies in the plugin itself. It checks that the interface type passed with -i is either ata or scsi. Any other interface type, such as megaraid here, is rejected and the plugin exits with the error shown above.
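The kind of check involved can be sketched as follows. This is an illustrative Python example, not the plugin's actual code and not the actual patch; the names INTERFACE_RE and is_valid_interface are invented here, and the list of accepted types (ata, scsi, plus megaraid,N and 3ware,N as controller types that take a disk number) is an assumption about what one might want to allow.

```python
import re

# Hypothetical whitelist: the two plain types the plugin already knew,
# plus controller types that carry a disk number after a comma.
INTERFACE_RE = re.compile(r"(ata|scsi|megaraid,\d+|3ware,\d+)")


def is_valid_interface(interface):
    """Accept only interface strings that match the whitelist completely."""
    return INTERFACE_RE.fullmatch(interface) is not None


for candidate in ("ata", "scsi", "megaraid,8", "megaraid", "foo"):
    verdict = "valid" if is_valid_interface(candidate) else "invalid"
    print(f"{candidate}: {verdict}")
```

Matching the whole string against a whitelist, rather than only testing for a prefix, keeps the value safe to pass on to smartctl -d while still allowing the controller disk number.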
I took the liberty of patching check_smart so that it accepts hardware raid controllers as interface type.
Take a look at my GitHub repository here: https://github.com/Napsty/check_smart
I successfully tested it with megaraid; it may of course also work with other controllers:
./check_smart -d /dev/sda -i megaraid,8
OK: no SMART errors detected|Raw_Read_Error_Rate=0 Spin_Up_Time=2958 Start_Stop_Count=13 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Power_On_Hours=603 Spin_Retry_Count=0 Calibration_Retry_Count=0 Power_Cycle_Count=13 Power-Off_Retract_Count=11 Load_Cycle_Count=1 Temperature_Celsius=32 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0 Multi_Zone_Error_Rate=0
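The part after the | in this output is performance data in label=value form. If you want to process it outside of Nagios, a minimal Python sketch could look like this. The function name parse_perfdata is made up for this example, and the sketch assumes simple labels without spaces or quotes, as in the output above.

```python
def parse_perfdata(plugin_output):
    """Split one line of plugin output into status text and a metrics dict.

    Assumption: labels contain no spaces or quotes, so splitting on
    whitespace is enough. Values are kept as strings.
    """
    status, _, perfdata = plugin_output.partition("|")
    metrics = {}
    for item in perfdata.split():
        label, separator, value = item.partition("=")
        if separator:  # skip anything that is not label=value
            metrics[label] = value
    return status.strip(), metrics


line = "OK: no SMART errors detected|Power_On_Hours=603 Temperature_Celsius=32"
status, metrics = parse_perfdata(line)
print(status)
print(int(metrics["Temperature_Celsius"]))
```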
Enjoy!