In the last article (Read Disk SMART values on FreeBSD (6.0) behind a HP Raid (cciss)) I described how the SMART values of physical disks behind an HP Raid can be read and checked. That "manual check" was necessary because the Nagios plugin check_smart.pl was not installed on that particular server.
Back at work, I wondered whether the plugin would work on this server (DL 380 G4, HP Raid, FreeBSD 6.0). And nope, it did not. So I made the necessary changes and released version 3.3 of check_smart.pl today.
Besides adding support for the HP Raid (cciss), I also wanted to be alerted as soon as there are "Elements in grown defect list", similar to the existing alert for "Current Pending Sectors".
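To illustrate the idea of alerting on the grown defect list, here is a minimal bash sketch. It is not the code used in check_smart.pl. The function names grown_defects and check_grown_defects are made up for this example, and mapping a non-zero count to a Nagios WARNING (exit code 1) is an assumption; the plugin may pick a different state.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not taken from check_smart.pl.

# Read smartctl output on stdin and print the number after
# "Elements in grown defect list:". Exit 1 if the line is missing.
grown_defects() {
    awk -F': *' '/Elements in grown defect list:/ { print $2 + 0; found = 1; exit }
                 END { if (!found) exit 1 }'
}

# Compare the count against a tolerated threshold (default 0) and
# return a Nagios-style exit code: 0 = OK, 1 = WARNING, 3 = UNKNOWN.
# Assumption: a count above the threshold is reported as WARNING.
check_grown_defects() {
    local threshold=${1:-0} count
    if ! count=$(grown_defects); then
        echo "UNKNOWN: no grown defect list found in smartctl output"
        return 3
    fi
    if (( count > threshold )); then
        echo "WARNING: $count elements in grown defect list"
        return 1
    fi
    echo "OK: $count elements in grown defect list"
    return 0
}

# Demo with a canned line instead of a real disk:
printf 'Elements in grown defect list: 3\n' | check_grown_defects 0
```

On a real system the input would come from smartctl, for example by piping the output of smartctl -a -d cciss,0 for the relevant device into check_grown_defects.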
Please note that cciss is supported by smartctl (smartmontools) only since version 5.38. So make sure you're running at least 5.38. This can be verified by running the following command:
$ smartctl -V | grep release
smartmontools release 5.38 dated 2008/03/10 at 10:44:07 GMT
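If you want to automate this check, a small bash sketch could look like the following. The helper names parse_release and version_ge are hypothetical, and the comparison only looks at the major and minor number, which is an assumption about how smartmontools versions are numbered.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for checking the smartmontools release.

# Read "smartctl -V" output on stdin and print the release number.
parse_release() {
    awk '/^smartmontools release / { print $3; exit }'
}

# Return 0 if version $1 is at least version $2 (major.minor compared
# numerically), 1 if it is older, 2 if either argument is not a version.
version_ge() {
    local have_major have_minor want_major want_minor
    [[ $1 =~ ^[0-9]+(\.[0-9]+)*$ && $2 =~ ^[0-9]+(\.[0-9]+)*$ ]] || return 2
    IFS=. read -r have_major have_minor _ <<< "$1"
    IFS=. read -r want_major want_minor _ <<< "$2"
    have_minor=${have_minor:-0}
    want_minor=${want_minor:-0}
    if (( 10#$have_major != 10#$want_major )); then
        (( 10#$have_major > 10#$want_major ))
        return
    fi
    (( 10#$have_minor >= 10#$want_minor ))
}

# Demo with the release line shown above instead of a live smartctl call:
sample='smartmontools release 5.38 dated 2008/03/10 at 10:44:07 GMT'
release=$(printf '%s\n' "$sample" | parse_release)
if version_ge "$release" 5.38; then
    echo "smartmontools $release: cciss is supported"
else
    echo "smartmontools $release: too old for cciss"
fi
```

On a live system, replace the sample line with release=$(smartctl -V | parse_release).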
The new version of check_smart.pl can be downloaded from GitHub:
https://raw.github.com/Napsty/check_smart/master/check_smart.pl
And this is how it looks in my Nagios/Icinga setup:
Enjoy.
ck from Switzerland wrote on May 14th, 2014:
Hmm.. this could be tricky. I have searched for ways to "reset" the value of "Airflow_Temperature_Cel" (which contains an In_the_past event in your output) but couldn't find one. Have you tried to power cycle the machine with this disk (power off... wait a bit... power on)?
ezra from boston wrote on May 14th, 2014:
Sorry, not sure what I'm looking for, but here is the output from the command.
bash-3.2# smartctl -a /dev/sdb
smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.11
Device Model: ST3750330AS
Serial Number: 9QK19F18
Firmware Version: SD45
User Capacity: 750,156,374,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Wed May 14 13:17:20 2014 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
[...]
T Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 073 032 045 Old_age Always In_the_past 27 (1 239 31 23)
[...]
ck from Switzerland wrote on May 13th, 2014:
Hello ezra, did you verify if the SMART values are OK? Launch the following command to see the SMART values:
smartctl -a /dev/sdb
ezra from Boston wrote on May 13th, 2014:
Hello, I'd very much like to use the -b option (threshold value), but when I use it check_smart.pl still outputs the error "Airflow_Temperature_Cel failed". The command I execute is as follows:
"/etc/nagios/libexec/check_smart.pl -d /dev/sdb -i ata -b 3"
/etc/nagios/libexec/check_smart.pl -h
check_smart v$Revision: 5.2 $ (nagios-plugins 1.4.15)