A new version of check_smart, the monitoring plugin that checks the SMART attributes of hard drives and solid state drives, is out.
Version 5.11 introduces a new parameter, "-e" or "--exclude", which takes an exclude list (a.k.a. ignore list).
The exclude list is a comma-separated list of strings. It tells the plugin which SMART attributes to ignore, even if they are in a failing or failed state.
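To illustrate what "comma-separated list of strings" means in practice, here is a minimal shell sketch that splits such a list and checks whether a given string is one of its entries. The function name is_excluded is hypothetical, and the exact, case-sensitive match is an assumption; this is not check_smart's actual code (the plugin itself is written in Perl).

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT check_smart's implementation:
# split a comma-separated exclude list and test membership.

# is_excluded LIST ITEM -> exit status 0 if ITEM is one of LIST's entries
is_excluded() {
  local list=$1 item=$2 entry
  local -a entries
  IFS=',' read -r -a entries <<< "$list"   # "a,b,c" -> (a b c)
  for entry in "${entries[@]}"; do
    [[ $entry == "$item" ]] && return 0    # assumed: exact, case-sensitive match
  done
  return 1
}

is_excluded "In_the_past,Current_Pending_Sector" "Current_Pending_Sector" && echo "ignored"
```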
Let's take a temperature attribute that failed in the past as an example:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
194 Temperature_Celsius 0x0002 113 113 000 Old_age Always In_the_past 53 (Lifetime Min/Max 25/62)
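The row above comes from smartctl's attribute table. As a sketch of how its columns line up (not how check_smart parses it), the shell's read builtin can split the row into one variable per column. WHEN_FAILED is the 9th field, and the last variable receives the rest of the line, so a RAW_VALUE containing spaces stays intact. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: split one row of smartctl's attribute table into its columns.
# The last variable (raw) takes the remainder of the line, so RAW_VALUE
# values containing spaces are kept whole.
line='194 Temperature_Celsius 0x0002 113 113 000 Old_age Always In_the_past 53 (Lifetime Min/Max 25/62)'

read -r id name flag value worst thresh attr_type updated when_failed raw <<< "$line"

echo "id=$id name=$name when_failed=$when_failed raw=$raw"
# prints: id=194 name=Temperature_Celsius when_failed=In_the_past raw=53 (Lifetime Min/Max 25/62)
```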
Without an exclude list, the plugin returns a WARNING because the temperature attribute failed at some point in the past:
# ./check_smart.pl -d /dev/sda -i sat
WARNING: Attribute Temperature_Celsius failed at In_the_past|Raw_Read_Error_Rate=0 Throughput_Performance=67 Spin_Up_Time=0 Start_Stop_Count=3 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=34 Power_On_Hours=10617 Spin_Retry_Count=0 Power_Cycle_Count=3 Power-Off_Retract_Count=3 Load_Cycle_Count=3 Temperature_Celsius=53 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
It's useful to know that the attribute failed once. But once we know that, we want the warning to go away. With the exclude list, the plugin can be told to ignore the "Temperature_Celsius" attribute:
# ./check_smart.pl -d /dev/sda -i sat -e Temperature_Celsius
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=67 Spin_Up_Time=0 Start_Stop_Count=3 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=34 Power_On_Hours=10617 Spin_Retry_Count=0 Power_Cycle_Count=3 Power-Off_Retract_Count=3 Load_Cycle_Count=3 Temperature_Celsius=53 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
And hooray, no more alert.
But this can also be a bit dangerous. What if the drive raises a new (live!) temperature alert? You'd certainly want to know about it. That's why, besides excluding a SMART attribute by name, you can also exclude certain values of the WHEN_FAILED column. In the following example, the WHEN_FAILED value "In_the_past" (as seen above) is used in the exclude list:
# ./check_smart.pl -d /dev/sda -i sat -e "In_the_past"
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=67 Spin_Up_Time=0 Start_Stop_Count=3 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=34 Power_On_Hours=10617 Spin_Retry_Count=0 Power_Cycle_Count=3 Power-Off_Retract_Count=3 Load_Cycle_Count=3 Temperature_Celsius=53 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
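As you can see, the plugin no longer alerts on "Temperature_Celsius": it detected the value "In_the_past" in the WHEN_FAILED column and ignored it. Keep in mind that this ignores past failures of every attribute, not just the temperature, while a current failure would still be reported.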
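The rule described above can be sketched as follows: a row is ignored when either its ATTRIBUTE_NAME or its WHEN_FAILED value appears in the exclude list. This is a hypothetical illustration, not the plugin's real logic; the function names in_list and check_row are made up, and treating "-" as "never failed" and FAILING_NOW as a live failure reflects smartctl's usual WHEN_FAILED values, assumed here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the exclusion rule, NOT check_smart's code:
# ignore a failed row if its name OR its WHEN_FAILED value is excluded.

# in_list LIST ITEM: succeed if ITEM is an entry of the comma-separated LIST
in_list() {
  local entry
  local -a entries
  IFS=',' read -r -a entries <<< "$1"
  for entry in "${entries[@]}"; do
    [[ $entry == "$2" ]] && return 0
  done
  return 1
}

# check_row EXCLUDE_LIST ROW: print OK, IGNORED or WARNING for one attribute row
check_row() {
  local exclude=$1 id name flag value worst thresh attr_type updated when_failed raw
  read -r id name flag value worst thresh attr_type updated when_failed raw <<< "$2"
  if [[ $when_failed == "-" ]]; then
    echo "OK"                                   # assumed: "-" means never failed
  elif in_list "$exclude" "$name" || in_list "$exclude" "$when_failed"; then
    echo "IGNORED"                              # name or WHEN_FAILED value excluded
  else
    echo "WARNING: $name failed at $when_failed"
  fi
}

row='194 Temperature_Celsius 0x0002 113 113 000 Old_age Always In_the_past 53 (Lifetime Min/Max 25/62)'
check_row "" "$row"                                        # WARNING: Temperature_Celsius failed at In_the_past
check_row "Temperature_Celsius" "$row"                     # IGNORED
check_row "In_the_past" "$row"                             # IGNORED
check_row "In_the_past" "${row/In_the_past/FAILING_NOW}"   # WARNING: Temperature_Celsius failed at FAILING_NOW
```

The last call shows why excluding "In_the_past" is safer than excluding the attribute name: a live failure still produces a warning.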
To exclude multiple entries, separate them with commas:
# ./check_smart.pl -d /dev/sda -i sat -e "In_the_past","Current_Pending_Sector"
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=67 Spin_Up_Time=0 Start_Stop_Count=3 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=34 Power_On_Hours=10617 Spin_Retry_Count=0 Power_Cycle_Count=3 Power-Off_Retract_Count=3 Load_Cycle_Count=3 Temperature_Celsius=53 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0
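A note on the quoting in the command above: the shell concatenates adjacent quoted strings that have no space between them, so "In_the_past","Current_Pending_Sector" reaches the plugin as one argument, the same as writing "In_the_past,Current_Pending_Sector". A quick way to see this:

```shell
#!/usr/bin/env bash
# Adjacent quoted strings with no whitespace between them form one word,
# so -e receives a single value containing both entries.
set -- -e "In_the_past","Current_Pending_Sector"
echo "$#"   # prints: 2 (the option and its single value)
echo "$2"   # prints: In_the_past,Current_Pending_Sector
```

Adding a space after the comma would split the list into two separate arguments.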
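But make sure you don't shoot yourself in the foot with this: an excluded attribute is ignored even when it is failing, so excluding something like "Current_Pending_Sector" could hide real problems with the drive. The exclude list was created mainly with the temperature attribute in mind.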