A new version of check_smart, a monitoring plugin for hard drives, solid state drives and NVMe drives, is available.
The newest release, 6.17.0, contains a fix and an enhancement from two individual open source contributors. Thanks a lot!
Since version 6.15.0, the check_smart plugin also checks the "ATA Error Logs" on ATA drives by default, unless the option --skip-error-log is used.
Here's an example of a failing disk whose smartctl output shows a non-zero ATA Error Count:
SMART Error Log Version: 1
ATA Error Count: 30 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
However, the new default check for the ATA Error Count introduced a regression: the plugin also raised an alert when the line was present with a value of 0 (ATA Error Count: 0):
$ /usr/bin/sudo /usr/lib/nagios/plugins/check_smart.pl -g '/dev/sd[a-d]' -i 'ata' -w 'Reallocated_Sector_Ct=4,Runtime_Bad_Block=4,Uncorrectable_Error_Cnt=2,Reallocated_Event_Count=2' --skip-self-assessment
WARNING: [/dev/sda] - [/dev/sda] - ata_errors is non-zero (0)[/dev/sda] - --- [/dev/sdc] - [/dev/sdc] - ata_errors is non-zero (0)[/dev/sdc] - --- [/dev/sdd] - [/dev/sdd] - ata_errors is non-zero (0)[/dev/sdd] - --- [/dev/sdb] - Device is clean
Florian Sager discovered this problem and contributed a fix that skips the ATA Error Count line if the value is 0.
The second contribution comes from Philippe Beaumont and extends the multi-drive check (the -g / --global option) to devices behind an Areca RAID controller.
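The logic of the fix can be illustrated with a short sketch. This is not the plugin's actual code (check_smart is written in Perl); the function names below are hypothetical and only the "ATA Error Count: N" line format is taken from the smartctl output shown above.

```python
import re

# Matches the summary line smartctl prints above the error log entries,
# e.g. "ATA Error Count: 30 (device log contains only the most recent five errors)".
ATA_ERROR_COUNT_RE = re.compile(r"^ATA Error Count:\s*(\d+)", re.MULTILINE)


def parse_ata_error_count(smartctl_output):
    """Return the logged ATA error count, or None if the line is absent."""
    match = ATA_ERROR_COUNT_RE.search(smartctl_output)
    return int(match.group(1)) if match else None


def ata_error_warning(device, smartctl_output):
    """Return a warning string only when the count is greater than zero."""
    count = parse_ata_error_count(smartctl_output)
    if count is None or count == 0:
        # Line missing or "ATA Error Count: 0": nothing to report.
        return None
    return f"[{device}] - ata_errors is non-zero ({count})"
```

The regression corresponds to a check that only tests whether the line exists; the sketch additionally compares the parsed number against zero before producing a warning.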
Here's a working example:
$ sudo /usr/lib/nagios/plugins/check_smart.pl -g /dev/sg1 -i "areca,[1-4]"
OK: [areca,1] - Device is clean --- [areca,2] - Device is clean --- [areca,3] - Device is clean --- [areca,4] - Device is clean|
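To make the "areca,[1-4]" notation more concrete, here is a minimal sketch of how such a bracket range could be expanded into one interface per drive. This is an assumption for illustration only, not the plugin's Perl implementation, and expand_interface_range is a hypothetical helper name.

```python
import re


def expand_interface_range(spec):
    """Expand a single "[start-end]" numeric range into individual interface names."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", spec)
    if match is None:
        # No bracket range present (e.g. "ata"): use the value unchanged.
        return [spec]
    prefix, suffix = match.group(1), match.group(4)
    start, end = int(match.group(2)), int(match.group(3))
    return [f"{prefix}{number}{suffix}" for number in range(start, end + 1)]


if __name__ == "__main__":
    for interface in expand_interface_range("areca,[1-4]"):
        print(interface)
```

Each resulting name (areca,1 through areca,4) matches one of the per-drive results in the plugin output above.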