A new version of check_smart, a monitoring plugin to monitor hard drives, solid state drives and NVMe drives, is available.
Starting with version 6.15.0, check_smart.pl additionally checks for errors in the so-called SMART Error Log by default. This is a very important new feature, as the SMART Error Log might contain hints of a failing drive which are not represented in the SMART attributes.
The changes for v6.15.0 were contributed by Tomas Barton. Kudos and thanks!
The SMART Error Log is a log inside a drive's SMART memory that collects historical ATA errors. Each logged error increments the "ATA Error Count" counter. The most recent errors are shown in the smartctl -a output.
Here's an example of what this looks like:
root@linux ~ # smartctl -a /dev/sda
[...]
SMART Error Log Version: 1
ATA Error Count: 80 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 80 occurred at disk power-on lifetime: 2593 hours (108 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 00 ff ff ff 4f 00 36d+09:46:36.488 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 36d+09:46:36.488 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 36d+09:46:36.488 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 36d+09:46:36.487 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 36d+09:46:36.487 SET FEATURES [Set transfer mode]
[...]
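To illustrate the idea, here is a minimal Python sketch that extracts the "ATA Error Count" value from smartctl -a output like the one above. It is not how check_smart.pl implements the check; the function name ata_error_count and the shortened SAMPLE text are made up for this illustration, and treating a missing counter line as zero errors is an assumption.

```python
import re

# Shortened sample, taken from the smartctl -a output shown above.
SAMPLE = """\
SMART Error Log Version: 1
ATA Error Count: 80 (device log contains only the most recent five errors)
Error 80 occurred at disk power-on lifetime: 2593 hours (108 days + 1 hours)
"""


def ata_error_count(smartctl_output: str) -> int:
    """Return the ATA Error Count found in smartctl output.

    Assumption: if the counter line is missing, no errors were logged,
    so 0 is returned.
    """
    match = re.search(r"^ATA Error Count:\s*(\d+)", smartctl_output, re.MULTILINE)
    return int(match.group(1)) if match else 0


if __name__ == "__main__":
    count = ata_error_count(SAMPLE)
    # A monitoring check would typically alert on any non-zero count.
    print(f"ATA Error Count: {count}")
```

A real check would feed the live output of smartctl -a into such a function instead of a static sample.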
To skip checking the SMART Error Log, and therefore ignore logged ATA errors, the new parameter --skip-error-log can be used.
Another new parameter is the -O/--oldage parameter.
If this parameter is used, certain "oldage" attributes, related to the drive's usage, are ignored.
Right now these attributes are:
The attributes mentioned above can trigger an alert that the drive has reached the end of its expected lifetime (percent used). With the -O/--oldage parameter, this alert is ignored. This is useful because some drives are likely to last longer than their anticipated lifetime.
Be very cautious about using this parameter and only use it if you know what you're doing!
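As a rough illustration of how the two new parameters fit into a plugin call, the following Python sketch assembles a check_smart.pl argument list. The helper build_check_smart_args is hypothetical, and the -d (device) and -i (interface) options as well as the example values /dev/sda and ata are assumptions about a typical invocation; only --skip-error-log and --oldage come from this release.

```python
def build_check_smart_args(device, interface, skip_error_log=False, ignore_oldage=False):
    """Build an argument list for check_smart.pl (hypothetical helper).

    Assumption: -d selects the device and -i the interface type.
    """
    args = ["check_smart.pl", "-d", device, "-i", interface]
    if skip_error_log:
        # New in 6.15.0: do not evaluate the SMART Error Log.
        args.append("--skip-error-log")
    if ignore_oldage:
        # New in 6.15.0: ignore the usage-related "oldage" attributes.
        args.append("--oldage")
    return args


if __name__ == "__main__":
    # Only builds and prints the command line; nothing is executed.
    print(" ".join(build_check_smart_args("/dev/sda", "ata", skip_error_log=True)))
```

Both switches default to off in this sketch, mirroring the plugin's default behaviour of checking the error log and honouring the oldage attributes.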
The check_smart documentation has been extended with additional configuration examples for Icinga 2 users.
These examples include: