check_smart 6.15.0 released: SMART Error Log check + usage override possibility


Published on - Listed in Hardware Monitoring Icinga Nagios


A new version of check_smart, a monitoring plugin for hard drives, solid state drives and NVMe drives, is available.

Starting with version 6.15.0, check_smart.pl additionally checks for errors in the so-called SMART Error Log by default. This is a very important new feature, as the SMART Error Log might contain hints of a failing drive which are not represented in the SMART attributes.

The changes for v6.15.0 were contributed by Tomas Barton. Kudos and thanks!


Checking the SMART Error Log

From version 6.15.0 onwards, check_smart.pl also checks the so-called SMART Error Log by default. This log is kept in the drive's SMART memory and records historical ATA errors. Each logged error increments the "ATA Error Count" counter. The most recent errors are shown in the smartctl -a output.

Here's an example of what this looks like:

root@linux ~ # smartctl -a /dev/sda
[...]
SMART Error Log Version: 1
ATA Error Count: 80 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 80 occurred at disk power-on lifetime: 2593 hours (108 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00  36d+09:46:36.488  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  36d+09:46:36.488  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00  36d+09:46:36.488  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  36d+09:46:36.487  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  36d+09:46:36.487  SET FEATURES [Set transfer mode]
[...]
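To illustrate how the error count can be picked out of this output, here is a minimal Python sketch. It is not how check_smart.pl does it (the plugin is written in Perl); the function name and the regular expression are assumptions made for this example, based only on the "ATA Error Count" line shown above.

```python
import re

# Shortened sample taken from the smartctl -a output shown above.
SAMPLE_OUTPUT = """\
SMART Error Log Version: 1
ATA Error Count: 80 (device log contains only the most recent five errors)
Error 80 occurred at disk power-on lifetime: 2593 hours (108 days + 1 hours)
"""

# Matches the counter line at the start of a line and captures the number.
ATA_ERROR_COUNT_RE = re.compile(r"^ATA Error Count:\s*(\d+)", re.MULTILINE)


def ata_error_count(smartctl_output: str) -> int:
    """Return the logged ATA error count, or 0 if no counter line is present."""
    match = ATA_ERROR_COUNT_RE.search(smartctl_output)
    return int(match.group(1)) if match else 0


if __name__ == "__main__":
    print(ata_error_count(SAMPLE_OUTPUT))  # prints 80
```

A monitoring check built on this idea would alert whenever the returned count is greater than zero.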

To skip checking the SMART Error Log, and therefore ignore logged ATA errors, the new parameter --skip-error-log can be used.
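As a rough sketch of how the parameter could be passed from a wrapper script, the following Python helper assembles the plugin's command line. The plugin path and the helper name are assumptions for this example; -d (device) and -i (interface) are the plugin's usual options, and --skip-error-log is the new parameter described above.

```python
from typing import List

# Assumed install location; adjust to wherever check_smart.pl lives on your system.
PLUGIN_PATH = "/usr/lib/nagios/plugins/check_smart.pl"


def build_check_smart_command(
    device: str,
    interface: str,
    skip_error_log: bool = False,
    plugin: str = PLUGIN_PATH,
) -> List[str]:
    """Build the argument list for a check_smart.pl call (hypothetical helper)."""
    cmd = [plugin, "-d", device, "-i", interface]
    if skip_error_log:
        # Ignore logged ATA errors in the SMART Error Log.
        cmd.append("--skip-error-log")
    return cmd


if __name__ == "__main__":
    # Print the command line instead of running it, so no drive is needed here.
    print(" ".join(build_check_smart_command("/dev/sda", "ata", skip_error_log=True)))
```

The resulting list can be handed to subprocess.run() on a host where the plugin and smartctl are installed.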

Ignore oldage usage attributes (caution!)

Another new parameter is the -O/--oldage parameter.

If this parameter is used, certain "oldage" attributes, related to the drive's usage, are ignored.

Right now these attributes are:

  • 202 (Percent_Lifetime_Used) for ATA SSD drives
  • Critical_Warning with value 0x04 for NVMe drives

The attributes mentioned above can raise an alert that the drive's expected lifetime (percent used) has reached its end. With the -O/--oldage parameter this alert can be ignored and overridden, as some drives are likely to last longer than their anticipated lifetime.

Be very cautious about using this parameter and only use it if you know what you're doing!
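The effect of the override can be pictured with a small Python sketch. This is only an illustration of the idea, not the plugin's actual Perl implementation: the Alert record, the function names and the sample alerts are all invented for this example. Only the two "oldage" conditions come from the list above.

```python
from typing import Iterable, List, NamedTuple


class Alert(NamedTuple):
    """Hypothetical alert record used only for this sketch."""
    drive_type: str  # "ata" or "nvme"
    attribute: str   # e.g. "202" or "Critical_Warning"
    value: int


def is_oldage_alert(alert: Alert) -> bool:
    """True for the usage-related alerts listed above."""
    if alert.drive_type == "ata" and alert.attribute == "202":
        return True  # Percent_Lifetime_Used on ATA SSDs
    if (alert.drive_type == "nvme"
            and alert.attribute == "Critical_Warning"
            and alert.value == 0x04):
        return True  # Critical_Warning with value 0x04 on NVMe drives
    return False


def filter_alerts(alerts: Iterable[Alert], ignore_oldage: bool = False) -> List[Alert]:
    """Drop oldage alerts only when the override is explicitly requested."""
    if not ignore_oldage:
        return list(alerts)
    return [a for a in alerts if not is_oldage_alert(a)]


if __name__ == "__main__":
    sample = [
        Alert("ata", "202", 100),                # ignored with the override
        Alert("ata", "5", 12),                   # invented non-usage alert, always kept
        Alert("nvme", "Critical_Warning", 0x04), # ignored with the override
        Alert("nvme", "Critical_Warning", 0x01), # other warning value, always kept
    ]
    for alert in filter_alerts(sample, ignore_oldage=True):
        print(alert)
```

Note that every alert unrelated to usage still comes through, which is why the override should not be mistaken for a general way to silence a drive.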

Icinga 2 documentation added

The check_smart documentation has been extended with additional configuration examples for Icinga 2 users.

These examples include:

  • CheckCommand object definition for "check_smart"
  • Service object using "nrpe" as command
  • Service object using "check_smart" as command
  • Service apply rule example with physical drives documented as host.vars


