Monitoring plugin check_smart 6.12.1 released: Security fix, NVMe perfdata fix, Erase_Fail_Count_Total

Last updated on December 10th 2021 - Listed in Hardware, Monitoring, Security


A new version of check_smart, an open source monitoring plugin to monitor the health of hard drives, solid state drives and NVMe drives, is now available!

Release 6.12.0 adds a couple of important changes to the plugin, and release 6.12.1 fixes a regression that 6.12.0 introduced. All check_smart users are encouraged to update to 6.12.1 as soon as possible.

Security fix in trailing path of pseudo-devices

The plugin supports so-called pseudo-devices. These are, in most cases, physical drives "hiding" behind a RAID controller. Depending on the controller, the kernel presents the drives under a path such as /dev/bus/N.

When support for pseudo-devices was added, a security vulnerability was introduced along with it. This gave check_smart the "honour" of its own CVE (CVE-2021-42257). However, the security fix in version 6.9.1 covered only part of the vulnerability. After discussions with Wolfgang Frisch from SUSE and John Runyon, an additional vulnerability was found in the trailing path of pseudo-devices. By appending shell commands to the pseudo-device path, an attacker could break out of the plugin and execute additional commands with sudo privileges:

$ sudo ./check_smart.pl -d '/dev/bus/1 >/dev/null 2>&1; whoami' -i auto
root
UNKNOWN: Drive  S/N : |

The trailing path is now fixed as well, and the plugin rejects such an argument with the following output:

$ sudo ./check_smart.pl -d '/dev/bus/1 >/dev/null 2>&1; whoami' -i auto
Could not find any valid block/character special device for device /dev/bus/1 >/dev/null 2>&1; whoami  !
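
This class of bug is a shell injection through the device argument. As a minimal sketch in Python (the plugin itself is written in Perl, and this is not its actual code), one way to guard against it is to accept only arguments that match a strict pattern in full, and to hand the command to the operating system as an argument list rather than a shell string. The pattern and the function names below are assumptions made for illustration.

```python
import re
from typing import List

# Hypothetical allow-list: ordinary device nodes such as /dev/sda or
# /dev/nvme1n1, and pseudo-devices such as /dev/bus/1. fullmatch() requires
# the WHOLE argument to match, so nothing may trail behind the path.
DEVICE_RE = re.compile(r"/dev/(?:bus/\d+|[A-Za-z0-9_]+)")


def is_safe_device(arg: str) -> bool:
    """Return True only if arg is a plain device path with no trailing text."""
    return DEVICE_RE.fullmatch(arg) is not None


def build_smartctl_argv(device: str, interface: str) -> List[str]:
    """Build an argument list for smartctl, meant to be run without a shell."""
    if not is_safe_device(device):
        raise ValueError(f"not a valid device path: {device!r}")
    # A list passed to subprocess.run() is not interpreted by a shell, so
    # characters like ';' or '>' would have no special meaning anyway.
    return ["smartctl", "-a", "-d", interface, device]
```

With this check, the argument from the example above ('/dev/bus/1 >/dev/null 2>&1; whoami') is refused before any command is built, while /dev/bus/1 on its own is accepted.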

Added Erase_Fail_Count_Total to default raw list

In issue #73, additional health monitoring of Samsung SSDs was discussed. This led to additional research on Samsung SSDs, and an official Samsung document revealed four important ATA attributes:

The four SMART attributes listed in the table below are the most important indicators of drive health. If any of the normalized values drop below the 10% threshold, it’s recommended to replace the drive as soon as possible because it’s approaching the end of its life and may become unreliable if used longer.

179 Unused Reserved Block Count (Used_Rsvd_Blk_Cnt_Tot)
181 Program Fail Count (Program_Fail_Cnt_Total)
182 Erase Fail Count (Erase_Fail_Count_Total)
183 Runtime Bad Count (Runtime_Bad_Block)

The attributes Program_Fail_Cnt_Total and Runtime_Bad_Block were already part of the default raw list. The Erase_Fail_Count_Total attribute has now been added to the default raw list as well.
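
The rule quoted from the Samsung document can be sketched as a small check. This is an illustration in Python, not the plugin's Perl code and not how the plugin evaluates its raw list. It assumes the usual smartctl -A attribute table layout (ID, name, flag, normalized value, worst, threshold, and so on); the sample table and the function name are made up for the example.

```python
# IDs of the four attributes named in the Samsung document.
SAMSUNG_HEALTH_IDS = {179, 181, 182, 183}
NORMALIZED_MINIMUM = 10  # "below the 10% threshold" per the quoted text


def failing_attributes(smartctl_table: str) -> list:
    """Return (id, name, value) for health attributes with a normalized value below 10."""
    failing = []
    for line in smartctl_table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; skip headers and blank lines.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        attr_id, name, value = int(fields[0]), fields[1], int(fields[3])
        if attr_id in SAMSUNG_HEALTH_IDS and value < NORMALIZED_MINIMUM:
            failing.append((attr_id, name, value))
    return failing


# Made-up sample data, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21012
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   008   008   010    Old_age   Always       -       412
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
"""

print(failing_attributes(SAMPLE))  # [(182, 'Erase_Fail_Count_Total', 8)]
```

In the made-up sample, only Erase_Fail_Count_Total has a normalized value below 10, so it is the only attribute reported.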

Bugfix in NVMe performance data

Where a human codes, there might be errors. This unfortunately happened when check_smart 6.11.0 was released. The "handling dots in attribute names" request introduced a regression that stripped the labels from the performance data on NVMe drives, leaving it unusable:

# /usr/lib/nagios/plugins/check_smart.pl -d /dev/nvme1n1 -i nvme
OK: Drive  UCS-SDHPCIE 800GB S/N XXX: no SMART errors detected. |=0x00 =42 =100 =10 =0 =242 =2913064 =12586 =13282120 =26 =57 =4140 =44 =0 =0

Unfortunately, I did not test the suggested code change properly (I did not have any NVMe devices at hand back then), hence the regression. Sorry!

Version 6.12.0 now fixes the regression and the performance data are back for NVMe drives:

# /usr/lib/nagios/plugins/check_smart.pl -d /dev/nvme1n1 -i nvme
OK: Drive  UCS-SDHPCIE 800GB S/N XXX: no SMART errors detected. |Temperature=42 Available_Spare=100 Available_Spare_Threshold=10 Percentage_Used=0 Data_Units_Read=242 Data_Units_Written=2913064 Host_Read_Commands=12586 Host_Write_Commands=13282120 Controller_Busy_Time=26 Power_Cycles=57 Power_On_Hours=4141 Unsafe_Shutdowns=44 Media_and_Data_Integrity_Errors=0 Error_Information_Log_Entries=0
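
A regression like this can be caught mechanically, because every performance data item is expected to have the form label=value. The following Python sketch (not part of the plugin, which is written in Perl; the function name is an assumption) splits a plugin output line at the pipe and reports items whose label is missing. It only handles simple unquoted labels like the ones shown above, and the sample lines are shortened versions of the outputs above.

```python
def malformed_perfdata_items(plugin_output: str) -> list:
    """Return perfdata items that have an empty label or no '=' at all."""
    # Everything after the first '|' is performance data.
    _, _, perfdata = plugin_output.partition("|")
    bad = []
    for item in perfdata.split():
        label, sep, _value = item.partition("=")
        if sep != "=" or label == "":
            bad.append(item)
    return bad


broken = "OK: no SMART errors detected. |=0x00 =42 =100"
fixed = "OK: no SMART errors detected. |Temperature=42 Available_Spare=100"

print(malformed_perfdata_items(broken))  # ['=0x00', '=42', '=100']
print(malformed_perfdata_items(fixed))   # []
```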

Regression in 6.12.0: Invalid interface

Unfortunately, 6.12.0 introduced yet another regression. Interfaces with additional comma-separated input (for example -i megaraid,1) are rejected by the plugin and the following error message is shown:

# ./check_smart.pl -d /dev/sda -i megaraid,14
invalid interface megaraid,14 for /dev/sda!
[...]

This is fixed in release 6.12.1, which was also released today (December 10th, 2021).
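
An interface check has to accept both plain names and names followed by comma-separated parameters. A hedged Python sketch of such a check follows. The allow-lists below are a small hypothetical subset chosen for illustration; they are neither the plugin's real list of interfaces nor its Perl implementation.

```python
import re

# Hypothetical subsets: names used on their own, and names that take a number.
PLAIN_INTERFACES = {"auto", "ata", "scsi", "nvme"}
PARAM_INTERFACES = {"megaraid", "cciss", "3ware"}

# A name, optionally followed by ",<digits>". Used with fullmatch(), so
# nothing else may follow.
INTERFACE_RE = re.compile(r"([a-z0-9]+)(?:,(\d+))?")


def is_valid_interface(arg: str) -> bool:
    """Accept 'nvme' or 'megaraid,14'; reject 'megaraid' alone or 'ata,1'."""
    match = INTERFACE_RE.fullmatch(arg)
    if match is None:
        return False
    name, param = match.groups()
    if param is None:
        return name in PLAIN_INTERFACES
    return name in PARAM_INTERFACES
```

Under these assumptions, megaraid,14 from the example above passes the check, while an argument with trailing shell syntax does not.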


