A new version of check_smart, an open source monitoring plugin that checks the health of hard drives, solid state drives and NVMe drives, is now available!
Release 6.12.0 brings a couple of important changes, including a security fix. All check_smart users are encouraged to update to 6.12.0 as soon as possible.
The plugin supports so-called pseudo-devices. In most cases these are physical drives "hiding" behind a RAID controller; depending on the controller, the kernel presents them under a path such as /dev/bus/N.
Adding support for pseudo-devices introduced a security vulnerability, which gave check_smart the "honour" of its own CVE (CVE-2021-42257). However, the security fix in version 6.9.1 only covered part of the vulnerability. After discussions with Wolfgang Frisch from SUSE and John Runyon, an additional vulnerability was found in the trailing path of pseudo-devices. By appending shell commands to a pseudo-device path, an attacker could break out of the plugin and execute those commands with sudo privileges:
$ sudo ./check_smart.pl -d '/dev/bus/1 >/dev/null 2>&1; whoami' -i auto
root
UNKNOWN: Drive S/N : |
The trailing path is now validated as well, and the same command returns the following output:
$ sudo ./check_smart.pl -d '/dev/bus/1 >/dev/null 2>&1; whoami' -i auto
Could not find any valid block/character special device for device /dev/bus/1 >/dev/null 2>&1; whoami !
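The general defence against this class of bug is to accept only a strictly anchored path and to never hand the device name to a shell. The Python sketch below illustrates that idea; it is not check_smart's actual Perl code, and the `/dev/bus/<digits>` pattern as well as the `is_valid_pseudo_device` and `smartctl_argv` helpers are hypothetical.

```python
import re
import shlex

# Hypothetical allow-list pattern: a pseudo-device must be exactly /dev/bus/<digits>.
# re.fullmatch() anchors at both ends; unlike a trailing "$", it does not accept
# a trailing newline, and nothing may follow the bus number.
PSEUDO_DEVICE = re.compile(r"/dev/bus/[0-9]+")


def is_valid_pseudo_device(path: str) -> bool:
    """Return True only if the whole string is a /dev/bus/N path."""
    return PSEUDO_DEVICE.fullmatch(path) is not None


def smartctl_argv(device: str, interface: str) -> list[str]:
    """Build an argument vector for smartctl (a list, never a shell string).

    Passing a list to subprocess.run() without shell=True means shell
    metacharacters such as ';' or '>' are never interpreted.
    """
    if not is_valid_pseudo_device(device):
        raise ValueError(f"invalid pseudo-device: {device!r}")
    return ["smartctl", "-A", "-d", interface, device]


if __name__ == "__main__":
    for candidate in ["/dev/bus/1", "/dev/bus/1 >/dev/null 2>&1; whoami", "/dev/bus/1\n"]:
        print(repr(candidate), is_valid_pseudo_device(candidate))
    # shlex.join() only shows how the argv would look if typed on a command line.
    print(shlex.join(smartctl_argv("/dev/bus/1", "megaraid,1")))
```

Validating the path and avoiding the shell are independent layers: either one alone would have stopped the `; whoami` payload shown above.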
Issue #73 discussed additional health monitoring for Samsung SSDs. This led to further research on Samsung SSDs, and an official Samsung document revealed four important ATA attributes:
The four SMART attributes listed in the table below are the most important indicators of drive health. If any of the normalized values drop below the 10% threshold, it’s recommended to replace the drive as soon as possible because it’s approaching the end of its life and may become unreliable if used longer.
179 Unused Reserved Block Count (Used_Rsvd_Blk_Cnt_Tot)
181 Program Fail Count (Program_Fail_Cnt_Total)
182 Erase Fail Count (Erase_Fail_Count_Total)
183 Runtime Bad Count (Runtime_Bad_Block)
The attributes Program_Fail_Cnt_Total and Runtime_Bad_Block were already part of the default raw list; Erase_Fail_Count_Total has now been added to it as well.
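To make the Samsung guidance concrete, here is a small Python sketch that flags any of the four attributes whose normalized value has dropped below 10. It is separate from check_smart's raw list, and the `attributes_below_threshold` helper and the `REPLACE_BELOW` constant are assumptions for illustration, not plugin options.

```python
# Attribute IDs and names as listed in the Samsung table above.
SAMSUNG_KEY_ATTRIBUTES = {
    179: "Used_Rsvd_Blk_Cnt_Tot",
    181: "Program_Fail_Cnt_Total",
    182: "Erase_Fail_Count_Total",
    183: "Runtime_Bad_Block",
}

# Per the quoted Samsung guidance, a normalized value below 10 means
# "replace the drive soon". This constant is an assumption of the sketch.
REPLACE_BELOW = 10


def attributes_below_threshold(normalized: dict[int, int]) -> list[str]:
    """Return the names of key attributes whose normalized value is below 10.

    `normalized` maps attribute ID -> normalized VALUE (the VALUE column of
    `smartctl -A`); IDs missing from the mapping are skipped.
    """
    return [
        name
        for attr_id, name in SAMSUNG_KEY_ATTRIBUTES.items()
        if attr_id in normalized and normalized[attr_id] < REPLACE_BELOW
    ]


print(attributes_below_threshold({179: 100, 181: 9, 182: 100, 183: 100}))
# prints: ['Program_Fail_Cnt_Total']
```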
Where humans write code, there will be bugs. Unfortunately, this happened with the release of check_smart 6.11.0: the "handling dots in attribute names" request introduced a regression that effectively broke the performance data on NVMe drives, leaving every value without a label:
# /usr/lib/nagios/plugins/check_smart.pl -d /dev/nvme1n1 -i nvme
OK: Drive UCS-SDHPCIE 800GB S/N XXX: no SMART errors detected. |=0x00 =42 =100 =10 =0 =242 =2913064 =12586 =13282120 =26 =57 =4140 =44 =0 =0
Unfortunately, I did not test this suggested code change properly (I did not have any NVMe devices at hand back then), which is how the regression slipped in. Sorry!
Version 6.12.0 now fixes the regression and the performance data are back for NVMe drives:
# /usr/lib/nagios/plugins/check_smart.pl -d /dev/nvme1n1 -i nvme
OK: Drive UCS-SDHPCIE 800GB S/N XXX: no SMART errors detected. |Temperature=42 Available_Spare=100 Available_Spare_Threshold=10 Percentage_Used=0 Data_Units_Read=242 Data_Units_Written=2913064 Host_Read_Commands=12586 Host_Write_Commands=13282120 Controller_Busy_Time=26 Power_Cycles=57 Power_On_Hours=4141 Unsafe_Shutdowns=44 Media_and_Data_Integrity_Errors=0 Error_Information_Log_Entries=0
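The fixed output pairs every value with a label derived from the NVMe field name. The Python sketch below shows one way to build such `label=value` pairs from `Name: value` lines and to fail loudly on an empty label. The label rule (runs of spaces, dots and other non-word characters become one underscore) and the sample lines are illustrative assumptions, not check_smart's exact implementation.

```python
import re


# Hypothetical rule: collapse runs of non-word characters (spaces, dots,
# slashes, ...) in a field name into a single underscore.
def perfdata_label(name: str) -> str:
    return re.sub(r"\W+", "_", name.strip()).strip("_")


def nvme_perfdata(lines: list[str]) -> str:
    """Build space-separated 'label=value' pairs from 'Name: value ...' lines."""
    pairs = []
    for line in lines:
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        match = re.search(r"[0-9][0-9,]*", rest)
        if match is None:
            continue  # no numeric value to report
        label = perfdata_label(name)
        # An empty label is exactly the 6.11.0 symptom ("=42 =100 ...").
        if not label:
            raise ValueError(f"empty perfdata label for line {line!r}")
        # Drop thousands separators such as "2,913,064".
        pairs.append(f"{label}={match.group().replace(',', '')}")
    return " ".join(pairs)


sample = [
    "Temperature:                        42 Celsius",
    "Available Spare:                    100%",
    "Data Units Written:                 2,913,064 [1.49 TB]",
    "Media and Data Integrity Errors:    0",
]
print(nvme_perfdata(sample))
# prints: Temperature=42 Available_Spare=100 Data_Units_Written=2913064 Media_and_Data_Integrity_Errors=0
```

Hexadecimal fields such as a `0x00` value would need separate handling, because this simple number pattern would only pick up the leading `0`.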
Unfortunately, 6.12.0 introduced yet another regression: interfaces with additional comma-separated input (for example -i megaraid,1) are rejected by the plugin with the following error message:
# ./check_smart.pl -d /dev/sda -i megaraid,14
invalid interface megaraid,14 for /dev/sda!
[...]
This is fixed in release 6.12.1, which was also released today (December 10th, 2021).
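Interface validation has to stay strict without rejecting legitimate forms such as `megaraid,14`. The Python sketch below accepts a known base type optionally followed by numeric sub-arguments. The `ALLOWED_BASES` set and the `is_valid_interface` helper are hypothetical; smartctl supports more `-d` types than listed here, and some take non-numeric sub-options, so the suffix rule is deliberately narrow.

```python
import re

# Hypothetical allow-list of smartctl -d base types; check_smart's own list may differ.
ALLOWED_BASES = {"auto", "ata", "scsi", "sat", "nvme", "megaraid", "cciss", "3ware"}

# Numeric sub-arguments such as "14" in megaraid,14: digits, optionally
# separated by "," or "/".
SUFFIX = re.compile(r"[0-9]+(?:[,/][0-9]+)*")


def is_valid_interface(value: str) -> bool:
    """Accept 'base' or 'base,<numbers>'; reject anything else."""
    base, sep, suffix = value.partition(",")
    if base not in ALLOWED_BASES:
        return False
    if not sep:
        return True  # plain interface such as "nvme"
    return SUFFIX.fullmatch(suffix) is not None


for candidate in ["nvme", "megaraid,14", "megaraid,14; whoami", "bogus,1", "megaraid,"]:
    print(candidate, is_valid_interface(candidate))
```

Splitting on the first comma before checking the base type is what lets the comma form through, while `fullmatch` on the suffix keeps injected text such as `; whoami` out.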