Recently a Western Digital Green SSD died without any pre-fail indications (see article Leaving a party without saying bye: Western Digital Green SSD dead without pre-fail indications). Does the same apply to other SSDs, too? The answer is: Not exactly.
All server drives are constantly monitored by the check_smart monitoring plugin. This usually helps to detect imminent failures (pre-failures) of a drive. This has worked very well in the past with magnetic disks (hard drives): almost all HDD failures could be detected in advance by check_smart. But does the same also apply to SSDs? Only time will tell.
The first SSD drive failure (mentioned above) did not show any pre-failures. The drive just went out of service in an instant. But on a Crucial SSD, check_smart was able to detect something.
check_smart raised the first alert on July 2nd: 2 reallocated sectors were found.
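To illustrate the kind of check behind such an alert, here is a minimal Python sketch that reads a SMART attribute table in the layout printed by `smartctl -A` and flags a non-zero raw value for attribute ID 5 (reallocated sectors). This is not how check_smart itself is implemented; the function names, the `warn_at` threshold and the sample table are assumptions made for illustration, and the sample was not captured from the drive in this article. The lookup goes by attribute ID rather than by name, because the attribute name can differ between drive models.

```python
def parse_smart_attributes(text):
    """Return {attribute_id: (name, raw_value)} from `smartctl -A` style text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        # Skip raw values that are not plain integers.
        if raw.isdigit():
            attrs[int(fields[0])] = (fields[1], int(raw))
    return attrs


def reallocated_status(attrs, warn_at=1):
    """Return (status, raw_count) for attribute ID 5; warn_at is an assumed threshold."""
    _name, raw = attrs.get(5, ("unknown", 0))
    return ("WARNING" if raw >= warn_at else "OK", raw)


# Illustrative sample only, not real output from the failed drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       2
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       8000
"""

if __name__ == "__main__":
    print(reallocated_status(parse_smart_attributes(SAMPLE)))
```

On a real system the text would come from running `smartctl -A` against the device; that step is left out here so the sketch runs without smartmontools or root privileges.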
That's "good" news because that's an indicator that something's not going too well with the drive and that it will most likely die. The big question is: When?
One week later, on July 9th, the reallocated sector count increased to 6 and then even to 8 on the same day. The value remained steady for a while, until the counter increased to 12 sectors on July 25th.
On August 6th, it was game over for the drive: It disappeared from the operating system and check_smart was unable to find the drive anymore. This was also the moment when two additional monitoring checks (Disk Raid Status and Server Hardware) switched to CRITICAL and reported the failed drive.
For this MX500 SSD it took a bit more than one month from the first alert to the drive's end of life, which gave us enough time to get a replacement drive in advance.
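The timeline above can be put into numbers with a few lines of Python. The dates and sector counts are the ones reported in this article; the year is a placeholder (the article does not state it, and it does not affect the July-to-August day differences), and the variable and function names are made up for this sketch.

```python
from datetime import date

YEAR = 2000  # placeholder year; only the day differences matter here

# (date of observation, reallocated sector count) as reported above.
# July 9th saw both 6 and 8; the end-of-day value is used.
observations = [
    (date(YEAR, 7, 2), 2),
    (date(YEAR, 7, 9), 8),
    (date(YEAR, 7, 25), 12),
]
failure = date(YEAR, 8, 6)


def days_of_warning(observations, failure):
    """Days between the first alert and the drive disappearing."""
    first_alert = min(day for day, _count in observations)
    return (failure - first_alert).days


def growth_steps(observations):
    """List of (days elapsed, sectors added) between consecutive observations."""
    ordered = sorted(observations)
    return [
        ((later_day - earlier_day).days, later_count - earlier_count)
        for (earlier_day, earlier_count), (later_day, later_count)
        in zip(ordered, ordered[1:])
    ]


if __name__ == "__main__":
    print(days_of_warning(observations, failure))  # 35
    print(growth_steps(observations))              # [(7, 6), (16, 4)]
```

So the drive gave 35 days of warning, and the counter did not grow at a steady rate. A single drive is of course no basis for predicting how much lead time another drive will give.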
Readers of this article (and potential or existing MX500 owners) are probably interested in one particular fact: How long did the drive run until its EOL? The answer is: 8650 hours (according to the Power_On_Hours SMART value).
But be careful about treating this value as a fixed indicator. Another Crucial MX500 is still running without any reallocated sectors so far, and its Power_On_Hours value stands at 8695 hours as of this writing.
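For a sense of scale, the two Power_On_Hours values can be converted into days of runtime. The snippet below is plain arithmetic on the two numbers given in this article; the names are chosen for illustration only.

```python
HOURS_PER_DAY = 24


def hours_to_days(hours):
    """Convert a Power_On_Hours raw value into days of runtime."""
    return hours / HOURS_PER_DAY


failed_drive_hours = 8650   # MX500 that died
healthy_drive_hours = 8695  # MX500 still running at the time of writing

if __name__ == "__main__":
    print(round(hours_to_days(failed_drive_hours), 1))   # 360.4
    print(round(hours_to_days(healthy_drive_hours), 1))  # 362.3
    print(healthy_drive_hours - failed_drive_hours)      # 45
```

Both drives have roughly a year of powered-on time, and the healthy one is already 45 hours past the point where the other one failed, which is why the hour count alone says little about remaining lifetime.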