Two years ago I wrote a post (Seagate ST3000DM001-9YN166: A catastrophic disk) about the massive failures of a Seagate hard drive model. One of those drives failed after less than four months.
Now to a positive example. In March 2010 I ordered two Western Digital Caviar Blue drives (7200rpm, 500GB, SATA-II) and used them as a RAID-1 array in a self-built NAS server.
Five years later, one of the two drives has failed, after a runtime of about 40'000 hours. That's quite a good lifetime for a SATA hard drive running 24/7!
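For the record, the SMART data below was pulled with smartctl from the smartmontools package. A minimal sketch of the relevant commands, assuming the failing drive shows up as /dev/sdb (the device name is an assumption and will differ per system):

# Print identity, health status, attributes and self-test log in one go
smartctl -a /dev/sdb

# Start a long (extended) self-test and read its result later
smartctl -t long /dev/sdb
smartctl -l selftest /dev/sdb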
Here's the smartctl output of the failing drive:
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Blue Serial ATA family
Device Model: WDC WD5000AAKS-00V1A0
Serial Number: XXXXXXXXXXXXX
Firmware Version: 05.01D05
User Capacity: 500,107,862,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Wed May 20 15:25:40 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[...]
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f 200   200   051    Pre-fail Always      -       0
  3 Spin_Up_Time            0x0027 177   138   021    Pre-fail Always      -       2150
  4 Start_Stop_Count        0x0032 100   100   000    Old_age  Always      -       88
  5 Reallocated_Sector_Ct   0x0033 151   151   140    Pre-fail Always      -       391
  7 Seek_Error_Rate         0x002e 200   200   000    Old_age  Always      -       0
  9 Power_On_Hours          0x0032 045   045   000    Old_age  Always      -       40361
 10 Spin_Retry_Count        0x0032 100   253   000    Old_age  Always      -       0
 11 Calibration_Retry_Count 0x0032 100   253   000    Old_age  Always      -       0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always      -       84
192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always      -       29
193 Load_Cycle_Count        0x0032 200   200   000    Old_age  Always      -       56
194 Temperature_Celsius     0x0022 100   093   000    Old_age  Always      -       43
196 Reallocated_Event_Count 0x0032 001   001   000    Old_age  Always      -       359
197 Current_Pending_Sector  0x0032 001   001   000    Old_age  Always      -       32916
198 Offline_Uncorrectable   0x0030 199   194   000    Old_age  Offline     -       151
199 UDMA_CRC_Error_Count    0x0032 200   200   000    Old_age  Always      -       0
200 Multi_Zone_Error_Rate   0x0008 200   197   000    Old_age  Offline     -       13
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%          40174           919347518
As you can see, the drive first reallocated defective sectors (Reallocated_Sector_Ct raw value: 391) until no spare sectors were left. Another ~32'000 defective sectors (Current_Pending_Sector) are pending reallocation - well, that won't happen anymore ;-).
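To keep an eye on exactly these attributes on the remaining drive, the smartctl attribute table can be filtered. A quick sketch, again assuming the drive is /dev/sdb:

# Show only the sector-reallocation related attributes
smartctl -A /dev/sdb | egrep 'Reallocated_Sector_Ct|Reallocated_Event_Count|Current_Pending_Sector|Offline_Uncorrectable'

Rising raw values on any of these four attributes are a good reason to order a replacement drive.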
Thanks to the RAID-1 configured in the NAS, everything continues to work. However, it is strange that mdadm didn't detect the drive as failed; both disks still show up as active in /proc/mdstat. This might be due to the old and unpatched SLES 11 installation from anno 2010, though.
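In such a case the dead disk can be kicked out of the array manually. A sketch of the usual mdadm steps, assuming the array is /dev/md0 and the failing drive's member partition is /dev/sdb1 (both names are assumptions):

# Show the array and the state of its members
mdadm --detail /dev/md0

# Mark the dead member as failed, then remove it from the array
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1

# After swapping and partitioning the new disk, add it back; the array resyncs automatically
mdadm /dev/md0 --add /dev/sdb1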