In a previous article (see Leaving a party without saying bye: Western Digital Green SSD dead without pre-fail indications), a defective SSD drive was detected by monitoring. An RMA case was opened and the replacement drive was received two days later. So far so good.
Once the replacement SSD was unpacked and installed in this local test server, it was time to add the drive to the RAID-1 array (which had been running in a degraded state since the failure of the drive). The software RAID drive replacement guide was followed and first the partition table was copied from the remaining drive (SDB) to the new drive (SDC):
root@irbwsrvp01 ~ # sfdisk -d /dev/sdb | sfdisk /dev/sdc
Checking that no-one is using this disk right now ... OK
Disk /dev/sdc: 223.6 GiB, 240057409536 bytes, 468862128 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Created a new DOS disklabel with disk identifier 0xf50347d9.
/dev/sdc1: Created a new partition 1 of type 'Non-FS data' and of size 223.6 GiB.
/dev/sdc2: Done.
New situation:
Device Boot Start End Sectors Size Id Type
/dev/sdc1 2048 468862127 468860080 223.6G da Non-FS data
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
Verify the current RAID status again:
root@irbwsrvp01 ~ # cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sda3[0] sdd3[1]
439426048 blocks super 1.2 [2/2] [UU]
bitmap: 3/4 pages [12KB], 65536KB chunk
md1 : active raid1 sda2[0] sdd2[1]
24397824 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sda1[0] sdd1[1]
24396800 blocks super 1.2 [2/2] [UU]
md3 : active raid1 sdb1[3]
234372096 blocks super 1.2 [2/1] [U_]
bitmap: 1/1 pages [4KB], 131072KB chunk
unused devices: <none>
md3 still runs in a degraded state. Let's add (join) the replacement drive into this array:
root@irbwsrvp01 ~ # mdadm /dev/md3 -a /dev/sdc1
mdadm: /dev/sdc1 not large enough to join array
Whoops, obviously this didn't work. What happened?!
A quick check with fdisk on drives SDB and SDC indeed showed different values:
root@irbwsrvp01 ~ # fdisk -l /dev/sdb
Disk /dev/sdb: 223.6 GiB, 240065183744 bytes, 468877312 sectors
[...]
root@irbwsrvp01 ~ # fdisk -l /dev/sdc
Disk /dev/sdc: 223.6 GiB, 240057409536 bytes, 468862128 sectors
[...]
The capacity (in bytes) shown by fdisk differs, and the number of sectors is different, too.
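Doing the math on the fdisk output above: 468877312 - 468862128 = 15184 sectors, which at 512 bytes per sector is 7,774,208 bytes. The new drive (SDC) is therefore roughly 7.4 MiB smaller than the remaining drive (SDB).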
Was there a problem during the partition table transfer? Or was there some other error? Let's see the drive information using smartctl:
root@irbwsrvp01 ~ # smartctl -i /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.19.0-0.bpo.6-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: WDC WDS240G2G0A-00JH30
Serial Number: XXXX1
LU WWN Device Id: 5 001b44 8b9fb47d1
Firmware Version: UF500000
User Capacity: 240,065,183,744 bytes [240 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jul 24 14:08:07 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
root@irbwsrvp01 ~ # smartctl -i /dev/sdc
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.19.0-0.bpo.6-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: WDC WDS240G2G0A-00JH30
Serial Number: XXXX2
LU WWN Device Id: 5 001b44 4a74ac5d4
Firmware Version: UF400400
User Capacity: 240,057,409,536 bytes [240 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3, ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jul 24 14:08:08 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Both drives are the exact same model: WDC WDS240G2G0A-00JH30. Yet the capacity, in available bytes, clearly differs. The only other difference between the two drives could be seen in the firmware version. Unfortunately I was unable to find the specific firmware versions for these drives, so I was stuck with a replacement drive which didn't help me at all.
While waiting for an answer from Western Digital support as to why this happens in the first place (it definitely shouldn't!), I decided to stop the relevant LXC container running Zoneminder, back up the data of this RAID-1, remove the LVM PV and then recreate the RAID array.
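The container stop and backup steps are not shown below; a minimal sketch could look like this (the container name and the backup target path are assumptions, not the actual values used):
root@irbwsrvp01 ~ # lxc-stop -n zoneminder
root@irbwsrvp01 ~ # rsync -a /var/lib/lxc/zoneminder/ /backup/zoneminder/
The LVM cleanup itself went as follows: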
root@irbwsrvp01 ~ # lvremove /dev/vgssd/zoneminder
Do you really want to remove active logical volume vgssd/zoneminder? [y/n]: y
Logical volume "zoneminder" successfully removed
root@irbwsrvp01 ~ # vgremove vgssd
Volume group "vgssd" successfully removed
root@irbwsrvp01 ~ # pvremove /dev/md3
Labels on physical volume "/dev/md3" successfully wiped.
Time to stop the md3 array:
root@irbwsrvp01 ~ # mdadm --stop /dev/md3
mdadm: stopped /dev/md3
root@irbwsrvp01 ~ # cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sda3[0] sdd3[1]
439426048 blocks super 1.2 [2/2] [UU]
bitmap: 3/4 pages [12KB], 65536KB chunk
md1 : active raid1 sda2[0] sdd2[1]
24397824 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sda1[0] sdd1[1]
24396800 blocks super 1.2 [2/2] [UU]
unused devices: <none>
Remove partitions from drives SDB and SDC:
root@irbwsrvp01 ~ # fdisk /dev/sdb
Welcome to fdisk (util-linux 2.29.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): d
Selected partition 1
Partition 1 has been deleted.
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
root@irbwsrvp01 ~ # fdisk /dev/sdc
Welcome to fdisk (util-linux 2.29.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): d
Selected partition 1
Partition 1 has been deleted.
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
Create a new partition on the smaller drive SDC:
root@irbwsrvp01 ~ # fdisk /dev/sdc
Welcome to fdisk (util-linux 2.29.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): n
Partition type
p primary (0 primary, 0 extended, 4 free)
e extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-468862127, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-468862127, default 468862127):
Created a new partition 1 of type 'Linux' and of size 223.6 GiB.
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
Then do the same on SDB, using the last sector value from the smaller drive SDC:
root@irbwsrvp01 ~ # fdisk /dev/sdb
Welcome to fdisk (util-linux 2.29.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): n
Partition type
p primary (0 primary, 0 extended, 4 free)
e extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-468877311, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-468877311, default 468877311): 468862127
Created a new partition 1 of type 'Linux' and of size 223.6 GiB.
Partition #1 contains a linux_raid_member signature.
Do you want to remove the signature? [Y]es/[N]o: Y
The signature will be removed by a write command.
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
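Before creating the array, it's worth verifying that both partitions now report the same number of sectors. This check was not part of the original session, but it could be done like this:
root@irbwsrvp01 ~ # blockdev --getsz /dev/sdb1 /dev/sdc1
Both output lines should show the identical number of 512-byte sectors.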
Now the RAID array can be created using these two newly created partitions sdb1 and sdc1:
root@irbwsrvp01 ~ # mdadm --create --verbose /dev/md3 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm: Note: this array has metadata at the start and
may not be suitable as a boot device. If you plan to
store '/boot' on this device please ensure that
your boot-loader understands md/v1.x metadata, or use
--metadata=0.90
mdadm: size set to 234298944K
mdadm: automatically enabling write-intent bitmap on large array
Continue creating array? yes
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md3 started.
And the automatic resync has started:
root@irbwsrvp01 ~ # cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md3 : active raid1 sdc1[1] sdb1[0]
234298944 blocks super 1.2 [2/2] [UU]
[>....................] resync = 0.6% (1601920/234298944) finish=19.3min speed=200240K/sec
bitmap: 2/2 pages [8KB], 65536KB chunk
md2 : active raid1 sda3[0] sdd3[1]
439426048 blocks super 1.2 [2/2] [UU]
bitmap: 3/4 pages [12KB], 65536KB chunk
md1 : active raid1 sda2[0] sdd2[1]
24397824 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sda1[0] sdd1[1]
24396800 blocks super 1.2 [2/2] [UU]
unused devices: <none>
Once the RAID array was in sync, the PV, VG and LV were recreated and the backup was restored into the file system.
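These recreation steps are not shown here, but roughly they looked like this (the LV size, file system and mount point are assumptions; the VG and LV names match the ones removed earlier):
root@irbwsrvp01 ~ # pvcreate /dev/md3
root@irbwsrvp01 ~ # vgcreate vgssd /dev/md3
root@irbwsrvp01 ~ # lvcreate -n zoneminder -l 100%FREE vgssd
root@irbwsrvp01 ~ # mkfs.ext4 /dev/vgssd/zoneminder
root@irbwsrvp01 ~ # mount /dev/vgssd/zoneminder /mnt   # then restore the backup into it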
Although WD Green drives are typically consumer drives and (probably) don't often run in a RAID setup, they should nevertheless be interchangeable. Offering the same or slightly more capacity would allow a 1:1 replacement of a defective drive. At the very least this can be expected when receiving the exact same drive model as a replacement.
For now the capacity problem was solved by deleting the RAID array and recreating it with a smaller size, causing a downtime. As this is a local non-production server, this didn't hurt too much. But the questions weigh heavily. What if the same happens on enterprise WD drives? And was this problem caused by the different firmware version, or was there a problem during the drive's production? As I'm writing this, I'm still waiting for an answer from Western Digital support.
Updated July 30th 2020
Meanwhile I received an answer from WD support. Support suggested doing a low-level format of the drive, but that would not change anything, of course. After I responded that a low-level format would not change the drive's own capacity (as seen in the SMART table) and pointed out that two different firmware versions can be seen, WD support suggested installing Western Digital SSD Dashboard and doing a firmware upgrade.
Unfortunately this software can only be installed on Windows, so I had to grab an old notebook with Windows 7, remove the drive from the test server, plug it into a SATA to USB adapter from Sharkoon (which was immediately detected by Windows as a WD Green SSD), and then run Western Digital SSD Dashboard.
Western Digital SSD Dashboard recognized the attached drive. However, when checking for new firmware versions under Options -> Firmware Upgrade, it reported that the firmware (UF400400) is up to date! Maybe UF500000 is the older version? That doesn't really make sense looking at the naming, but who knows. I repeated the whole procedure with the other drive (firmware UF500000) in SSD Dashboard: firmware is up to date!
So both drives have different firmware versions, yet both supposedly have the most recent firmware. And it is still the same drive model, according to both the packaging and the SMART information table. To me it looks like Western Digital has produced different kinds of drives but is selling them as the same model. Of course, new packaging, renaming etc. mean additional costs, so why do it? Unfortunately the end customer is left in the dark and runs into problems when a drive needs to be replaced.