How to replace a hard or solid state drive in Linux software raid with mdadm

Last updated on September 21st 2023 - Listed in Linux, Hardware


I constantly monitor the SMART status of server hard disks. As the error rates increase, so does the chance that the disk will fail soon. I prefer to replace defective hardware before it actually fails, and with a HDD this is usually possible.

The following steps explain how to replace a HDD in a software raid under Linux. The same steps apply to solid state drives (SSD).

Update February 28th 2013: Added commands for GPT disks.

1. Determine the defective or failing HDD. In my case I already got that information from my monitoring using SMART data: SDB. If the disk has already failed completely, you can also see that with cat /proc/mdstat.
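
To spot an already failed disk without reading the whole output by eye, something like the following could be used. It is only a minimal sketch: degraded_arrays is a helper name I made up for illustration, and it only reports arrays that already lost a member. A disk that is still active but throwing SMART errors will not show up here.

```shell
#!/bin/sh
# Sketch: list md arrays that are missing a member.
# Reads /proc/mdstat-formatted text on stdin.
# degraded_arrays is a hypothetical helper name, not part of mdadm.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { array = $1 }                   # remember the array named on the header line
        /blocks/ && /\[[U_]*_[U_]*\]/ { print array }  # a status such as [U_] has an empty slot
    '
}

# Usage on a live system:
#   degraded_arrays < /proc/mdstat
```

An empty result means every array still has all of its members.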

2. Get the current Raid-layout:

# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sda6[0] sdb6[1]
      688009088 blocks [2/2] [UU]

md3 : active raid1 sda5[0] sdb5[1]
      20971392 blocks [2/2] [UU]

md2 : active raid1 sda3[0] sdb3[1]
      20971456 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
      524224 blocks [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      2096064 blocks [2/2] [UU]

unused devices: <none>

As you can see, disk SDB is still shown as active in all Raid Arrays.

3. (optional in case the failing disk is still working in the software raid)
Set the failing disk (SDB) as "fail" in the software raid:

# mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0
# mdadm --manage /dev/md1 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md1
# mdadm --manage /dev/md2 --fail /dev/sdb3
mdadm: set /dev/sdb3 faulty in /dev/md2
# mdadm --manage /dev/md3 --fail /dev/sdb5
mdadm: set /dev/sdb5 faulty in /dev/md3
# mdadm --manage /dev/md4 --fail /dev/sdb6
mdadm: set /dev/sdb6 faulty in /dev/md4
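
With many arrays it is easy to miss one. The following sketch prints (but does not run) one --fail command per array that contains a partition of the given disk. fail_commands is a made-up helper name, and the parsing assumes the classic sdX naming where a partition is the disk name followed by a number.

```shell
#!/bin/sh
# Sketch: print one "mdadm --manage ... --fail" command for every array
# that contains a partition of the given disk. Nothing is executed.
# fail_commands is a hypothetical helper name.
# $1: disk name without /dev/, for example sdb. stdin: mdstat text.
fail_commands() {
    awk -v disk="$1" '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++) {        # scan every field after the colon
                member = $i
                sub(/\[.*/, "", member)        # strip the "[1]" and any "(F)" suffix
                if (member ~ ("^" disk "[0-9]+$"))
                    printf "mdadm --manage /dev/%s --fail /dev/%s\n", $1, member
            }
        }
    '
}

# Usage on a live system:
#   fail_commands sdb < /proc/mdstat
```

Review the printed commands first. Once they look right, they can be pasted into a root shell or piped to sh.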

Now the raid status looks like the following (as if SDB failed):

# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sda6[0] sdb6[2](F)
      688009088 blocks [2/1] [U_]

md3 : active raid1 sda5[0] sdb5[2](F)
      20971392 blocks [2/1] [U_]

md2 : active raid1 sda3[0] sdb3[2](F)
      20971456 blocks [2/1] [U_]

md1 : active raid1 sda2[0] sdb2[2](F)
      524224 blocks [2/1] [U_]    

md0 : active raid1 sda1[0] sdb1[2](F)
      2096064 blocks [2/1] [U_]     

unused devices: <none>

4. Remove all SDB partitions from each Raid Array:

# mdadm /dev/md0 -r /dev/sdb1
mdadm: hot removed /dev/sdb1 from /dev/md0
# mdadm /dev/md1 -r /dev/sdb2
mdadm: hot removed /dev/sdb2 from /dev/md1
# mdadm /dev/md2 -r /dev/sdb3
mdadm: hot removed /dev/sdb3 from /dev/md2
# mdadm /dev/md3 -r /dev/sdb5
mdadm: hot removed /dev/sdb5 from /dev/md3
# mdadm /dev/md4 -r /dev/sdb6
mdadm: hot removed /dev/sdb6 from /dev/md4

Verify the current status of the software Raid again. All SDB entries are now removed:

# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sda6[0]    
      688009088 blocks [2/1] [U_]

md3 : active raid1 sda5[0]      
      20971392 blocks [2/1] [U_]

md2 : active raid1 sda3[0]      
      20971456 blocks [2/1] [U_]

md1 : active raid1 sda2[0]      
      524224 blocks [2/1] [U_]  

md0 : active raid1 sda1[0]      
      2096064 blocks [2/1] [U_] 

unused devices: <none>
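
Before pulling the drive, the same verification can be scripted. This is a small sketch with a made-up helper name (disk_in_arrays). It only checks whether any partition of the disk is still listed as an array member.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if any partition of the given disk is still
# listed as a member of an array, fail (exit 1) otherwise.
# disk_in_arrays is a hypothetical helper name.
# $1: disk name without /dev/, for example sdb. stdin: mdstat text.
disk_in_arrays() {
    # Members look like "sdb6[1]", so match the name, digits, then "[".
    grep -Eq "(^|[[:space:]])$1[0-9]+\["
}

# Usage on a live system:
#   if disk_in_arrays sdb < /proc/mdstat; then
#       echo "sdb is still part of an array, do not pull it yet"
#   fi
```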

5. (optional) Check that a boot loader is installed on the remaining disk:

# dd if=/dev/sda bs=1024 count=1 2>/dev/null | strings | grep -Ei "lilo|grub"
GRUB
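
The same check can be wrapped in a function that returns an exit code, which is handy in scripts. This is a sketch under the same assumption as the command above: the boot loader leaves a readable "GRUB" or "LILO" string in the first 1024 bytes of the disk. has_boot_loader is a made-up name, and grep -a is used in place of strings to search the binary data as text.

```shell
#!/bin/sh
# Sketch: succeed if the first 1024 bytes of a device (or file) contain
# the string "grub" or "lilo" in any case.
# has_boot_loader is a hypothetical helper name.
# $1: device or file to inspect, for example /dev/sda.
has_boot_loader() {
    # -a searches binary input as text, -q prints nothing and only sets the exit code
    dd if="$1" bs=1024 count=1 2>/dev/null | grep -aEiq 'lilo|grub'
}

# Usage on a live system (as root):
#   has_boot_loader /dev/sda && echo "boot loader found on sda"
```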

6. Shut down the server (if necessary) and replace the drive. Then start the server, which should boot from SDA.

7. Copy SDA's partition table to the new SDB HDD (SDA: good/old disk, SDB: new empty disk, so the direction is SDA -> SDB).

Note: If you are going to replace the drive with a larger drive and your goal is to extend the size of the raid array, do not copy the partition table. Instead check out this article: Replace hard or solid state drive with a bigger one and grow software (mdadm) raid.

For disks with an MBR (Master Boot Record) partition table:

# sfdisk -d /dev/sda | sfdisk /dev/sdb

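For drives with a GPT partition table (this includes all drives larger than 2TB). Note the argument order of sgdisk -R: the target disk (SDB) comes first, the source disk (SDA) last. The second command gives SDB new random GUIDs: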

# sgdisk -R /dev/sdb /dev/sda
# sgdisk -G /dev/sdb
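
Picking the wrong tool or swapping source and target here would be painful, so here is a small sketch that prints the matching commands for a given partition table type without running them. copy_table_commands is a made-up helper name. The type strings "dos" and "gpt" are the ones lsblk reports in its PTTYPE column, which I assume is available in your util-linux version.

```shell
#!/bin/sh
# Sketch: print the commands that copy the partition table from a source
# disk to a target disk, depending on the partition table type.
# Nothing is executed. copy_table_commands is a hypothetical helper name.
# $1: partition table type ("dos" for MBR, or "gpt")
# $2: source disk (good/old), $3: target disk (new/empty)
copy_table_commands() {
    pttype=$1
    src=$2
    dst=$3
    case $pttype in
        dos)
            echo "sfdisk -d $src | sfdisk $dst"
            ;;
        gpt)
            echo "sgdisk -R $dst $src"    # target first, source last
            echo "sgdisk -G $dst"         # new random GUIDs on the target
            ;;
        *)
            echo "unknown partition table type: $pttype" >&2
            return 1
            ;;
    esac
}

# Usage on a live system (assumes lsblk knows the PTTYPE column):
#   copy_table_commands "$(lsblk -dno PTTYPE /dev/sda)" /dev/sda /dev/sdb
```

Printing instead of executing is a deliberate choice: the commands overwrite the partition table of the target, so a last look at the device names costs nothing.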

8. Add the partitions of the new SDB to the Raid Arrays:

# mdadm /dev/md0 -a /dev/sdb1
# mdadm /dev/md1 -a /dev/sdb2
# mdadm /dev/md2 -a /dev/sdb3
# mdadm /dev/md3 -a /dev/sdb5
# mdadm /dev/md4 -a /dev/sdb6
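
Since SDB is no longer listed in /proc/mdstat at this point, the add commands can be derived from the partitions of the surviving disk. The sketch below prints them without running anything. add_commands is a made-up helper name, and it assumes the new disk has the same partition numbers as the old one, which is the case after copying the partition table.

```shell
#!/bin/sh
# Sketch: for every array that contains a partition of the surviving disk,
# print the command that adds the same partition number of the new disk.
# Nothing is executed. add_commands is a hypothetical helper name.
# $1: surviving disk (for example sda), $2: new disk (for example sdb)
# stdin: mdstat text
add_commands() {
    awk -v good="$1" -v new="$2" '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++) {        # scan every field after the colon
                member = $i
                sub(/\[.*/, "", member)        # strip the "[0]" suffix
                if (member ~ ("^" good "[0-9]+$")) {
                    part = substr(member, length(good) + 1)   # partition number
                    printf "mdadm /dev/%s -a /dev/%s%s\n", $1, new, part
                }
            }
        }
    '
}

# Usage on a live system:
#   add_commands sda sdb < /proc/mdstat
```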

9. Check Synchronisation:

# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sdb6[2] sda6[0]
      688009088 blocks [2/1] [U_]
        resync=DELAYED

md3 : active raid1 sdb5[2] sda5[0]
      20971392 blocks [2/1] [U_]
      [>....................]  recovery =  1.2% (271936/20971392) finish=5.0min speed=67984K/sec

md2 : active raid1 sdb3[2] sda3[0]
      20971456 blocks [2/1] [U_]
        resync=DELAYED

md1 : active raid1 sdb2[2] sda2[0]
      524224 blocks [2/1] [U_]
        resync=DELAYED

md0 : active raid1 sdb1[1] sda1[0]
      2096064 blocks [2/2] [UU]

unused devices: <none>
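
To wait for the end of the synchronisation in a script, the output can be checked for the recovery and resync lines shown above. This is a minimal sketch with a made-up helper name (rebuild_pending). It relies only on the text format seen in this article.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) while any array is still recovering or has a
# delayed resync, fail (exit 1) once no such line is left.
# rebuild_pending is a hypothetical helper name.
# stdin: mdstat text
rebuild_pending() {
    # matches "recovery =  1.2% ..." as well as "resync=DELAYED"
    grep -Eq '(recovery|resync) *='
}

# Usage on a live system:
#   while rebuild_pending < /proc/mdstat; do sleep 60; done
#   echo "all arrays are in sync"
```

As an alternative for a single array, mdadm has a --wait option (for example mdadm --wait /dev/md3) that blocks until the resync or recovery of that array has finished.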

10. Once the synchronisation is finished, don't forget to install the boot loader on SDB as well. Otherwise, if SDA fails and you reboot the server, SDB would have no boot loader and the server would not start up. With GRUB 2 it's pretty easy:

# grub-install /dev/sdb
Installation finished. No error reported.

# dd if=/dev/sdb bs=1024 count=1 2>/dev/null | strings | grep -Ei "lilo|grub"
GRUB

