Solaris: Replace a defective HDD with hpacucli and zpool



This week I had to replace a defective hard disk in an HP ProLiant DL380 G6 server running Solaris 10. The most interesting fact here: the internal RAID controller was not used for the RAID itself, which was instead built in the operating system with ZFS pools (don't ask me why).

To add another layer, the HP disk controller was still involved: it presented the disks to the OS as logical drives, so ZFS was not the only thing handling the disks. As a result, the disk replacement turned out to be more complicated than I had expected for simply swapping a disk.

After a lot of trial and error, and with the help of my colleague Alex, the following steps worked.

1. Get the disk status from the HP tool hpacucli:

# /opt/HPQacucli/sbin/hpacucli
=> ctrl slot=0 show config

Smart Array P410i in Slot 0 (Embedded)    (sn: 500143800899D1B0)

[...]

logicaldrive 8 (279.4 GB, RAID 0, Failed)

physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, Failed)

=> exit
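
For scripting, the same check can be sketched as a small filter. This is only an illustration: the function name failed_physical_drives is made up here, and it assumes hpacucli also accepts the command as arguments (non-interactively) and prints lines in the format shown above. On Solaris 10, nawk may be needed in place of awk.

```shell
#!/bin/sh
# Hypothetical helper: print the address of every physical drive that the
# controller reports as Failed. Expects "show config" output on stdin.
failed_physical_drives() {
    awk '$1 == "physicaldrive" && /Failed\)/ { print $2 }'
}

# Sample lines copied from the output above. On the server the input would
# come from something like (assumed non-interactive invocation):
#   /opt/HPQacucli/sbin/hpacucli ctrl slot=0 show config | failed_physical_drives
hp_sample='logicaldrive 8 (279.4 GB, RAID 0, Failed)
physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, Failed)'

printf '%s\n' "$hp_sample" | failed_physical_drives
```

With the sample above this prints 2I:1:8, i.e. port 2I, box 1, bay 8.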

2. Get the ZFS pool information:

# zpool status

        NAME          STATE     READ WRITE CKSUM
        zonepool      DEGRADED     0     0     0
          mirror      ONLINE       0     0     0
            c7t2d0s0  ONLINE       0     0     0
            c7t4d0s0  ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c7t3d0s0  ONLINE       0     0     0
            c7t5d0s0  ONLINE       0     0     0
          mirror      DEGRADED     0     0     0
            c7t6d0s0  ONLINE       0     0     0
            c7t8d0s0  UNAVAIL     54   716    11  cannot open

This shows that the physical disk in bay #8 is defective (this can be verified in iLO or physically at the front of the server). Solaris addresses the disk by its own device name, in this case c7t8d0 (controller 7, target 8, disk 0), and that is the device zpool reports as unavailable. c7t8d0s0 is the partition/slice on that disk (s0).
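
To pick the affected device out of a longer "zpool status" listing, a filter along these lines can help. It is a rough sketch: the function names unhealthy_devices and decode_device are invented for this example, and the match between target number and bay number is simply what this server showed, not a general rule.

```shell
#!/bin/sh
# Hypothetical helper: print "device state" for every disk device in
# "zpool status" output (stdin) whose state is not ONLINE.
unhealthy_devices() {
    awk '$1 ~ /^c[0-9]+t[0-9]+d[0-9]+/ && $2 != "ONLINE" { print $1, $2 }'
}

# Hypothetical helper: split a name like c7t8d0s0 into its components.
decode_device() {
    echo "$1" | sed 's/^c\([0-9]*\)t\([0-9]*\)d\([0-9]*\)s\([0-9]*\)$/controller=\1 target=\2 disk=\3 slice=\4/'
}

# Sample lines copied from the zpool status output above.
zpool_sample='          mirror      DEGRADED     0     0     0
            c7t6d0s0  ONLINE       0     0     0
            c7t8d0s0  UNAVAIL     54   716    11  cannot open'

printf '%s\n' "$zpool_sample" | unhealthy_devices
decode_device c7t8d0s0
```

On the server the input would come from "zpool status zonepool". "zpool status -x" is another way to limit the output to pools that have problems.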

3. Physically replace the defective disk.

4. Re-enable the disk in hpacucli:

# /opt/HPQacucli/sbin/hpacucli
=> ctrl slot=0 ld 8 modify reenable

Warning: Any previously existing data on the logical drive may not be valid or
         recoverable. Continue? (y/n) y

=> exit
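
Before moving on, it may be worth confirming that the logical drive no longer shows as Failed. The filter below is a sketch under assumptions: the function name logicaldrive_status is invented, the "OK" line in the sample is hypothetical (the post only shows the Failed state), and on Solaris 10 nawk may be required for the -v option.

```shell
#!/bin/sh
# Hypothetical helper: print the status of logical drive $1, taken from the
# text after the last comma of its "show config" line (stdin).
logicaldrive_status() {
    awk -v ld="$1" '$1 == "logicaldrive" && $2 == ld {
        n = split($0, parts, ", ")
        status = parts[n]
        sub(/\).*$/, "", status)   # drop the closing parenthesis
        print status
    }'
}

# First line is from the output in step 1; the second line is a hypothetical
# example of a healthy logical drive, added only for illustration.
ld_sample='logicaldrive 8 (279.4 GB, RAID 0, Failed)
logicaldrive 7 (279.4 GB, RAID 0, OK)'

printf '%s\n' "$ld_sample" | logicaldrive_status 8
printf '%s\n' "$ld_sample" | logicaldrive_status 7
```

If drive 8 still reports Failed after the re-enable, the later format and zpool steps are unlikely to see the new disk.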

5. Use the 'format' command to partition and label the disk:

# format c7t8d0
selecting c7t8d0

format> fdisk
No fdisk table exists. The default partition for the disk is:

  a 100% "SOLARIS System" partition

Type "y" to accept the default partition,  otherwise type "n" to edit the
 partition table.
y

format> p

partition> p
Part      Tag    Flag     Cylinders         Size            Blocks
  0 unassigned    wm       0                0         (0/0/0)             0
  1 unassigned    wm       0                0         (0/0/0)             0
  2     backup    wu       0 - 36464      279.34GB    (36465/0/0) 585810225
[...]

partition> 0
Part      Tag    Flag     Cylinders         Size            Blocks
  0 unassigned    wm       0                0         (0/0/0)             0

Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 1
Enter partition size[0b, 0c, 1e, 0.00mb, 0.00gb]: 36464e

partition> p
Part      Tag    Flag     Cylinders         Size            Blocks
  0 unassigned    wm       1 - 36464      279.33GB    (36464/0/0) 585794160
  1 unassigned    wm       0                0         (0/0/0)             0
  2     backup    wu       0 - 36464      279.34GB    (36465/0/0) 585810225

partition> label
Ready to label disk, continue? y

partition> quit

format> quit
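
The numbers in the partition table can be cross-checked with a bit of arithmetic. The sketch below uses only the values printed by format above; the 512-byte block size is an assumption, since the output does not state it.

```shell
#!/bin/sh
# Values for slice 2 (the whole disk) as printed by format above.
total_blocks=585810225
total_cyls=36465                      # cylinders 0..36464

# Blocks per cylinder follow from the whole-disk slice.
blocks_per_cyl=$((total_blocks / total_cyls))

# Slice 0 starts at cylinder 1 and ends at 36464, so it spans 36464 cylinders.
s0_cyls=$((36464 - 1 + 1))
s0_blocks=$((s0_cyls * blocks_per_cyl))

echo "blocks per cylinder: $blocks_per_cyl"
echo "slice 0 blocks:      $s0_blocks"

# Size in GB, assuming 512-byte blocks (not shown in the format output).
s0_gb=$(awk -v b="$s0_blocks" 'BEGIN { printf "%.2f", b * 512 / (1024 * 1024 * 1024) }')
echo "slice 0 size:        $s0_gb GB"
```

This gives 16065 blocks per cylinder and 585794160 blocks for slice 0, matching the table. The only difference from slice 2 is the one cylinder skipped at the start.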

5b. Optionally, you can use prtvtoc and fmthard to copy the partition table from a working disk:

# prtvtoc /dev/rdsk/c7t6d0s2 | fmthard -s - /dev/rdsk/c7t8d0s2
fmthard:  New volume table of contents now in place.

6. Replace the disk with zpool:

# zpool replace -f zonepool c7t8d0s0

7. Check the ZFS status again. zpool will start replacing the disk (resilvering):

# zpool status
        NAME                STATE     READ WRITE CKSUM
        zonepool            DEGRADED     0     0     0
          mirror            ONLINE       0     0     0
            c7t2d0s0        ONLINE       0     0     0
            c7t4d0s0        ONLINE       0     0     0
          mirror            ONLINE       0     0     0
            c7t3d0s0        ONLINE       0     0     0
            c7t5d0s0        ONLINE       0     0     0
          mirror            DEGRADED     0     0     0
            c7t6d0s0        ONLINE       0     0     0
            replacing       DEGRADED    17     0     0
              c7t8d0s0/old  FAULTED     54   716    11  corrupted data
              c7t8d0s0      DEGRADED     0     0     0  (resilvering)

After a while the /old entry disappears. Any remaining error counters can be cleared with "zpool clear".
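
Waiting for the replacement to finish can also be scripted. The following is a minimal sketch, assuming the status output keeps showing the words "replacing" or "resilvering" for as long as the operation runs, as in the listing above. The function name replace_in_progress is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 while the "zpool status" text on stdin still
# shows a replace/resilver in progress, exit 1 otherwise.
replace_in_progress() {
    awk '/replacing|resilvering/ { found = 1 } END { exit !found }'
}

# Sample lines copied from the status output above.
during='            replacing       DEGRADED    17     0     0
              c7t8d0s0/old  FAULTED     54   716    11  corrupted data
              c7t8d0s0      DEGRADED     0     0     0  (resilvering)'

if printf '%s\n' "$during" | replace_in_progress; then
    echo "still resilvering"
fi

# On the server, a polling loop could look like this:
#   while zpool status zonepool | replace_in_progress; do sleep 60; done
#   zpool clear zonepool
```

The sleep interval of 60 seconds is arbitrary; resilvering a 300 GB disk can take a while depending on how much data the pool holds.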




