This week I had to replace a defective hard disk in an HP ProLiant DL380 G6 server running Solaris 10. The interesting part: the internal RAID controller was not used for redundancy. Instead, the RAID was built in the OS with ZFS pools (don't ask me why).
To add another layer, the HP Smart Array controller was still in the path: it presented each disk to the OS as its own logical drive (a single-disk RAID 0, as the hpacucli output below shows), so ZFS was not handling the raw disks directly. It turned out that replacing a disk was considerably more complicated than I had expected.
After a lot of trial and error, and with the help of my colleague Alex, the following steps worked:
1. Check the disk status with the HP tool hpacucli:
# /opt/HPQacucli/sbin/hpacucli
=> ctrl slot=0 show config
Smart Array P410i in Slot 0 (Embedded) (sn: 500143800899D1B0)
[...]
logicaldrive 8 (279.4 GB, RAID 0, Failed)
physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, Failed)
=> exit
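If you need to check this repeatedly, a small filter over the non-interactive form of the same command (hpacucli also accepts the command as arguments) saves scrolling through the whole config. This is a sketch that assumes the line formats shown above; other hpacucli versions may word things differently, and the function names are my own. On Solaris 10 the awk should presumably be nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk.

```shell
#!/bin/sh
# Sketch: list failed logical and physical drives from the output of
#   /opt/HPQacucli/sbin/hpacucli ctrl slot=0 show config
# Assumes the line formats shown in step 1.

# Print the number of every logical drive whose status is "Failed".
# Matches lines like: logicaldrive 8 (279.4 GB, RAID 0, Failed)
failed_logical_drives() {
    awk '$1 == "logicaldrive" && /Failed\)/ { print $2 }'
}

# Print the port:box:bay location of every failed physical drive.
# Matches lines like: physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, Failed)
failed_physical_drives() {
    awk '$1 == "physicaldrive" && /Failed\)/ { print $2 }'
}

# Example (on the server):
#   /opt/HPQacucli/sbin/hpacucli ctrl slot=0 show config | failed_physical_drives
```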
2. Get the ZFS pool information:
# zpool status
NAME          STATE     READ WRITE CKSUM
zonepool      DEGRADED     0     0     0
  mirror      ONLINE       0     0     0
    c7t2d0s0  ONLINE       0     0     0
    c7t4d0s0  ONLINE       0     0     0
  mirror      ONLINE       0     0     0
    c7t3d0s0  ONLINE       0     0     0
    c7t5d0s0  ONLINE       0     0     0
  mirror      DEGRADED     0     0     0
    c7t6d0s0  ONLINE       0     0     0
    c7t8d0s0  UNAVAIL     54   716    11  cannot open
This shows that the physical disk in bay 8 is defective (this can be verified in iLO or physically at the front of the server). Solaris uses its own device names for the disks: here the defective disk is c7t8d0 (controller 7, target 8, disk 0), and c7t8d0s0 is slice 0 on that disk, which is what the pool actually uses.
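The same kind of filter works for zpool status. The sketch below lists every cXtYdZ device that is not ONLINE and splits its name into controller, target, disk and slice. The function names are hypothetical helpers. Note that the target number matching bay 8 is simply how it turned out on this box; I would not rely on that mapping in general, hence the cross-check in iLO. On Solaris 10, use nawk or /usr/xpg4/bin/awk.

```shell
#!/bin/sh
# Sketch: list pool devices that are not ONLINE and decode their names.

# Print "device state" for each cXtYdZ... device whose state is not ONLINE.
# Works on indented and non-indented zpool status output alike.
unhealthy_devices() {
    awk '$1 ~ /^c[0-9]+t[0-9]+d[0-9]+/ && $2 != "ONLINE" { print $1, $2 }'
}

# Decode a name like c7t8d0s0 into controller, target, disk and slice.
decode_device() {
    echo "$1" | sed -n 's/^c\([0-9]*\)t\([0-9]*\)d\([0-9]*\)s\([0-9]*\)$/controller=\1 target=\2 disk=\3 slice=\4/p'
}

# Example (on the server):
#   zpool status zonepool | unhealthy_devices
```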
3. Physically replace the defective disk.
4. Re-enable the failed logical drive (ld 8) in hpacucli so the controller presents the new disk to the OS again:
# /opt/HPQacucli/sbin/hpacucli
=> ctrl slot=0 ld 8 modify reenable
Warning: Any previously existing data on the logical drive may not be valid or
recoverable. Continue? (y/n) y
=> exit
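To confirm that the re-enable took effect, the status of a single logical drive can be pulled out of the show config output from step 1. This sketch assumes the "logicaldrive N (size, RAID level, status)" format shown there; the exact status text after a re-enable is not shown in this post, so treat anything other than Failed as worth a closer look. The function name is my own, and awk -v needs nawk or /usr/xpg4/bin/awk on Solaris 10.

```shell
#!/bin/sh
# Sketch: print the status of one logical drive from hpacucli "show config".
# Usage: /opt/HPQacucli/sbin/hpacucli ctrl slot=0 show config | ld_status 8
ld_status() {
    awk -v ld="$1" '$1 == "logicaldrive" && $2 == ld {
        line = $0
        sub(/\)[ \t]*$/, "", line)    # drop the closing parenthesis
        n = split(line, part, ", ")   # fields inside (...) are comma-separated
        print part[n]                 # the status is the last field
    }'
}
```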
5. Use the 'format' command to partition and label the disk:
# format c7t8d0
selecting c7t8d0
format> fdisk
No fdisk table exists. The default partition for the disk is:
a 100% "SOLARIS System" partition
Type "y" to accept the default partition, otherwise type "n" to edit the
partition table.
y
format> p
partition> p
Part Tag Flag Cylinders Size Blocks
0 unassigned wm 0 0 (0/0/0) 0
1 unassigned wm 0 0 (0/0/0) 0
2 backup wu 0 - 36464 279.34GB (36465/0/0) 585810225
[...]
partition> 0
Part Tag Flag Cylinders Size Blocks
0 unassigned wm 0 0 (0/0/0) 0
Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 1
Enter partition size[0b, 0c, 1e, 0.00mb, 0.00gb]: 36464e
partition> p
Part Tag Flag Cylinders Size Blocks
0 unassigned wm 1 - 36464 279.33GB (36464/0/0) 585794160
1 unassigned wm 0 0 (0/0/0) 0
2 backup wu 0 - 36464 279.34GB (36465/0/0) 585810225
partition> label
Ready to label disk, continue? y
partition> quit
format> quit
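The numbers in the format session fit together as follows: slice 2 (backup) spans the whole disk, which gives the blocks per cylinder, and the "e" suffix in "36464e" means "ending cylinder", so slice 0 covers cylinders 1 to 36464. The sketch below recomputes the slice 0 size from those values, assuming 512-byte sectors; it needs a POSIX shell for $((...)), so on Solaris 10 run it with ksh rather than /bin/sh.

```shell
#!/bin/sh
# Sketch: recompute the slice 0 numbers from the format session above.

backup_blocks=585810225   # slice 2 (backup): whole disk, from the "p" output
backup_cyls=36465         # cylinders 0 - 36464

# 585810225 / 36465 = 16065 blocks per cylinder (255 heads * 63 sectors)
blocks_per_cyl=$((backup_blocks / backup_cyls))

# Slice 0: starting cylinder 1, size "36464e" (= ends at cylinder 36464)
start_cyl=1
end_cyl=36464
s0_blocks=$(( (end_cyl - start_cyl + 1) * blocks_per_cyl ))

# Size in units of 1024^3 bytes, rounded to two decimals with integer math
gib=1073741824
centi=$(( (s0_blocks * 512 * 100 + gib / 2) / gib ))
size=$(printf '%d.%02d' $((centi / 100)) $((centi % 100)))

echo "blocks/cyl=$blocks_per_cyl slice0: $s0_blocks blocks, ${size}GB"
```

This prints 16065 blocks per cylinder and 585794160 blocks (279.33GB) for slice 0, matching what format reported.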
5b. Alternatively, instead of editing the slices by hand, copy the partition table (VTOC) from a working disk, here c7t6d0, the other half of the mirror:
# prtvtoc /dev/rdsk/c7t6d0s2 | fmthard -s - /dev/rdsk/c7t8d0s2
fmthard: New volume table of contents now in place.
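To double-check that fmthard wrote the same layout, the slice lines of both VTOCs can be compared. prtvtoc starts its header lines with "*" (and they contain the device path, which differs), so filtering them out leaves only the slice entries. A sketch with made-up helper names; the temporary file names are arbitrary, and on Solaris 10 use nawk or /usr/xpg4/bin/awk.

```shell
#!/bin/sh
# Sketch: compare the slice layout of the good disk and the new disk.

# Filter prtvtoc output on stdin down to its slice lines.
vtoc_slices() {
    awk 'NF > 0 && $1 !~ /^\*/'
}

# Hypothetical helper: succeed if both disks have identical slice tables.
# Usage (as root on the server): same_vtoc c7t6d0 c7t8d0 && echo match
same_vtoc() {
    prtvtoc "/dev/rdsk/${1}s2" | vtoc_slices > /tmp/vtoc.$$.a
    prtvtoc "/dev/rdsk/${2}s2" | vtoc_slices > /tmp/vtoc.$$.b
    cmp -s /tmp/vtoc.$$.a /tmp/vtoc.$$.b
    rc=$?
    rm -f /tmp/vtoc.$$.a /tmp/vtoc.$$.b
    return $rc
}
```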
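6. Replace the disk in the pool. Since the new disk sits at the same location as the old one, only the old device name needs to be given: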
# zpool replace -f zonepool c7t8d0s0
7. Check the ZFS status again; ZFS now resilvers the mirror onto the new disk:
# zpool status
NAME                STATE     READ WRITE CKSUM
zonepool            DEGRADED     0     0     0
  mirror            ONLINE       0     0     0
    c7t2d0s0        ONLINE       0     0     0
    c7t4d0s0        ONLINE       0     0     0
  mirror            ONLINE       0     0     0
    c7t3d0s0        ONLINE       0     0     0
    c7t5d0s0        ONLINE       0     0     0
  mirror            DEGRADED     0     0     0
    c7t6d0s0        ONLINE       0     0     0
    replacing       DEGRADED    17     0     0
      c7t8d0s0/old  FAULTED     54   716    11  corrupted data
      c7t8d0s0      DEGRADED     0     0     0  (resilvering)
Once the resilver has finished, the /old entry disappears. Any remaining error counters can then be reset with "zpool clear zonepool".
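Instead of re-running zpool status by hand, a small loop can wait for the resilver and then show which devices still have non-zero error counters, i.e. whether a zpool clear is needed. This is a sketch: the "(resilvering)" and "/old" patterns are taken from the output above, only plain integer counters are recognized, and the function names and the 60-second interval are my own choices. On Solaris 10, use nawk or /usr/xpg4/bin/awk.

```shell
#!/bin/sh
# Sketch: wait for the resilver to finish, then report vdevs with errors.

# Exit 0 while the status on stdin still shows a resilver in progress,
# i.e. a "(resilvering)" marker or a leftover "/old" device.
resilver_running() {
    awk '/\(resilvering\)/ || /\/old/ { found = 1 } END { exit !found }'
}

# Print "device read write cksum" for every vdev with non-zero counters
# (plain integer counters only; the header line is skipped).
devices_with_errors() {
    awk 'NF >= 5 && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ &&
         $3 + $4 + $5 > 0 { print $1, $3, $4, $5 }'
}

# Poll until the resilver is done, then list remaining errors.
# Not run automatically; on the server: wait_for_resilver zonepool
wait_for_resilver() {
    while zpool status "$1" | resilver_running; do
        sleep 60
    done
    zpool status "$1" | devices_with_errors
}
```

If wait_for_resilver prints anything, that is the point where "zpool clear zonepool" comes in.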
Bjoern from wrote on Jan 13th, 2016:
Thank you for the good example! Exactly what I needed.