smartctl on FreeBSD with CCISS (HP SmartArray) raid: Watch out!

Published on - Listed in Hardware FreeBSD


Last week I wrote several posts about S.M.A.R.T. checks on FreeBSD. They work, and they can definitely be used for monitoring on production servers, but there is one issue which needs to be addressed: the drive order used in smartctl (cciss,N) is not necessarily the physical order!

Let's go into some detail. Last week I got an alert from check_smart.pl that a disk in an HP ProLiant DL380 G5 running FreeBSD 9.1 had defective sectors (elements in grown defect list). I verified this manually with the smartctl command:

smartctl -d cciss,0 /dev/ciss0 -a
smartctl 6.0 2012-10-10 r3643 [FreeBSD 9.1-RELEASE-p4 amd64] (local build)
Copyright (C) 2012-12, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/ciss0 [cciss_disk_00] [SCSI]: Device open changed type from 'sat,auto' to 'cciss'
Vendor:             HP
[...]
Serial number:      123450000999VE
Device type:        disk
Transport protocol: SAS
Local Time is:      Fri Nov  8 13:59:19 2013 CET
[...]
Elements in grown defect list: 12

Logically, to me, "cciss,0" means the very first disk of the server. So that would be drive slot #1.
I exchanged the drive and ran smartctl again:

smartctl -d cciss,0 /dev/ciss0 -a
smartctl 6.0 2012-10-10 r3643 [FreeBSD 9.1-RELEASE-p4 amd64] (local build)
Copyright (C) 2012-12, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/ciss0 [cciss_disk_00] [SCSI]: Device open changed type from 'sat,auto' to 'cciss'
Vendor:             HP
[...]
Serial number:      123450000999VE
Device type:        disk
Transport protocol: SAS
Local Time is:      Fri Nov  8 15:27:33 2013 CET
[...]
Elements in grown defect list: 14

Did you notice that the drive behind cciss,0 still reports the exact same serial number? That means I replaced the wrong disk.

After some research, I found this archived FreeBSD mailing list article from 2008: http://lists.freebsd.org/pipermail/freebsd-ports/2008-April/048312.html
The author of the post describes the exact same phenomenon on his FreeBSD machine:

The recent incorporation of the FreeBSD CISS SMART support into the
mainstream smartmontools distribution has had some unexpected results on
several HP ProLiant DL380 G3 machines.  I have five DL380/G3s with four
drives each; all have the same symptoms now: querying a given ciss/scsi
target gives results for the wrong drive

It seems the correct disk labeling/numbering worked before smartmontools 5.38. Unfortunately FreeBSD does not have tools to list all physical drives. camcontrol devlist only shows the logical drive's raid controller.

As stupid as it sounds, labeling each drive with a sticker showing its serial number can help you identify the disk in the physical slots. You can find the serial number of the disk in the smartctl output and match it against the physical drive.
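To prepare such labels, the serial numbers behind every cciss,N index can be collected in one go. The following is a minimal sketch, not a tested tool: the function names serial_from_smartctl and list_cciss_serials are made up for illustration, and the number of physical drives has to be passed in by hand because nothing here detects it.

```shell
#!/bin/sh
# Print the serial number found in `smartctl -a` output read from stdin.
# SAS drives report "Serial number:" (as in the output above); the match is
# case-insensitive so that a differently capitalized label is also caught.
serial_from_smartctl() {
    awk -F': *' 'tolower($0) ~ /^serial number:/ { print $2; exit }'
}

# Hypothetical helper: print "cciss,N <serial>" for N = 0 .. count-1.
# $1 = controller device (e.g. /dev/ciss0)
# $2 = number of physical drives (assumption: you know this number)
list_cciss_serials() {
    dev=$1
    count=$2
    n=0
    while [ "$n" -lt "$count" ]; do
        serial=$(smartctl -d "cciss,$n" "$dev" -a | serial_from_smartctl)
        printf 'cciss,%s %s\n' "$n" "${serial:-unknown}"
        n=$((n + 1))
    done
}
```

On the four-drive server from this post, the call would be list_cciss_serials /dev/ciss0 4. The resulting list only tells you which serial sits behind which cciss index, not which bay it is in, so the sticker (or another tool) is still needed for the physical side.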

So if you run FreeBSD behind a CCISS (HP SmartArray) RAID controller, be extra careful and don't trust the cciss numbering!

Update, still Nov 11th 2013:
After some replacement tests, it seems that FreeBSD sees the disks in reverse order. So cciss,0 is the last disk and cciss,3 the first (in a server with 4 physical disks). If it is always like this, the physical disk can be identified. But what happens if a new disk is inserted? Is a recount necessary when disk #5 appears as cciss,0, or will it appear as cciss,5? I have no idea...

Update 2, again Nov 11th 2013:
I just came across the command cciss_vol_status which can be compiled on FreeBSD and Linux from http://sourceforge.net/projects/cciss/files/cciss_vol_status/. So I gave it a shot and installed it:

cd /tmp
fetch http://downloads.sourceforge.net/project/cciss/cciss_vol_status/cciss_vol_status-1.11.tar.gz
tar -xzf cciss_vol_status-1.11.tar.gz
cd cciss_vol_status-1.11
./configure
make
make install

Then I ran the command against the /dev/ciss0 device and at first I was disappointed - again:

cciss_vol_status -s /dev/ciss0
/dev/ciss0: (Smart Array P400) RAID 1 Volume 0 status: OK.
/dev/ciss0: (Smart Array P400) RAID 1 Volume 1 status: OK.

My face brightened up when I tried the verbose option (-V):

cciss_vol_status -V /dev/ciss0
Controller: Smart Array P400
  Board ID: 0x3234103c
  Logical drives: 2
  Running firmware: 5.20
  ROM firmware: 5.20
/dev/ciss0: (Smart Array P400) RAID 1 Volume 0 status: OK.
/dev/ciss0: (Smart Array P400) RAID 1 Volume 1 status: OK.
  Physical drives: 4
   connector 2I box 1 bay 4  HP DG072ABAB3  XXXXXXXX00009732RCV7   HPDD OK
   connector 2I box 1 bay 3  HP DG072BB975  XXXXXXXX00009907Q0VR   HPDC OK
   connector 2I box 1 bay 2  HP DG072BB975  XXXXXXXX00009906P4DN   HPDC OK
   connector 2I box 1 bay 1  HP DG072BB975  XXXXXXXX00009907RPKW   HPDC OK
/dev/ciss0(Smart Array P400:0): Non-Volatile Cache status:
                   Cache configured: Yes
                  Read cache memory: 52 MiB
                 Write cache memory: 156 MiB
                Write cache enabled: Yes

So THIS is exactly what I needed! I can now finally compare the serial number from smartctl output and match it against the correct physical slot. Problem solved! 
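The lookup can also be scripted. This is only a sketch under assumptions: bay_for_serial is a hypothetical helper name, it expects the "connector ... bay N ..." line layout shown in the -V output above, and it assumes the serial reported by smartctl appears as-is somewhere in that line (the two tools may pad or format serials differently, so check this on your own hardware first).

```shell
#!/bin/sh
# Hypothetical helper: read `cciss_vol_status -V` output on stdin and print
# the bay number of the physical drive whose line contains the given serial.
# $1 = serial number to look for (e.g. taken from the smartctl output)
bay_for_serial() {
    [ -n "$1" ] || return 1    # refuse an empty serial, it would match nothing useful
    awk -v serial="$1" '
        # Only consider the physical drive lines ("connector 2I box 1 bay 4 ...")
        $1 == "connector" && index($0, serial) > 0 {
            for (i = 1; i < NF; i++) {
                if ($i == "bay") { print $(i + 1); exit }
            }
        }'
}
```

Usage would look like this: cciss_vol_status -V /dev/ciss0 | bay_for_serial "<serial from smartctl>". If nothing is printed, the serial was not found and the formats of the two tools probably differ.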


Comments (newest first)

macan wrote on Jun 8th, 2016:

cciss_vol_status is in ports:
/usr/ports/sysutils/cciss_vol_status/

