Read Disk SMART values on FreeBSD (6.0) behind a HP Raid (cciss)




On an old HP Proliant DL380 G4 server running on FreeBSD 6.0, I discovered a strange behavior when the machine booted:

[Image: FreeBSD Incorrect Block Count]

At first glance it looks like a file system check. I'm no BSD expert, but this assumption makes sense. Because I suspected a disk failure, I wanted to check the SMART values of all disks. But that's easier said than done. First of all, the disks sit behind an HP RAID controller, so smartmontools has to address them through the controller as cciss devices.
Now to the next downer: smartmontools has supported cciss only since version 5.38. Guess what? The smartmontools package for FreeBSD 6.0 is version 5.33 (see FreeBSD's FTP archive for 6.0). Fortunately, the package was updated to 5.38 in FreeBSD 6.4 (see FreeBSD's FTP archive for 6.4), and that package can be installed on FreeBSD 6.0, too.
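
Before trying the cciss syntax it can help to confirm which smartmontools version is actually installed. The following is only a minimal sketch: the helper names smartctl_version and supports_cciss are made up for illustration, and it assumes the version appears as the first "X.Y" token of the smartctl banner, as it does in the 5.38 output shown further down.

```shell
#!/bin/sh
# Sketch: decide whether an installed smartctl is new enough for "-d cciss,N".
# Assumption: the first MAJOR.MINOR token of the banner is the version, e.g.
#   smartctl version 5.38 [i386-portbld-freebsd6.4] Copyright (C) 2002-8 Bruce Allen

# Hypothetical helper: print the first MAJOR.MINOR token found on stdin.
smartctl_version() {
    awk '{ for (i = 1; i <= NF; i++)
               if ($i ~ /^[0-9]+\.[0-9]+$/) { print $i; exit } }'
}

# Hypothetical helper: succeed if version $1 is at least 5.38, the release
# that this post names as the first one with cciss support.
supports_cciss() {
    major=${1%%.*}
    minor=${1#*.}
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 38 ]; }
}

# Possible usage (not executed here):
#   ver=$(smartctl -V | smartctl_version)
#   supports_cciss "$ver" || echo "smartmontools $ver is too old for cciss"
```

The comparison only handles plain MAJOR.MINOR versions, which is all that is needed to tell 5.33 from 5.38.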

So I downloaded and installed smartmontools:

pkg_add smartmontools-5.38.tbz
smartmontools has been installed
To check the status of drives, use the following:

        /usr/local/sbin/smartctl -a /dev/ad0            for first ATA drive
        /usr/local/sbin/smartctl -a /dev/da0            for first SCSI drive

To enable monitor of drives, you can use /usr/local/sbin/smartd
A sample configuration file has been installed as
/usr/local/etc/smartd.conf.sample
Copy this file to /usr/local/etc/smartd.conf and edit appropriately

To have smartd start at boot
        echo 'smartd_enable="YES"' >> /etc/rc.conf

It took me a while to figure out the syntax for disks behind cciss, but eventually I got the first results:

smartctl -iH -d cciss,0 /dev/ciss0
smartctl version 5.38 [i386-portbld-freebsd6.4] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: COMPAQ   BF0368A4CA       Version: HPB5
Serial number: 3WQ18WXXXXXXXXXXXQQ
Device type: disk
Transport protocol: Parallel SCSI (SPI-4)
Local Time is: Sun Nov  3 20:40:59 2013 CET
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

smartctl -iH -d cciss,1 /dev/ciss0
smartctl version 5.38 [i386-portbld-freebsd6.4] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: COMPAQ   BF03688284       Version: HPB5
Serial number: 3WQ15KZMWXXDFDFWWWV
Device type: disk
Transport protocol: Parallel SCSI (SPI-4)
Local Time is: Sun Nov  3 20:41:12 2013 CET
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

The important part here is that the device at the end of the command is always /dev/ciss0, which is the RAID controller and not an individual disk. To select one of the disks attached to /dev/ciss0, "-d cciss,N" must be used, where N is the disk's number, starting at 0. This server has 6 drives, so I could go from "cciss,0" up to "cciss,5".
The parameters -iH at the beginning mean "show me the disk's information" (-i) and "show me the disk's health status" (-H).
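
Typing the command six times gets old quickly, so the same check can be wrapped in a loop. This is a rough sketch rather than a polished script: the function name check_all_drives and the variables SMARTCTL, CONTROLLER and DRIVES are my own, and the defaults (6 drives on /dev/ciss0) simply mirror this particular server.

```shell
#!/bin/sh
# Sketch: run "smartctl -iH" against every disk behind one controller.
# Assumed defaults: 6 drives (cciss,0 .. cciss,5) on /dev/ciss0, as on this server.
SMARTCTL="${SMARTCTL:-smartctl}"
CONTROLLER="${CONTROLLER:-/dev/ciss0}"
DRIVES="${DRIVES:-6}"

check_all_drives() {
    n=0
    while [ "$n" -lt "$DRIVES" ]; do
        echo "=== cciss,$n ==="
        # The device node stays the same; only the cciss index changes.
        "$SMARTCTL" -iH -d "cciss,$n" "$CONTROLLER"
        n=$((n + 1))
    done
}

# Possible usage (not executed here):
#   check_all_drives
#   DRIVES=2 check_all_drives     # controller with only two disks
```

On a controller with a different number of disks, set DRIVES accordingly before calling the function.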

To read more values (e.g. temperature or read errors), the parameter -a needs to be used:

smartctl -d cciss,3 /dev/ciss0 -a
smartctl version 5.38 [i386-portbld-freebsd6.4] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: COMPAQ   BD1468A4C5       Version: HPB4
Serial number: 3KS2TRV30000762072WC
Device type: disk
Transport protocol: Parallel SCSI (SPI-4)
Local Time is: Sun Nov  3 21:17:59 2013 CET
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     23 C
Drive Trip Temperature:        68 C
Elements in grown defect list: 48
Vendor (Seagate) cache information
  Blocks sent to initiator = 1088366961
  Blocks received from initiator = 3948371350
  Blocks read from cache and sent to initiator = 794704138
  Number of read and write commands whose size <= segment size = 3304384398
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 67077.17
  number of minutes until next internal SMART test = 78

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:         0        0         0         0          0          0.000           0

Non-medium error count:      218

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -       0                 - [-   -    -]

Long (extended) Self Test duration: 2643 seconds [44.0 minutes]

Take a look at the following line: Elements in grown defect list: 48

Disk 4 (cciss,3) was the only disk with elements in the defect list. Looks like I found the bad guy.
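
Comparing that one line across all disks can also be scripted. The sketch below pulls the "Elements in grown defect list" value out of the smartctl -a output for each disk and flags any non-zero count. The function names grown_defects and report_grown_defects are invented for this example, the defaults again assume 6 drives on /dev/ciss0, and the parsing relies on the output format shown above, which other drives or smartmontools versions may not share.

```shell
#!/bin/sh
# Sketch: list the grown defect count of every disk behind the controller.
# Assumed defaults: 6 drives on /dev/ciss0, as on this server.
SMARTCTL="${SMARTCTL:-smartctl}"
CONTROLLER="${CONTROLLER:-/dev/ciss0}"
DRIVES="${DRIVES:-6}"

# Read "smartctl -a" output on stdin and print the grown defect count, if any.
grown_defects() {
    awk -F': *' '/^Elements in grown defect list:/ { print $2; exit }'
}

report_grown_defects() {
    n=0
    while [ "$n" -lt "$DRIVES" ]; do
        count=$("$SMARTCTL" -a -d "cciss,$n" "$CONTROLLER" | grown_defects)
        if [ "${count:-0}" -gt 0 ]; then
            echo "cciss,$n: $count grown defects  <-- check this disk"
        else
            # Either a clean disk (0) or a drive that does not report the value.
            echo "cciss,$n: ${count:-no value reported}"
        fi
        n=$((n + 1))
    done
}

# Possible usage (not executed here):
#   report_grown_defects
```

A non-zero count is a hint rather than proof of a failing disk, so it is worth reading the full -a output of any disk the script flags.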

