grub2 boot issues after HDD replacement


Published on - Listed in Linux Shell Hardware Rant


A couple of weeks ago, I ran into a very strange and, at first sight, complicated problem. A physical server running Debian Squeeze with software RAID didn't start up anymore after a reboot. Troubleshooting was even harder because I didn't have console access to this server, so I was more or less troubleshooting blind.

At first I thought (and hoped) that fsck was still running, since the server still wasn't up. I gave it adequate time before I asked someone in the data center to physically take a look at the console. Then the answer from the data center guy came back: the server doesn't boot and hangs at the grub boot screen. Oh golly...

I booted the server into rescue mode with SSH enabled so I could at least take a look at the current grub configuration. I've run into grub2 issues before (see Kernel upgrade problem on Debian Squeeze), so I had already made the acquaintance of the "device.map" file. I mounted the boot file system and took a look at it:

root@rescue /mnt/boot/grub # cat device.map
(hd0)   /dev/disk/by-id/ata-ST3000DM001-9YN166_S1F08LYF
(hd1)   /dev/disk/by-id/ata-ST3000DM001-9YN166_S1F03NRC
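With two disks, a stale entry is easy to spot by eye, but the check can also be done mechanically: every path listed in device.map should exist on the system doing the check. The following is a minimal Bash sketch, not a grub tool; the function name check_device_map is hypothetical, and it assumes the simple "(hdN)   /path" format shown above. The result is only meaningful if the checking system creates the same /dev/disk/by-id names as the installed system, which a rescue system may not do.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list device.map entries whose device path does not exist.
# Assumes the simple "(hdN)   /dev/..." format; blank and comment lines are skipped.
# Returns 0 if every entry resolves, 1 if at least one entry looks stale.
check_device_map() {
    local map="${1:-/boot/grub/device.map}"
    local drive dev rest stale=0
    # "|| [[ -n $drive ]]" also processes a last line without a trailing newline
    while read -r drive dev rest || [[ -n "$drive" ]]; do
        [[ -z "$drive" || "$drive" == \#* ]] && continue
        if [[ ! -e "$dev" ]]; then
            echo "stale: $drive -> $dev"
            stale=1
        fi
    done < "$map"
    return "$stale"
}
```

In the rescue system above, this would be called as `check_device_map /mnt/boot/grub/device.map`.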

These entries tell grub2 which disks to boot from. The /dev/disk/by-id names contain the drive's model and serial number (here ST3000DM001-9YN166 and S1F08LYF), so a replacement disk shows up under a new name. I remembered that a couple of weeks earlier I had replaced a defective HDD, and that one of these entries probably still pointed to the old disk. So I needed to replace the stale entry with one for the new disk. I decided to completely remove the grub2 bootloader and reinstall it, to make sure grub was also written to the first sectors of the disks:

root@rescue ~ # mkdir /mnt/rescue
root@rescue ~ # mkdir /mnt/rescue/boot
root@rescue ~ # mount /dev/vg0/root /mnt/rescue/
root@rescue ~ # mount /dev/md1 /mnt/rescue/boot
root@rescue ~ # mount /dev/vg0/var /mnt/rescue/var
root@rescue ~ # mount --bind /dev /mnt/rescue/dev/
root@rescue ~ # mount --bind /proc /mnt/rescue/proc/
root@rescue ~ # mount --bind /sys /mnt/rescue/sys/
root@rescue ~ # chroot /mnt/rescue /bin/bash
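The mount sequence above can be wrapped in a small script so that it is repeatable and, just as importantly, can be undone in reverse order before rebooting out of the rescue system. This is a minimal sketch that assumes this server's device names (/dev/vg0/root, /dev/md1, /dev/vg0/var) and the mount point /mnt/rescue; adjust them for your own layout. The function names and the DRY_RUN switch (which only prints the commands) are hypothetical additions, not part of any rescue system.

```bash
#!/usr/bin/env bash
# Sketch: mount an installed Debian system for chroot repair from a rescue system.
# Device names are assumptions taken from this server; override them via env vars.
ROOT_DEV="${ROOT_DEV:-/dev/vg0/root}"
BOOT_DEV="${BOOT_DEV:-/dev/md1}"
VAR_DEV="${VAR_DEV:-/dev/vg0/var}"
TARGET="${TARGET:-/mnt/rescue}"

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "$*"
    else
        "$@"
    fi
}

prepare_chroot() {
    run mkdir -p "$TARGET" || return 1
    run mount "$ROOT_DEV" "$TARGET" || return 1
    # boot and var are mounted after root, on mount points inside it
    run mount "$BOOT_DEV" "$TARGET/boot" || return 1
    run mount "$VAR_DEV" "$TARGET/var" || return 1
    local fs
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$TARGET/$fs" || return 1
    done
}

# Unmount in reverse order: bind mounts and sub-mounts first, root last.
cleanup_chroot() {
    local fs
    for fs in sys proc dev var boot; do
        run umount "$TARGET/$fs"
    done
    run umount "$TARGET"
}
```

Usage would be `prepare_chroot && chroot /mnt/rescue /bin/bash`, followed by `cleanup_chroot` after leaving the chroot; `DRY_RUN=1 prepare_chroot` previews the commands without touching anything.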

In case you're wondering why I mounted the var file system: apt-get needs it, because the package database and cache live under /var. And that's what I did:

root@rescue / # apt-get remove grub; apt-get purge grub; apt-get install grub

Using "purge" was necessary; otherwise some of the grub config files would have kept hanging around... After answering the install questions (I installed grub on /dev/sda, /dev/sdb and /dev/md1, which, as you can see from the mount commands above, is my boot file system), I checked the device.map file again:

root@rescue / # cat /boot/grub/device.map
(hd0)   /dev/disk/by-id/scsi-35000c5005271765b
(hd1)   /dev/disk/by-id/scsi-35000c5004a2aa7ce

After these changes, the system booted again.

But how could this happen? I investigated on another, very similar server which had also had a disk replaced recently. Its device.map also still contained an entry for the old HDD, so I ran update-grub to update the grub configuration:

update-grub

But device.map was not changed; the old HDD entry was still there. Had I rebooted that server, it probably wouldn't have come back up either!

I ran some more tests and realized that if I removed device.map and _then_ launched update-grub, update-grub recreated the file and the entries were _now_ correct.
So if you replace an HDD, make sure you delete the /boot/grub/device.map file before running update-grub!
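This workaround can be scripted, too. Rather than deleting device.map outright, the sketch below moves it aside as a timestamped backup, lets update-grub recreate it and prints the result so the new entries can be compared with the real disks. It assumes grub2 1.98/1.99 as on Squeeze/Wheezy, where update-grub recreated the missing file in the test described above; the function name regen_device_map and the UPDATE_GRUB override (meant for dry runs and testing) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: move a possibly stale device.map aside and let
# update-grub regenerate it (the workaround described above).
# UPDATE_GRUB can be overridden, e.g. for a dry run; it defaults to update-grub.
regen_device_map() {
    local map="${1:-/boot/grub/device.map}"
    local backup
    backup="$map.old.$(date +%Y%m%d%H%M%S)"
    if [[ -f "$map" ]]; then
        mv -- "$map" "$backup" || return 1
        echo "moved $map to $backup"
    fi
    "${UPDATE_GRUB:-update-grub}" || return 1
    # Show the regenerated mapping so it can be checked against the real disks
    if [[ -f "$map" ]]; then
        cat -- "$map"
    else
        echo "warning: $map was not recreated" >&2
        return 1
    fi
}
```

Keeping the backup makes it easy to diff the old and new mappings before rebooting.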

Shortly after this discovery, I filed a Debian bug report, which can be seen here: grub-update does not update device.map when hdd was replaced. Hopefully this bug will be fixed soon, or perhaps it already is in Wheezy: Debian Squeeze uses grub2 package version 1.98, while Wheezy uses 1.99.

Update March 5th 2013: I had a similar issue today after updating a Debian Squeeze server with the latest patches, including a kernel upgrade. The update itself went through without any errors, but after the reboot grub didn't start up correctly. Besides the steps mentioned in this post, I additionally had to manually reinstall grub on the disks:

grub-install /dev/sda; grub-install /dev/sdb
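With software RAID, grub should be installed on every disk that holds a copy of the boot file system, so the server can still boot if the BIOS ends up starting from the other disk. The command above separates the two calls with `;`, which carries on even if the first grub-install fails. A minimal sketch that loops over the disks and stops at the first failure instead; the disk list /dev/sda /dev/sdb is this server's, and the function name and GRUB_INSTALL override (for dry runs and testing) are hypothetical.

```bash
#!/usr/bin/env bash
# Install grub's boot code on every disk given as an argument,
# stopping at the first failure.
# GRUB_INSTALL can be overridden (e.g. with "echo") for a dry run.
install_grub_all() {
    if [[ $# -eq 0 ]]; then
        echo "usage: install_grub_all DISK..." >&2
        return 2
    fi
    local disk
    for disk in "$@"; do
        echo "installing grub on $disk"
        "${GRUB_INSTALL:-grub-install}" "$disk" || {
            echo "grub-install failed on $disk" >&2
            return 1
        }
    done
}
```

On this server the call would be `install_grub_all /dev/sda /dev/sdb`.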

I've now had several boot issues after Debian updates (not even a distribution upgrade!)... I'm kind of getting scared :-/

Update February 2nd 2014:
Wow - one year later and I've had a similar experience on Debian Wheezy. See Debian not booting: ALERT /dev/disk/by-uuid does not exist.

