Linux server crash due to defective memory


Published on - Listed in Linux Rant


I recently had to deal with two crashes of the same Linux server. Each time, the machine crashed as soon as I launched an I/O-intensive process (rsync in my case).

The following entries were written to kern.log.

First crash:

Apr 25 20:12:15  kernel: [12156.863672] BUG: unable to handle kernel NULL pointer dereference at (null)
Apr 25 20:12:15  kernel: [12156.863728] IP: [] writeback_inodes_wb+0xf6/0x4ff
Apr 25 20:12:15  kernel: [12156.863765] PGD 0
Apr 25 20:12:15  kernel: [12156.863787] Oops: 0002 [#1] SMP
Apr 25 20:12:15  kernel: [12156.863812] last sysfs file: /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor
Apr 25 20:12:15  kernel: [12156.863862] CPU 4
Apr 25 20:12:15  kernel: [12156.863883] Modules linked in: acpi_cpufreq cpufreq_conservative cpufreq_powersave cpufreq_stats cpufreq_userspace ext3 jbd loop snd_pcm snd_timer i2c_i801 snd soundcore snd_page_alloc i2c_core video wmi button output pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod aacraid 3w_9xxx 3w_xxxx raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 md_mod sata_nv sata_sil sata_via sd_mod crc_t10dif ahci libata ehci_hcd r8169 xhci scsi_mod usbcore thermal nls_base mii processor thermal_sys [last unloaded: scsi_wait_scan]
Apr 25 20:12:15  kernel: [12156.864195] Pid: 9876, comm: flush-253:1 Not tainted 2.6.32-5-amd64 #1 System Product Name
Apr 25 20:12:15  kernel: [12156.864246] RIP: 0010:[]  [] writeback_inodes_wb+0xf6/0x4ff
Apr 25 20:12:15  kernel: [12156.864298] RSP: 0018:ffff88043b4c9d00  EFLAGS: 00010286

Second crash, very similar log entries:

Apr 26 11:11:12 kernel: [ 2942.917788] BUG: unable to handle kernel NULL pointer dereference at (null)
Apr 26 11:11:12 kernel: [ 2942.917838] IP: [<(null)>] (null)
Apr 26 11:11:12 kernel: [ 2942.917862] PGD 0
Apr 26 11:11:12 kernel: [ 2942.917884] Oops: 0010 [#1] SMP
Apr 26 11:11:12 kernel: [ 2942.917907] last sysfs file: /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor
Apr 26 11:11:12 kernel: [ 2942.917952] CPU 0
Apr 26 11:11:12 kernel: [ 2942.917971] Modules linked in: acpi_cpufreq cpufreq_conservative cpufreq_powersave cpufreq_stats cpufreq_userspace ext3 jbd loop i2c_i801 i2c_core video snd_pcm evdev output wmi snd_timer snd soundcore snd_page_alloc pcspkr button ext4 mbcache jbd2 crc16 dm_mod aacraid 3w_9xxx 3w_xxxx raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 md_mod sata_nv sata_sil sata_via sd_mod crc_t10dif ahci libata ehci_hcd scsi_mod xhci r8169 mii thermal usbcore nls_base processor thermal_sys [last unloaded: scsi_wait_scan]
Apr 26 11:11:12 kernel: [ 2942.918246] Pid: 1288, comm: flush-253:1 Not tainted 2.6.32-5-amd64 #1 System Product Name
Apr 26 11:11:12 kernel: [ 2942.918292] RIP: 0010:[<0000000000000000>]  [<(null)>] (null)
Apr 26 11:11:12 kernel: [ 2942.918320] RSP: 0018:ffff88043b651c28  EFLAGS: 00010087
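The `Oops:` values in the two traces (0002 and 0010) are x86 page-fault error codes, printed in hexadecimal. As a rough aid for reading them, here is a minimal Python sketch that decodes the bits. The function name `decode_oops_code` and the label strings are my own choices for illustration, not anything the kernel prints.

```python
# Bit meanings of the x86 page-fault error code shown after "Oops:" (hex).
# Each entry: (mask, label if the bit is set, label if the bit is clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read or fetch"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code_hex):
    """Return human-readable flags for an Oops error code string such as '0002'."""
    code = int(code_hex, 16)
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


if __name__ == "__main__":
    # The two codes taken from the log excerpts in this post.
    for label, code in (("first crash", "0002"), ("second crash", "0010")):
        print(label, code, "->", ", ".join(decode_oops_code(code)))
```

Read this way, the first crash was a kernel-mode write to an unmapped address, while the second was a kernel-mode instruction fetch from an unmapped address, which fits the RIP of all zeros in the second trace. Two different failure modes in the same flush thread is at least consistent with corrupted data rather than a single reproducible code path.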

At first I suspected a bug in the kernel's ext4 code, but an extended hardware stress test turned up a defective memory DIMM.

After replacing the DIMM I launched the same rsync process again, and this time no problems (and therefore no crashes) occurred.
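To double-check that a log really is free of such crashes, one option is to scan it for the `BUG:` line that opened both traces above. The following is a small sketch under my own assumptions: the helper name `find_kernel_bugs` is hypothetical, and the regular expression is only fitted to the line format shown in this post.

```python
import re

# Fitted to the kern.log format in this post:
# "<date> kernel: [<uptime seconds>] BUG: <message>"
BUG_RE = re.compile(r"kernel: \[\s*([\d.]+)\] BUG: (.*)$")


def find_kernel_bugs(lines):
    """Yield (uptime_seconds, message) for every kernel 'BUG:' line."""
    for line in lines:
        match = BUG_RE.search(line)
        if match:
            yield float(match.group(1)), match.group(2).strip()


if __name__ == "__main__":
    # Sample input: the two BUG lines quoted earlier plus one unrelated line.
    sample = [
        "Apr 25 20:12:15  kernel: [12156.863672] BUG: unable to handle kernel NULL pointer dereference at (null)\n",
        "Apr 25 20:12:15  kernel: [12156.863765] PGD 0\n",
        "Apr 26 11:11:12 kernel: [ 2942.917788] BUG: unable to handle kernel NULL pointer dereference at (null)\n",
    ]
    for uptime, message in find_kernel_bugs(sample):
        print(f"{uptime:>12.3f}s  {message}")
```

On a real system the same function can be fed an open file object, for example `find_kernel_bugs(open("/var/log/kern.log"))`. That path is an assumption based on Debian's default location and may differ elsewhere. An empty result after a full rsync run is what I would expect with healthy memory.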


