Hardware Error: Parity error during data load. Or: Clean me!!!


For a couple of months I had been wondering about the following error messages, which appeared every five minutes on my NAS, an HP Proliant N40L Microserver running Debian 7 Wheezy:

[Hardware Error]: CPU:0 (10:6:3) MC2_STATUS[-|CE|-|-|AddrV|CECC]: 0x940040000000018a
[Hardware Error]: MC2_ADDR: 0x00000000d3b42540
[Hardware Error]: MC2 Error: : SNP error during data copyback.
[Hardware Error]: cache level: L2, tx: GEN, mem-tx: SNP
[Hardware Error]: Corrected error, no action required.

I came across some articles about these messages, but none offered a real solution to the problem. Some even said these logged error messages could simply be ignored...

A couple of days ago, I upgraded the NAS server from Debian Wheezy to Jessie (as a mid-way upgrade to Stretch). After the successful OS upgrade I realized that the log entries now appeared ALL THE TIME. I couldn't even use the terminal anymore because it was flooded by these messages:

[ 1026.904428] [Hardware Error]: CPU:0 (10:6:3) MC2_STATUS[-|CE|-|-|AddrV|CECC]: 0x940040000000018a
[ 1026.910229] [Hardware Error]: MC2_ADDR: 0x00000000d3b42540
[ 1026.915945] [Hardware Error]: MC2 Error: : SNP error during data copyback.
[ 1026.921690] [Hardware Error]: cache level: L2, tx: GEN, mem-tx: SNP
[ 1027.182836] [Hardware Error]: Corrected error, no action required.
[ 1027.188553] [Hardware Error]: CPU:0 (10:6:3) MC2_STATUS[-|CE|-|-|AddrV|CECC]: 0x940040000000018a
[ 1027.194345] [Hardware Error]: MC2_ADDR: 0x0000000001af2540
[ 1027.200132] [Hardware Error]: MC2 Error: : SNP error during data copyback.
[ 1027.205915] [Hardware Error]: cache level: L2, tx: GEN, mem-tx: SNP
[ 1027.338890] [Hardware Error]: Corrected error, no action required.
[ 1027.344632] [Hardware Error]: CPU:0 (10:6:3) MC1_STATUS[-|CE|-|-|AddrV]: 0x9400000000000151
[ 1027.350428] [Hardware Error]: MC1_ADDR: 0x0000ffff81012550
[ 1027.356222] [Hardware Error]: MC1 Error: Parity error during data load.
[ 1027.361997] [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD
[ 1027.430924] [Hardware Error]: Corrected error, no action required.
[ 1027.436645] [Hardware Error]: CPU:0 (10:6:3) MC1_STATUS[-|CE|-|-|AddrV]: 0x9400000000000151
[ 1027.442419] [Hardware Error]: MC1_ADDR: 0x0000ffff810b2550
[ 1027.448216] [Hardware Error]: MC1 Error: Parity error during data load.
[ 1027.453960] [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD
[ 1027.939102] [Hardware Error]: Corrected error, no action required.
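
As a side note, the flags in the square brackets of the MCx_STATUS lines can be reproduced from the raw hex value. The following is only a sketch: decode_mc_status is a hypothetical helper name, and the bit positions (62 overflow, 61 uncorrected, 59 MiscV, 58 AddrV, 57 PCC, 46/45 corrected/uncorrected ECC on AMD) are my assumption of the usual x86 machine-check layout, so check the processor documentation before relying on it. The value is split into two 32-bit halves so that the arithmetic does not depend on how the shell handles 64-bit overflow.

```shell
#!/bin/sh
# decode_mc_status HEX: print the flags encoded in an MCx_STATUS value.
# Hypothetical helper; the bit positions are assumptions (see text above).
decode_mc_status() (
    hex=${1#0x}                                    # drop the 0x prefix
    hex=$(printf '%16s' "$hex" | tr ' ' '0')       # left-pad to 16 hex digits
    hi_hex=$(printf '%s' "$hex" | cut -c1-8)       # bits 63..32
    lo_hex=$(printf '%s' "$hex" | cut -c9-16)      # bits 31..0
    hi=$(( 0x$hi_hex ))
    lo=$(( 0x$lo_hex ))

    # has_bit N: succeed if bit N (32..63) of the status value is set
    has_bit() { [ $(( (hi >> ($1 - 32)) & 1 )) -eq 1 ]; }

    out=""
    if has_bit 62; then out="${out}Over "; fi
    if has_bit 61; then out="${out}UE "; else out="${out}CE "; fi
    if has_bit 59; then out="${out}MiscV "; fi
    if has_bit 57; then out="${out}PCC "; fi
    if has_bit 58; then out="${out}AddrV "; fi
    if has_bit 46; then out="${out}CECC "; fi
    if has_bit 45; then out="${out}UECC "; fi

    # the low 16 bits carry the error code
    printf '%scode=0x%04x\n' "$out" $(( lo & 0xffff ))
)
```

Called as decode_mc_status 0x940040000000018a it prints "CE AddrV CECC code=0x018a", and for 0x9400000000000151 it prints "CE AddrV code=0x0151", which matches the flags the kernel shows for the two values in the log.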

Damn. It's time to dig into that problem again. This time I got luckier and came across this forum thread:

The most interesting posted text there was:

"It is most likely a CPU fan dust bunny. That's the signal from the kernel to clean those out."

As easy as this sounds, it made sense. The microserver has been running day and night since it became my NAS server in December 2012 (see article Building a home file server with HP Proliant N40L). That's more than 5 years of total run time. As you might be aware, the motherboard of this Microserver sits under the drive cage and is not easily accessible, and therefore not easy to clean either.

I gave it a shot, shut down the server, removed the cables from the motherboard and pulled it out.

Dust on the heat sink causing hardware errors in kernel log

There it is. A thick layer of dust sitting on the CPU's heat sink.

I cleaned the motherboard (vacuumed the dust off), re-attached the cables and pushed the motherboard back into position. Moment of truth. I booted the server.

Checking syslog, you can easily see when I turned the server off (15:28) and when I booted it again (15:42):

May 24 15:28:04 nas kernel: [77872.129490] [Hardware Error]: CPU:0 (10:6:3) MC1_STATUS[-|CE|-|-|AddrV]: 0x9400000000000151
May 24 15:28:04 nas kernel: [77872.135237] [Hardware Error]: MC1_ADDR: 0x0000ffff810b2550
May 24 15:28:04 nas kernel: [77872.140955] [Hardware Error]: MC1 Error: Parity error during data load.
May 24 15:28:04 nas kernel: [77872.146656] [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD
May 24 15:28:04 nas kernel: [77872.263866] [Hardware Error]: Corrected error, no action required.
May 24 15:28:04 nas kernel: [77872.269509] [Hardware Error]: CPU:0 (10:6:3) MC2_STATUS[-|CE|-|-|AddrV|CECC]: 0x940040000000018a
May 24 15:28:04 nas kernel: [77872.275283] [Hardware Error]: MC2_ADDR: 0x0000000001af2540
May 24 15:28:04 nas kernel: [77872.280990] [Hardware Error]: MC2 Error: : SNP error during data copyback.
May 24 15:28:04 nas kernel: [77872.286694] [Hardware Error]: cache level: L2, tx: GEN, mem-tx: SNP
May 24 15:28:04 nas kernel: [77872.323890] [Hardware Error]: Corrected error, no action required.
May 24 15:28:04 nas kernel: [77872.329552] [Hardware Error]: CPU:0 (10:6:3) MC1_STATUS[-|CE|-|-|AddrV]: 0x9400000000000151
May 24 15:28:04 nas kernel: [77872.335294] [Hardware Error]: MC1_ADDR: 0x0000ffff810b2550
May 24 15:28:04 nas kernel: [77872.341013] [Hardware Error]: MC1 Error: Parity error during data load.
May 24 15:28:04 nas kernel: [77872.346716] [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD
May 24 15:28:04 nas kernel: [77872.371793] [Hardware Error]: Corrected error, no action required.
May 24 15:28:04 nas kernel: [77872.377085] [Hardware Error]: CPU:0 (10:6:3) MC1_STATUS[-|CE|-|-|AddrV]: 0x9400000000000151
May 24 15:28:04 nas kernel: [77872.382397] [Hardware Error]: MC1_ADDR: 0x0000ffff810b2540
May 24 15:28:04 nas kernel: [77872.387718] [Hardware Error]: MC1 Error: Parity error during data load.
May 24 15:28:04 nas kernel: [77872.393030] [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD
May 24 15:42:13 nas kernel: [    0.000000] Initializing cgroup subsys cpuset
May 24 15:42:13 nas kernel: [    0.000000] Initializing cgroup subsys cpu
May 24 15:42:13 nas kernel: [    0.000000] Initializing cgroup subsys cpuacct
May 24 15:42:13 nas kernel: [    0.000000] Linux version 3.16.0-6-amd64 (debian-kernel@lists.debian.org) (gcc version 4.9.2 (Debian 4.9.2-10+deb8u1) ) #1 SMP Debian 3.16.56-1+deb8u1 (2018-05-08)
May 24 15:42:13 nas kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.16.0-6-amd64 root=UUID=e00b8ddf-5247-4b9f-834c-d557df90f575 ro quiet
May 24 15:42:13 nas kernel: [    0.000000] e820: BIOS-provided physical RAM map:
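
To see how such errors are distributed without scrolling through pages of log lines, the decoded "MCx Error" lines can be counted per message. This is a minimal sketch, assuming the log format shown in the excerpts of this article; summarize_mce is a hypothetical helper name and not part of any package.

```shell
#!/bin/sh
# summarize_mce: read kernel log lines on stdin and count the decoded
# "MCx Error" messages, most frequent first. Hypothetical helper.
summarize_mce() {
    awk '
        /\[Hardware Error\]: MC[0-9]+ Error:/ {
            sub(/^.*\[Hardware Error\]: /, "")   # keep only "MCx Error: ..."
            sub(/Error: : /, "Error: ")          # the MC2 lines log a doubled colon
            count[$0]++
        }
        END { for (msg in count) print count[msg], msg }
    ' | sort -rn
}
```

It can be fed from the kernel ring buffer (dmesg | summarize_mce) or from a log file (summarize_mce < /var/log/syslog, assuming the default Debian syslog location).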

Then I waited. The earlier log excerpt (the one that flooded my terminal) shows that before the cleaning the hardware errors were already appearing at 1026 seconds of uptime.

Now, after 1200 seconds of uptime, still no hardware errors:

root@nas:~# uptime
 16:03:00 up 20 min,  1 user,  load average: 0.04, 0.15, 0.09

root@nas:~# echo $((20 * 60 ))
1200

root@nas:~# dmesg | tail
[   10.257700] RPC: Registered named UNIX socket transport module.
[   10.257706] RPC: Registered udp transport module.
[   10.257709] RPC: Registered tcp transport module.
[   10.257711] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   10.272263] FS-Cache: Loaded
[   10.321299] FS-Cache: Netfs 'nfs' registered for caching
[   10.376030] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[   11.809469] tg3 0000:02:00.0 eth0: Link is up at 1000 Mbps, full duplex
[   11.809478] tg3 0000:02:00.0 eth0: Flow control is on for TX and on for RX
[   11.809506] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
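
Instead of eyeballing the dmesg tail, the same check can be reduced to two numbers: the uptime in seconds and the number of hardware error lines logged so far. A small sketch, assuming a Linux system with /proc/uptime; both function names are hypothetical.

```shell
#!/bin/sh
# uptime_seconds [FILE]: print the uptime in whole seconds.
# FILE defaults to /proc/uptime, whose first field is seconds since boot.
uptime_seconds() {
    cut -d. -f1 "${1:-/proc/uptime}"
}

# count_hw_errors: count the "[Hardware Error]" lines read from stdin.
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
count_hw_errors() {
    grep -cF '[Hardware Error]' || true
}
```

Running uptime_seconds followed by dmesg | count_hw_errors should then show a count of 0 for as long as the errors stay away.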

Even now, after 41 minutes (= 2460 seconds) of uptime, there are still no errors:

root@nas:~# uptime && dmesg |tail
 16:23:49 up 41 min,  1 user,  load average: 0.02, 0.03, 0.01
[   10.257700] RPC: Registered named UNIX socket transport module.
[   10.257706] RPC: Registered udp transport module.
[   10.257709] RPC: Registered tcp transport module.
[   10.257711] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   10.272263] FS-Cache: Loaded
[   10.321299] FS-Cache: Netfs 'nfs' registered for caching
[   10.376030] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[   11.809469] tg3 0000:02:00.0 eth0: Link is up at 1000 Mbps, full duplex
[   11.809478] tg3 0000:02:00.0 eth0: Flow control is on for TX and on for RX
[   11.809506] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

These error messages really turned out to be a warning from the OS to clean the server. Who'd have thought that from just looking at these hardware error messages...



Comments (newest first)

Alex from DE wrote on May 28th, 2018:

You can use the -T option of the dmesg command to make the timestamps more readable:
alex:~$ dmesg -T | head -10
[Mon May 21 11:19:55 2018] Initializing cgroup subsys cpuset
[Mon May 21 11:19:55 2018] Initializing cgroup subsys cpu
[Mon May 21 11:19:55 2018] Initializing cgroup subsys cpuacct
[Mon May 21 11:19:55 2018] Linux version 3.10.0-693.21.1.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Wed Mar 7 19:03:37 UTC 2018
[Mon May 21 11:19:55 2018] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-693.21.1.el7.x86_64 root=/dev/mapper/vg.01-lv_root ro crashkernel=auto rd.lvm.lv=vg.01/lv_root rd.lvm.lv=vg.02/lv_swap noquiet ipv6.disable=1 net.ifnames=0 elevator=deadline user_namespace.enable=1 biosdevname=0 fsck.repair=yes LANG=en_US.UTF-8
[Mon May 21 11:19:55 2018] Disabled fast string operations
[Mon May 21 11:19:55 2018] e820: BIOS-provided physical RAM map:

[Mon May 21 11:19:55 2018] BIOS-e820: [mem 0x0000000000000000-0x000000000009ebff] usable
[Mon May 21 11:19:55 2018] BIOS-e820: [mem 0x000000000009ec00-0x000000000009ffff] reserved
[Mon May 21 11:19:55 2018] BIOS-e820: [mem 0x00000000000dc000-0x00000000000fffff] reserved

