On my Linux desktop (currently running Linux Mint 18.3) I've been using a dedicated SSD drive for virtual machine testing with VMware Player. This has worked well for the past few years and I hadn't seen any issues - until last week.
After I set up a new Windows 10 virtual machine with a single vmdk file on this dedicated SSD (mounted as an ext4 file system), the VM seemed to work fine. I was able to do some testing and software installations in Windows without any problems. But on only the fourth use of this VM, it started to hang during shutdown (while Windows updates were installing).
The problem: Not only did VMware Player appear to be frozen, the physical machine (the VM host) became very laggy, too.
As the desktop itself could no longer be operated, I was lucky to still be able to SSH into my workstation. The interesting VMware Player related processes were:
ckadm 32730 84.7 18.4 10716464 3032084 ? Ssl 13:46 69:44 /usr/lib/vmware/bin/vmware-vmx -ssnapshot.numRollingTiers=0 -s vmx.stdio.keep=TRUE -# product=4;name=VMware Player;version=14.1.7;buildnumber=12989993;licensename=VMware Pla
ckadm 321 0.0 0.0 76792 2032 ? S 13:46 0:00 \_ /usr/lib/vmware/bin/thnuclnt -pdir /tmp/vmware-ckadm/thnuclnt-32730 -fg
ckadm 323 0.0 0.0 85052 1472 ? S 13:46 0:00 \_ /usr/lib/vmware/bin/thnuclnt -pdir /tmp/vmware-ckadm/thnuclnt-32730 -fg
ckadm 324 0.0 0.0 76264 1444 ? S 13:46 0:01 \_ /usr/lib/vmware/bin/thnuclnt -pdir /tmp/vmware-ckadm/thnuclnt-32730 -fg
ckadm 325 0.0 0.0 76264 780 ? S 13:46 0:00 \_ /usr/lib/vmware/bin/thnuclnt -pdir /tmp/vmware-ckadm/thnuclnt-32730 -fg
ckadm 340 0.0 0.0 76264 1628 ? S 13:46 0:01 \_ /usr/lib/vmware/bin/thnuclnt -pdir /tmp/vmware-ckadm/thnuclnt-32730 -fg
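The exact command used for this listing is no longer in my shell history; a forest-style ps invocation like the following (an approximation) produces equivalent output. The [v] in the pattern prevents grep from matching its own process:
ckadm@mintp ~ $ ps auxf | grep -A5 '[v]mware-vmx'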
A quick check with lsof confirmed that the main process (PID 32730) was keeping the vmdk file on this drive open:
ckadm@mintp ~ $ sudo lsof -p 32730 | grep vmdk
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
Output information may be incomplete.
vmware-vm 32730 ckadm 82u REG 8,33 28983820288 3670019 /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10/Windows 10 x64.vmdk
Obviously /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa is the mount point of the dedicated virtual machine SSD. A manual directory listing of this path caused the SSH session to freeze, too:
ckadm@mintp ~ $ cd /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/
ckadm@mintp /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa $ ll
total 28
drwxrwxrwx 4 root root 4096 Jan 4 11:19 ./
drwxr-x---+ 5 root root 4096 Jan 11 13:46 ../
drwx------ 2 root root 16384 Jan 4 11:17 lost+found/
drwxrwxr-x 5 ckadm ckadm 4096 Jan 11 14:42 Windows10/
ckadm@mintp /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa $ cd Windows10/
ckadm@mintp /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10 $ ls -ltr
^C^C^C
Not even [CTRL]+[C] returned control to the shell, so another shell session had to be opened. That's proof that something is definitely wrong with either the mounted filesystem or the physical SSD.
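A small precaution for the next shell (not part of the original session): run such probes in the background, optionally wrapped in timeout, so the interactive shell stays usable. A probe already stuck in uninterruptible sleep will ignore the TERM signal timeout sends, but at least the prompt comes back:
ckadm@mintp ~ $ timeout 5 ls /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10/ &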
Now that the SSD was suspected to be the cause of this "freeze", the SMART information from this drive was analyzed:
ckadm@mintp ~ $ sudo smartctl -a /dev/sdc
[sudo] password for ckadm:
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.15.0-126-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: SandForce Driven SSDs
Device Model: ADATA SSD S510 120GB
Serial Number: redacted
LU WWN Device Id: 0 000000 000000000
Firmware Version: 5.0.2c
User Capacity: 120'034'123'776 bytes [120 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Mon Jan 11 15:21:14 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 48) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x0021) SCT Status supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 110 110 050 Pre-fail Always - 0/32858128
5 Retired_Block_Count 0x0033 100 100 003 Pre-fail Always - 1
9 Power_On_Hours_and_Msec 0x0032 000 000 000 Old_age Always - 186h+00m+35.610s
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 756
171 Program_Fail_Count 0x0032 000 000 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 000 000 000 Old_age Always - 0
174 Unexpect_Power_Loss_Ct 0x0030 000 000 000 Old_age Offline - 80
177 Wear_Range_Delta 0x0000 000 000 000 Old_age Offline - 1
181 Program_Fail_Count 0x0032 000 000 000 Old_age Always - 0
182 Erase_Fail_Count 0x0032 000 000 000 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 030 041 000 Old_age Always - 30 (Min/Max 16/41)
195 ECC_Uncorr_Error_Count 0x001c 120 120 000 Old_age Offline - 0/32858128
196 Reallocated_Event_Count 0x0033 100 100 003 Pre-fail Always - 1
201 Unc_Soft_Read_Err_Rate 0x001c 120 120 000 Old_age Offline - 0/32858128
204 Soft_ECC_Correct_Rate 0x001c 120 120 000 Old_age Offline - 0/32858128
230 Life_Curve_Status 0x0013 100 100 000 Pre-fail Always - 100
231 SSD_Life_Left 0x0013 100 100 010 Pre-fail Always - 0
233 SandForce_Internal 0x0000 000 000 000 Old_age Offline - 2608
234 SandForce_Internal 0x0032 000 000 000 Old_age Always - 2970
241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always - 2970
242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always - 3267
SMART Error Log not supported
SMART Self-test Log not supported
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
There are a couple of indications that there are problems with this drive:
Attribute 5, here named Retired_Block_Count (also known as Reallocated Sector Count), as well as attribute 196 (Reallocated_Event_Count), shows a value of one. This means the SSD had to re-allocate at least one defective block/sector. That alone is a symptom of a "slowly" dying drive, but does not necessarily indicate a drive in a very bad state.
The more interesting attribute is attribute 231 (SSD_Life_Left) which has a value of 0. According to its own SMART information, this SSD is not expected to run correctly anymore. From the ADATA SSD Toolbox User guide:
SSD Life Left - Indicates the approximate SSD life left, in terms of program/erase cycles or Flash blocks currently available for use
That's definitely a clear sign: Replace this SSD drive!
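To keep an eye on exactly these attributes without wading through the full report each time, the attribute table can be filtered. Note that the attribute names below match this particular SandForce firmware and may differ on other drives:
ckadm@mintp ~ $ sudo smartctl -A /dev/sdc | grep -E 'Retired_Block_Count|Reallocated_Event_Count|SSD_Life_Left'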
Curious, however, are the attributes with a "double" value, such as attribute 1 (Raw_Read_Error_Rate) which shows a value of 0/32858128. How this has to be interpreted, I am honestly not sure - even after working with SMART information for over 10 years. To me it looks like the second value is the real value, which would make sense with a "hanging" SSD drive (unable to read).
Note: Even before installing this SSD I was aware that it was a used drive, which is why I only used it for virtual machine testing. But I wouldn't have thought it could affect the host. :-(
Usually when something in the system hangs, the Linux kernel (should) log something - and this situation was no exception. dmesg clearly showed a problem, including a stack trace:
[Mon Jan 11 14:42:41 2021] WARNING: CPU: 3 PID: 318 at /build/linux-hwe-vJCqK6/linux-hwe-4.15.0/fs/ext4/inode.c:3932 ext4_set_page_dirty+0x4e/0x60
[Mon Jan 11 14:42:41 2021] Modules linked in: bluetooth ecdh_generic uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media joydev snd_usb_audio snd_usbmidi_lib vmnet(OE) vmw_vsock_vmci_transport vsock vmw_vmci vmmon(OE) binfmt_misc input_leds intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd_hda_codec_realtek snd_hda_codec_generic aesni_intel snd_hda_codec_hdmi aes_x86_64 crypto_simd glue_helper cryptd snd_hda_intel intel_cstate snd_hda_codec intel_rapl_perf snd_hda_core snd_hwdep serio_raw snd_pcm eeepc_wmi asus_wmi sparse_keymap wmi_bmof snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer lpc_ich snd soundcore mei_me mei shpchp mac_hid nct6775 hwmon_vid coretemp parport_pc
[Mon Jan 11 14:42:41 2021] ppdev lp parport autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear dm_mirror dm_region_hash dm_log raid1 amdkfd amd_iommu_v2 hid_generic amdgpu chash i2c_algo_bit psmouse ttm usbhid drm_kms_helper hid syscopyarea firewire_ohci pata_acpi sysfillrect sysimgblt r8169 pata_marvell fb_sys_fops mii ahci drm firewire_core libahci crc_itu_t wmi video
[Mon Jan 11 14:42:41 2021] CPU: 3 PID: 318 Comm: vmx-vcpu-1 Tainted: G W OE 4.15.0-126-generic #129~16.04.1-Ubuntu
[Mon Jan 11 14:42:41 2021] Hardware name: System manufacturer System Product Name/P8H67-M EVO, BIOS 3703 05/03/2013
[Mon Jan 11 14:42:41 2021] RIP: 0010:ext4_set_page_dirty+0x4e/0x60
[Mon Jan 11 14:42:41 2021] RSP: 0018:ffffab7e0a49fd08 EFLAGS: 00010246
[Mon Jan 11 14:42:41 2021] RAX: 000fffffc0000028 RBX: fffffbb341e43280 RCX: 0000000000000000
[Mon Jan 11 14:42:41 2021] RDX: 0000000000000000 RSI: 0000000000000041 RDI: fffffbb341e43280
[Mon Jan 11 14:42:41 2021] RBP: ffffab7e0a49fd08 R08: 0000000008f99d01 R09: 0000000100400001
[Mon Jan 11 14:42:41 2021] R10: ffffab7e0a49fd20 R11: ffffffffa7a62900 R12: 0000000000000001
[Mon Jan 11 14:42:41 2021] R13: 0000000000000041 R14: ffff8b412fdb3060 R15: 0000000000000000
[Mon Jan 11 14:42:41 2021] FS: 00007ff3e25b9700(0000) GS:ffff8b429ed80000(0000) knlGS:0000000000000000
[Mon Jan 11 14:42:41 2021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Jan 11 14:42:41 2021] CR2: 00007ff4d2bb76d8 CR3: 00000002a4660004 CR4: 00000000000606e0
[Mon Jan 11 14:42:41 2021] Call Trace:
[Mon Jan 11 14:42:41 2021] set_page_dirty+0x5b/0xb0
[Mon Jan 11 14:42:41 2021] qp_release_pages+0x6b/0xb0 [vmw_vmci]
[Mon Jan 11 14:42:41 2021] qp_host_unregister_user_memory.isra.21+0x27/0x80 [vmw_vmci]
[Mon Jan 11 14:42:41 2021] vmci_qp_broker_detach+0x243/0x3f0 [vmw_vmci]
[Mon Jan 11 14:42:41 2021] vmci_host_unlocked_ioctl+0x490/0xad0 [vmw_vmci]
[Mon Jan 11 14:42:41 2021] ? __switch_to_asm+0x41/0x70
[Mon Jan 11 14:42:41 2021] do_vfs_ioctl+0xa4/0x600
[Mon Jan 11 14:42:41 2021] ? handle_mm_fault+0xcc/0x1f0
[Mon Jan 11 14:42:41 2021] ? __do_page_fault+0x2ae/0x4d0
[Mon Jan 11 14:42:41 2021] ? SyS_futex+0x7f/0x180
[Mon Jan 11 14:42:41 2021] SyS_ioctl+0x79/0x90
[Mon Jan 11 14:42:41 2021] do_syscall_64+0x73/0x130
[Mon Jan 11 14:42:41 2021] entry_SYSCALL_64_after_hwframe+0x41/0xa6
[Mon Jan 11 14:42:41 2021] RIP: 0033:0x7ff602818007
[Mon Jan 11 14:42:41 2021] RSP: 002b:00007ff3e25b0b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Mon Jan 11 14:42:41 2021] RAX: ffffffffffffffda RBX: 00007ff3e25b0b30 RCX: 00007ff602818007
[Mon Jan 11 14:42:41 2021] RDX: 00007ff3e25b0b30 RSI: 00000000000007aa RDI: 0000000000000064
[Mon Jan 11 14:42:41 2021] RBP: 0000559dbdb5b4f0 R08: 0000000000000000 R09: 00007ff3e25b0c60
[Mon Jan 11 14:42:41 2021] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000020
[Mon Jan 11 14:42:41 2021] R13: 00000000fffffffe R14: 0000000000000020 R15: 00007ff604904000
[Mon Jan 11 14:42:41 2021] Code: 48 8d 42 ff 83 e2 01 48 0f 44 c7 48 8b 00 a8 10 74 0f 48 8b 07 f6 c4 10 74 11 e8 de ae f8 ff 5d c3 0f 0b 48 8b 07 f6 c4 10 75 ef <0f> 0b e8 cb ae f8 ff 5d c3 66 0f 1f 84 00 00 00 00 00 66 66 66
[Mon Jan 11 14:42:41 2021] ---[ end trace d18ed4deb9afa17e ]---
[Mon Jan 11 14:42:54 2021] ------------[ cut here ]------------
[Mon Jan 11 14:42:54 2021] kernel BUG at /build/linux-hwe-vJCqK6/linux-hwe-4.15.0/fs/ext4/inode.c:2697!
[Mon Jan 11 14:42:54 2021] invalid opcode: 0000 [#1] SMP PTI
[Mon Jan 11 14:42:54 2021] Modules linked in: bluetooth ecdh_generic uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media joydev snd_usb_audio snd_usbmidi_lib vmnet(OE) vmw_vsock_vmci_transport vsock vmw_vmci vmmon(OE) binfmt_misc input_leds intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd_hda_codec_realtek snd_hda_codec_generic aesni_intel snd_hda_codec_hdmi aes_x86_64 crypto_simd glue_helper cryptd snd_hda_intel intel_cstate snd_hda_codec intel_rapl_perf snd_hda_core snd_hwdep serio_raw snd_pcm eeepc_wmi asus_wmi sparse_keymap wmi_bmof snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer lpc_ich snd soundcore mei_me mei shpchp mac_hid nct6775 hwmon_vid coretemp parport_pc
[Mon Jan 11 14:42:54 2021] ppdev lp parport autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear dm_mirror dm_region_hash dm_log raid1 amdkfd amd_iommu_v2 hid_generic amdgpu chash i2c_algo_bit psmouse ttm usbhid drm_kms_helper hid syscopyarea firewire_ohci pata_acpi sysfillrect sysimgblt r8169 pata_marvell fb_sys_fops mii ahci drm firewire_core libahci crc_itu_t wmi video
[Mon Jan 11 14:42:54 2021] CPU: 2 PID: 3867 Comm: kworker/u8:0 Tainted: G W OE 4.15.0-126-generic #129~16.04.1-Ubuntu
[Mon Jan 11 14:42:54 2021] Hardware name: System manufacturer System Product Name/P8H67-M EVO, BIOS 3703 05/03/2013
[Mon Jan 11 14:42:54 2021] Workqueue: writeback wb_workfn (flush-8:32)
[Mon Jan 11 14:42:54 2021] RIP: 0010:mpage_prepare_extent_to_map+0x2be/0x2f0
[Mon Jan 11 14:42:54 2021] RSP: 0018:ffffab7e096bf960 EFLAGS: 00010246
[Mon Jan 11 14:42:54 2021] RAX: 000fffffc0000039 RBX: ffffab7e096bf990 RCX: 000000000011b9a2
[Mon Jan 11 14:42:54 2021] RDX: ffffab7e096bfc38 RSI: 0000000000000000 RDI: fffffbb341f23880
[Mon Jan 11 14:42:54 2021] RBP: ffffab7e096bfa30 R08: 0000000000000000 R09: 0000000000000000
[Mon Jan 11 14:42:54 2021] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000002970
[Mon Jan 11 14:42:54 2021] R13: ffffab7e096bf9e8 R14: fffffbb341f238c0 R15: ffffab7e096bfad0
[Mon Jan 11 14:42:54 2021] FS: 0000000000000000(0000) GS:ffff8b429ed00000(0000) knlGS:0000000000000000
[Mon Jan 11 14:42:54 2021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Jan 11 14:42:54 2021] CR2: 00003f0d00225000 CR3: 00000003fe00a001 CR4: 00000000000606e0
[Mon Jan 11 14:42:54 2021] Call Trace:
[Mon Jan 11 14:42:54 2021] ext4_writepages+0x55b/0xe50
[Mon Jan 11 14:42:54 2021] ? __switch_to_asm+0x41/0x70
[Mon Jan 11 14:42:54 2021] ? __switch_to_asm+0x35/0x70
[Mon Jan 11 14:42:54 2021] ? __switch_to_asm+0x41/0x70
[Mon Jan 11 14:42:54 2021] ? __switch_to_asm+0x35/0x70
[Mon Jan 11 14:42:54 2021] ? __switch_to_asm+0x41/0x70
[Mon Jan 11 14:42:54 2021] do_writepages+0x1f/0x70
[Mon Jan 11 14:42:54 2021] ? do_writepages+0x1f/0x70
[Mon Jan 11 14:42:54 2021] __writeback_single_inode+0x45/0x350
[Mon Jan 11 14:42:54 2021] writeback_sb_inodes+0x275/0x5d0
[Mon Jan 11 14:42:54 2021] __writeback_inodes_wb+0x92/0xc0
[Mon Jan 11 14:42:54 2021] wb_writeback+0x26f/0x330
[Mon Jan 11 14:42:54 2021] wb_workfn+0x35d/0x450
[Mon Jan 11 14:42:54 2021] ? wb_workfn+0x35d/0x450
[Mon Jan 11 14:42:54 2021] process_one_work+0x14d/0x410
[Mon Jan 11 14:42:54 2021] worker_thread+0x4b/0x460
[Mon Jan 11 14:42:54 2021] kthread+0x105/0x140
[Mon Jan 11 14:42:54 2021] ? process_one_work+0x410/0x410
[Mon Jan 11 14:42:54 2021] ? kthread_bind+0x40/0x40
[Mon Jan 11 14:42:54 2021] ret_from_fork+0x35/0x40
[Mon Jan 11 14:42:54 2021] Code: 59 cc 68 00 48 8b 85 30 ff ff ff 48 3b 85 48 ff ff ff 0f 83 08 fe ff ff 31 c0 eb ab 4c 89 f7 e8 d9 14 ea ff e9 8d fe ff ff 0f 0b <0f> 0b 48 8d bd 50 ff ff ff 89 85 40 ff ff ff e8 fe 78 eb ff 8b
[Mon Jan 11 14:42:54 2021] RIP: mpage_prepare_extent_to_map+0x2be/0x2f0 RSP: ffffab7e096bf960
[Mon Jan 11 14:42:54 2021] ---[ end trace d18ed4deb9afa17f ]---
[Mon Jan 11 14:42:54 2021] ------------[ cut here ]------------
As this happened during an ext4 write operation, it was yet more proof that something was wrong with either this filesystem or the physical drive behind it.
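Besides such stack traces, the kernel log is also where plain block-device I/O errors would show up, so a quick generic filter on the device name is worth a look (a suggestion, not from the original session):
ckadm@mintp ~ $ dmesg | grep -iE 'sdc|I/O error'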
After waiting a long time (at least 45 minutes) and hoping that the VM would eventually shut down and make the host usable again, I finally decided to kill the VMware Player process.
ckadm@mintp ~ $ kill 32730
After a couple of minutes, the process was still there, so I tried to SIGKILL it:
ckadm@mintp ~ $ kill -9 32730
But even this didn't work. The processes were still running, which could also be seen with lsof on the SSD path:
ckadm@mintp ~ $ sudo lsof /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
thnuclnt 321 ckadm cwd DIR 8,33 4096 3670017 /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10
thnuclnt 323 ckadm cwd DIR 8,33 4096 3670017 /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10
thnuclnt 324 ckadm cwd DIR 8,33 4096 3670017 /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10
thnuclnt 325 ckadm cwd DIR 8,33 4096 3670017 /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10
thnuclnt 340 ckadm cwd DIR 8,33 4096 3670017 /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10
bash 9370 ckadm cwd DIR 8,33 4096 3670017 /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10
vmplayer 9393 ckadm 27r REG 8,33 3385 3670025 /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10/Windows 10 x64.vmx
ls 10021 ckadm cwd DIR 8,33 4096 3670017 /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10
ls 10021 ckadm 3r DIR 8,33 4096 3670017 /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10
vmware-vm 32730 ckadm cwd DIR 8,33 4096 3670017 /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10
vmware-vm 32730 ckadm mem REG 8,33 8589934592 3670030 /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10/564d7a86-7e31-e160-210f-c999a05cd210.vmem
vmware-vm 32730 ckadm 14u REG 8,33 215171 3670026 /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10/vmware.log
vmware-vm 32730 ckadm 80u REG 8,33 8589934592 3670030 /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10/564d7a86-7e31-e160-210f-c999a05cd210.vmem
vmware-vm 32730 ckadm 82u REG 8,33 28983820288 3670019 /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10/Windows 10 x64.vmdk
vmware-vm 32730 ckadm 85u REG 8,33 270840 3670028 /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa/Windows10/Windows 10 x64.nvram
All these PIDs were then killed:
ckadm@mintp ~ $ kill 321 323 324 325 340 9293 10021 32730
But the (main) process remained:
ckadm 32730 68.2 18.6 10716464 3060500 ? Dsl 13:46 69:44 /usr/lib/vmware/bin/vmware-vmx -ssnapshot.numRollingTiers=0 -s vmx.stdio.keep=TRUE -# product=4;name=VMware Pla
ckadm 321 0.0 0.0 0 0 ? Z 13:46 0:00 \_ [thnuclnt] <defunct>
At this point I realized the problem: The main VMware Player process (PID 32730) had entered the uninterruptible sleep state, which can be seen from the capital "D" in the process state column. This is the only process state in which a process cannot be killed. Whenever a process is to be killed, the kernel schedules the kill and executes it when the process is ready to receive the signal - and a process in uninterruptible sleep is never ready. But other sites describe it better.
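As a side note, all processes currently stuck in this state can be listed directly with ps; the wchan column even shows the kernel function each of them is sleeping in (a generic check, not from the original session):
ckadm@mintp ~ $ ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'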
Geeker's Digest describes the problem with this process state in the article When you hit the end of the kill road: The uninterruptable sleep state as:
The kill command is received by the Kernel, scheduling the kill task on the relevant process(es). This scheduling happens within microseconds, but it still happens. The execution of this scheduled kill task only happens, when the process is ready to receive a command. This is the case in any process state with one exception: The uninterruptable sleep state.
Ciro S. Costa describes on his page why you just can't kill a Linux process that is frozen:
However, when a process is frozen (e.g., because it got to the FROZEN state via the freezer cgroup, or a hibernation triggered by a device), the task switches to the UNINTERRUPTIBLE state, where it’ll not get the opportunity to be ever scheduled to run until it gets switched to another state, meaning that it’ll not be killable until so.
The SUSE knowledge base mentions a sysrq trigger to obtain more information about processes in this uninterruptible sleep state:
echo w > /proc/sysrq-trigger
This command produces a report listing all processes in D state, together with full kernel stack traces, in the kernel log (/var/log/messages on SUSE, /var/log/syslog on this Mint system).
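Note that the magic SysRq interface may be disabled or restricted by the distribution. The current setting can be checked and, if needed, fully enabled first (1 enables all functions; other values are bitmasks, see the kernel documentation):
mintp ~ # cat /proc/sys/kernel/sysrq
mintp ~ # echo 1 > /proc/sys/kernel/sysrq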
As the machine was still unusable, there was nothing to lose, and the command was executed:
mintp ~ # echo w > /proc/sysrq-trigger
mintp ~ # cat /var/log/syslog
[...]
Jan 11 15:53:43 mintp kernel: [30854.491760] sysrq: Show Blocked State
Jan 11 15:53:43 mintp kernel: [30854.491765] task PC stack pid father
Jan 11 15:53:43 mintp kernel: [30854.492001] jbd2/sdc1-8 D 0 31744 2 0x80000000
Jan 11 15:53:43 mintp kernel: [30854.492005] Call Trace:
Jan 11 15:53:43 mintp kernel: [30854.492014] __schedule+0x3d6/0x8b0
Jan 11 15:53:43 mintp kernel: [30854.492018] schedule+0x36/0x80
Jan 11 15:53:43 mintp kernel: [30854.492023] jbd2_journal_commit_transaction+0x26e/0x1870
Jan 11 15:53:43 mintp kernel: [30854.492027] ? __switch_to_asm+0x41/0x70
Jan 11 15:53:43 mintp kernel: [30854.492030] ? __switch_to_asm+0x35/0x70
Jan 11 15:53:43 mintp kernel: [30854.492033] ? __switch_to_asm+0x41/0x70
Jan 11 15:53:43 mintp kernel: [30854.492036] ? __switch_to_asm+0x35/0x70
Jan 11 15:53:43 mintp kernel: [30854.492039] ? __switch_to_asm+0x41/0x70
Jan 11 15:53:43 mintp kernel: [30854.492042] ? __switch_to_asm+0x35/0x70
Jan 11 15:53:43 mintp kernel: [30854.492045] ? __switch_to_asm+0x35/0x70
Jan 11 15:53:43 mintp kernel: [30854.492048] ? __switch_to_asm+0x41/0x70
Jan 11 15:53:43 mintp kernel: [30854.492051] ? __switch_to_asm+0x35/0x70
Jan 11 15:53:43 mintp kernel: [30854.492054] ? __switch_to_asm+0x35/0x70
Jan 11 15:53:43 mintp kernel: [30854.492059] ? wait_woken+0x80/0x80
Jan 11 15:53:43 mintp kernel: [30854.492062] ? __switch_to_asm+0x35/0x70
Jan 11 15:53:43 mintp kernel: [30854.492066] ? lock_timer_base+0x75/0xa0
Jan 11 15:53:43 mintp kernel: [30854.492070] kjournald2+0xc8/0x250
Jan 11 15:53:43 mintp kernel: [30854.492073] ? kjournald2+0xc8/0x250
Jan 11 15:53:43 mintp kernel: [30854.492076] ? wait_woken+0x80/0x80
Jan 11 15:53:43 mintp kernel: [30854.492080] kthread+0x105/0x140
Jan 11 15:53:43 mintp kernel: [30854.492083] ? commit_timeout+0x20/0x20
Jan 11 15:53:43 mintp kernel: [30854.492086] ? kthread_bind+0x40/0x40
Jan 11 15:53:43 mintp kernel: [30854.492089] ret_from_fork+0x35/0x40
Jan 11 15:53:43 mintp kernel: [30854.492092] vmware-vmx D 0 32730 1 0x00000004
Jan 11 15:53:43 mintp kernel: [30854.492095] Call Trace:
Jan 11 15:53:43 mintp kernel: [30854.492098] __schedule+0x3d6/0x8b0
Jan 11 15:53:43 mintp kernel: [30854.492102] ? remove_wait_queue+0x4d/0x60
Jan 11 15:53:43 mintp kernel: [30854.492105] schedule+0x36/0x80
Jan 11 15:53:43 mintp kernel: [30854.492107] wait_transaction_locked+0x8a/0xd0
Jan 11 15:53:43 mintp kernel: [30854.492111] ? wait_woken+0x80/0x80
Jan 11 15:53:43 mintp kernel: [30854.492114] add_transaction_credits+0x1cd/0x2b0
Jan 11 15:53:43 mintp kernel: [30854.492117] start_this_handle+0x103/0x410
Jan 11 15:53:43 mintp kernel: [30854.492120] ? _cond_resched+0x1a/0x50
Jan 11 15:53:43 mintp kernel: [30854.492125] ? kmem_cache_alloc+0x1ad/0x1c0
Jan 11 15:53:43 mintp kernel: [30854.492128] jbd2__journal_start+0xdb/0x1f0
Jan 11 15:53:43 mintp kernel: [30854.492132] ? ext4_dirty_inode+0x32/0x70
Jan 11 15:53:43 mintp kernel: [30854.492135] __ext4_journal_start_sb+0x6d/0x120
Jan 11 15:53:43 mintp kernel: [30854.492138] ext4_dirty_inode+0x32/0x70
Jan 11 15:53:43 mintp kernel: [30854.492143] __mark_inode_dirty+0x187/0x3b0
Jan 11 15:53:43 mintp kernel: [30854.492147] generic_update_time+0x7b/0xd0
Jan 11 15:53:43 mintp kernel: [30854.492150] ? current_time+0x32/0x70
Jan 11 15:53:43 mintp kernel: [30854.492154] file_update_time+0xbe/0x110
Jan 11 15:53:43 mintp kernel: [30854.492158] __generic_file_write_iter+0x9d/0x1f0
Jan 11 15:53:43 mintp kernel: [30854.492162] ? recalc_sigpending+0x1b/0x50
Jan 11 15:53:43 mintp kernel: [30854.492165] ext4_file_write_iter+0xc4/0x3b0
Jan 11 15:53:43 mintp kernel: [30854.492168] ? signal_setup_done+0x6b/0xb0
Jan 11 15:53:43 mintp kernel: [30854.492172] new_sync_write+0xe5/0x140
Jan 11 15:53:43 mintp kernel: [30854.492175] __vfs_write+0x29/0x40
Jan 11 15:53:43 mintp kernel: [30854.492178] vfs_write+0xb8/0x1b0
Jan 11 15:53:43 mintp kernel: [30854.492181] SyS_write+0x5c/0xe0
Jan 11 15:53:43 mintp kernel: [30854.492185] do_syscall_64+0x73/0x130
Jan 11 15:53:43 mintp kernel: [30854.492189] entry_SYSCALL_64_after_hwframe+0x41/0xa6
Jan 11 15:53:43 mintp kernel: [30854.492192] RIP: 0033:0x7ff602af54bd
Jan 11 15:53:43 mintp kernel: [30854.492194] RSP: 002b:00007ffed6242350 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Jan 11 15:53:43 mintp kernel: [30854.492197] RAX: ffffffffffffffda RBX: 0000000000000064 RCX: 00007ff602af54bd
Jan 11 15:53:43 mintp kernel: [30854.492199] RDX: 0000000000000064 RSI: 0000559dbd7f9bb0 RDI: 000000000000000e
Jan 11 15:53:43 mintp kernel: [30854.492201] RBP: 0000559dbd7f9bb0 R08: 00007ffed62420b0 R09: 00007ffed6241fc8
Jan 11 15:53:43 mintp kernel: [30854.492202] R10: 00007ffed6242040 R11: 0000000000000246 R12: 0000559dbd7f1b60
Jan 11 15:53:43 mintp kernel: [30854.492204] R13: 0000000000000064 R14: 00007ffed62423c0 R15: 00007ffed62423c0
Jan 11 15:53:43 mintp kernel: [30854.492207] vmx-vthread-2 D 0 32750 1 0x00000004
Jan 11 15:53:43 mintp kernel: [30854.492210] Call Trace:
Jan 11 15:53:43 mintp kernel: [30854.492213] __schedule+0x3d6/0x8b0
Jan 11 15:53:43 mintp kernel: [30854.492216] ? __switch_to_asm+0x35/0x70
Jan 11 15:53:43 mintp kernel: [30854.492219] ? __switch_to_asm+0x41/0x70
Jan 11 15:53:43 mintp kernel: [30854.492222] schedule+0x36/0x80
Jan 11 15:53:43 mintp kernel: [30854.492225] rwsem_down_write_failed+0x1fc/0x390
Jan 11 15:53:43 mintp kernel: [30854.492228] ? __switch_to_asm+0x35/0x70
Jan 11 15:53:43 mintp kernel: [30854.492231] ? __schedule+0x3de/0x8b0
Jan 11 15:53:43 mintp kernel: [30854.492235] call_rwsem_down_write_failed+0x17/0x30
Jan 11 15:53:43 mintp kernel: [30854.492238] ? call_rwsem_down_write_failed+0x17/0x30
Jan 11 15:53:43 mintp kernel: [30854.492242] down_write+0x2d/0x40
Jan 11 15:53:43 mintp kernel: [30854.492244] ext4_file_write_iter+0x98/0x3b0
Jan 11 15:53:43 mintp kernel: [30854.492248] do_iter_readv_writev+0x111/0x180
Jan 11 15:53:43 mintp kernel: [30854.492251] do_iter_write+0x87/0x1a0
Jan 11 15:53:43 mintp kernel: [30854.492254] vfs_writev+0x98/0x110
Jan 11 15:53:43 mintp kernel: [30854.492257] ? do_futex+0x129/0x590
Jan 11 15:53:43 mintp kernel: [30854.492261] do_pwritev+0xb1/0xd0
Jan 11 15:53:43 mintp kernel: [30854.492263] ? do_pwritev+0xb1/0xd0
Jan 11 15:53:43 mintp kernel: [30854.492267] SyS_pwritev+0x11/0x20
Jan 11 15:53:43 mintp kernel: [30854.492269] do_syscall_64+0x73/0x130
Jan 11 15:53:43 mintp kernel: [30854.492273] entry_SYSCALL_64_after_hwframe+0x41/0xa6
Jan 11 15:53:43 mintp kernel: [30854.492275] RIP: 0033:0x7ff602818210
Jan 11 15:53:43 mintp kernel: [30854.492276] RSP: 002b:00007ff5ff42ab40 EFLAGS: 00000293 ORIG_RAX: 0000000000000128
Jan 11 15:53:43 mintp kernel: [30854.492279] RAX: ffffffffffffffda RBX: 0000000000000052 RCX: 00007ff602818210
Jan 11 15:53:43 mintp kernel: [30854.492280] RDX: 0000000000000001 RSI: 00007ff3bc095a20 RDI: 0000000000000052
Jan 11 15:53:43 mintp kernel: [30854.492282] RBP: 00007ff3bc095a20 R08: 0000000000000000 R09: 0000000000000000
Jan 11 15:53:43 mintp kernel: [30854.492284] R10: 00000000d48a3000 R11: 0000000000000293 R12: 0000000000000001
Jan 11 15:53:43 mintp kernel: [30854.492286] R13: 00000000d48a3000 R14: 00007ff3bc095a20 R15: 00000000d48a3000
Jan 11 15:53:43 mintp kernel: [30854.492288] vmx-vcpu-1 D 0 318 1 0x00000004
Jan 11 15:53:43 mintp kernel: [30854.492291] Call Trace:
Jan 11 15:53:43 mintp kernel: [30854.492294] __schedule+0x3d6/0x8b0
Jan 11 15:53:43 mintp kernel: [30854.492297] schedule+0x36/0x80
Jan 11 15:53:43 mintp kernel: [30854.492300] wait_transaction_locked+0x8a/0xd0
Jan 11 15:53:43 mintp kernel: [30854.492303] ? wait_woken+0x80/0x80
Jan 11 15:53:43 mintp kernel: [30854.492306] add_transaction_credits+0x1cd/0x2b0
Jan 11 15:53:43 mintp kernel: [30854.492309] ? enqueue_task_fair+0xcf/0x2f0
Jan 11 15:53:43 mintp kernel: [30854.492312] start_this_handle+0x103/0x410
Jan 11 15:53:43 mintp kernel: [30854.492315] ? _cond_resched+0x1a/0x50
Jan 11 15:53:43 mintp kernel: [30854.492318] ? kmem_cache_alloc+0x1ad/0x1c0
Jan 11 15:53:43 mintp kernel: [30854.492321] jbd2__journal_start+0xdb/0x1f0
Jan 11 15:53:43 mintp kernel: [30854.492324] ? ext4_dirty_inode+0x32/0x70
Jan 11 15:53:43 mintp kernel: [30854.492326] __ext4_journal_start_sb+0x6d/0x120
Jan 11 15:53:43 mintp kernel: [30854.492329] ext4_dirty_inode+0x32/0x70
Jan 11 15:53:43 mintp kernel: [30854.492332] __mark_inode_dirty+0x187/0x3b0
Jan 11 15:53:43 mintp kernel: [30854.492335] generic_update_time+0x7b/0xd0
Jan 11 15:53:43 mintp kernel: [30854.492339] ? current_time+0x32/0x70
Jan 11 15:53:43 mintp kernel: [30854.492342] file_update_time+0xbe/0x110
Jan 11 15:53:43 mintp kernel: [30854.492345] ext4_page_mkwrite+0x76/0x4e0
Jan 11 15:53:43 mintp kernel: [30854.492349] do_page_mkwrite+0x35/0x90
Jan 11 15:53:43 mintp kernel: [30854.492353] do_wp_page+0x280/0x470
Jan 11 15:53:43 mintp kernel: [30854.492357] __handle_mm_fault+0x85a/0xea0
Jan 11 15:53:43 mintp kernel: [30854.492361] handle_mm_fault+0xcc/0x1f0
Jan 11 15:53:43 mintp kernel: [30854.492365] __do_page_fault+0x291/0x4d0
Jan 11 15:53:43 mintp kernel: [30854.492368] do_page_fault+0x2e/0xf0
Jan 11 15:53:43 mintp kernel: [30854.492371] ? page_fault+0x2f/0x50
Jan 11 15:53:43 mintp kernel: [30854.492373] page_fault+0x45/0x50
Jan 11 15:53:43 mintp kernel: [30854.492375] RIP: 0033:0x559dbb4f0f6b
Jan 11 15:53:43 mintp kernel: [30854.492376] RSP: 002b:00007ff3e25b8bc0 EFLAGS: 00010206
Jan 11 15:53:43 mintp kernel: [30854.492379] RAX: 00007ff402f16000 RBX: 0000000000012f16 RCX: 00007ff3e25b8c38
Jan 11 15:53:43 mintp kernel: [30854.492380] RDX: 0000559dbd914920 RSI: 0000000000012f16 RDI: 00007ff3e25b8bd0
Jan 11 15:53:43 mintp kernel: [30854.492382] RBP: 00007ff5fd1cf8b0 R08: 0000000000000000 R09: 0000559dbc478d80
Jan 11 15:53:43 mintp kernel: [30854.492384] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ff402f16000
Jan 11 15:53:43 mintp kernel: [30854.492385] R13: 00007ff3e25b8bd0 R14: 00000000001bb9c8 R15: 0000559dbc5b5078
Jan 11 15:53:43 mintp kernel: [30854.492388] vmx-vthread-17 D 0 350 1 0x00000004
Jan 11 15:53:43 mintp kernel: [30854.492390] Call Trace:
Jan 11 15:53:43 mintp kernel: [30854.492393] __schedule+0x3d6/0x8b0
Jan 11 15:53:43 mintp kernel: [30854.492398] ? __switch_to+0x128/0x4e0
Jan 11 15:53:43 mintp kernel: [30854.492400] schedule+0x36/0x80
Jan 11 15:53:43 mintp kernel: [30854.492403] wait_transaction_locked+0x8a/0xd0
Jan 11 15:53:43 mintp kernel: [30854.492406] ? wait_woken+0x80/0x80
Jan 11 15:53:43 mintp kernel: [30854.492409] add_transaction_credits+0x1cd/0x2b0
Jan 11 15:53:43 mintp kernel: [30854.492413] ? schedule_timeout+0x16a/0x390
Jan 11 15:53:43 mintp kernel: [30854.492416] start_this_handle+0x103/0x410
Jan 11 15:53:43 mintp kernel: [30854.492419] ? _cond_resched+0x1a/0x50
Jan 11 15:53:43 mintp kernel: [30854.492422] ? kmem_cache_alloc+0x1ad/0x1c0
Jan 11 15:53:43 mintp kernel: [30854.492425] jbd2__journal_start+0xdb/0x1f0
Jan 11 15:53:43 mintp kernel: [30854.492427] ? ext4_dirty_inode+0x32/0x70
Jan 11 15:53:43 mintp kernel: [30854.492430] __ext4_journal_start_sb+0x6d/0x120
Jan 11 15:53:43 mintp kernel: [30854.492432] ext4_dirty_inode+0x32/0x70
Jan 11 15:53:43 mintp kernel: [30854.492436] __mark_inode_dirty+0x187/0x3b0
Jan 11 15:53:43 mintp kernel: [30854.492439] ? __switch_to_asm+0x41/0x70
Jan 11 15:53:43 mintp kernel: [30854.492442] generic_update_time+0x7b/0xd0
Jan 11 15:53:43 mintp kernel: [30854.492445] ? current_time+0x32/0x70
Jan 11 15:53:43 mintp kernel: [30854.492449] file_update_time+0xbe/0x110
Jan 11 15:53:43 mintp kernel: [30854.492452] __generic_file_write_iter+0x9d/0x1f0
Jan 11 15:53:43 mintp kernel: [30854.492454] ? _cond_resched+0x1a/0x50
Jan 11 15:53:43 mintp kernel: [30854.492457] ext4_file_write_iter+0xc4/0x3b0
Jan 11 15:53:43 mintp kernel: [30854.492460] do_iter_readv_writev+0x111/0x180
Jan 11 15:53:43 mintp kernel: [30854.492463] do_iter_write+0x87/0x1a0
Jan 11 15:53:43 mintp kernel: [30854.492466] vfs_writev+0x98/0x110
Jan 11 15:53:43 mintp kernel: [30854.492469] ? do_futex+0x129/0x590
Jan 11 15:53:43 mintp kernel: [30854.492473] do_pwritev+0xb1/0xd0
Jan 11 15:53:43 mintp kernel: [30854.492476] ? do_pwritev+0xb1/0xd0
Jan 11 15:53:43 mintp kernel: [30854.492479] SyS_pwritev+0x11/0x20
Jan 11 15:53:43 mintp kernel: [30854.492482] do_syscall_64+0x73/0x130
Jan 11 15:53:43 mintp kernel: [30854.492485] entry_SYSCALL_64_after_hwframe+0x41/0xa6
Jan 11 15:53:43 mintp kernel: [30854.492487] RIP: 0033:0x7ff602818210
Jan 11 15:53:43 mintp kernel: [30854.492488] RSP: 002b:00007ff3cb7fcb40 EFLAGS: 00000293 ORIG_RAX: 0000000000000128
Jan 11 15:53:43 mintp kernel: [30854.492490] RAX: ffffffffffffffda RBX: 0000000000000052 RCX: 00007ff602818210
Jan 11 15:53:43 mintp kernel: [30854.492492] RDX: 0000000000000001 RSI: 00007ff3c4011060 RDI: 0000000000000052
Jan 11 15:53:43 mintp kernel: [30854.492494] RBP: 00007ff3c4011060 R08: 0000000000000000 R09: 0000000000000000
Jan 11 15:53:43 mintp kernel: [30854.492496] R10: 0000000000d0d000 R11: 0000000000000293 R12: 0000000000000001
Jan 11 15:53:43 mintp kernel: [30854.492497] R13: 0000000000d0d000 R14: 00007ff3c4011060 R15: 0000000000d0d000
Jan 11 15:53:43 mintp kernel: [30854.492518] vmplayer D 0 9393 1 0x00000004
Jan 11 15:53:43 mintp kernel: [30854.492520] Call Trace:
Jan 11 15:53:43 mintp kernel: [30854.492523] __schedule+0x3d6/0x8b0
Jan 11 15:53:43 mintp kernel: [30854.492526] schedule+0x36/0x80
Jan 11 15:53:43 mintp kernel: [30854.492529] wait_transaction_locked+0x8a/0xd0
Jan 11 15:53:43 mintp kernel: [30854.492532] ? wait_woken+0x80/0x80
Jan 11 15:53:43 mintp kernel: [30854.492535] add_transaction_credits+0x1cd/0x2b0
Jan 11 15:53:43 mintp kernel: [30854.492538] start_this_handle+0x103/0x410
Jan 11 15:53:43 mintp kernel: [30854.492542] ? dput+0x34/0x210
Jan 11 15:53:43 mintp kernel: [30854.492544] ? _cond_resched+0x1a/0x50
Jan 11 15:53:43 mintp kernel: [30854.492547] ? kmem_cache_alloc+0x1ad/0x1c0
Jan 11 15:53:43 mintp kernel: [30854.492550] jbd2__journal_start+0xdb/0x1f0
Jan 11 15:53:43 mintp kernel: [30854.492553] ? ext4_dirty_inode+0x32/0x70
Jan 11 15:53:43 mintp kernel: [30854.492556] __ext4_journal_start_sb+0x6d/0x120
Jan 11 15:53:43 mintp kernel: [30854.492558] ext4_dirty_inode+0x32/0x70
Jan 11 15:53:43 mintp kernel: [30854.492561] __mark_inode_dirty+0x187/0x3b0
Jan 11 15:53:43 mintp kernel: [30854.492565] generic_update_time+0x7b/0xd0
Jan 11 15:53:43 mintp kernel: [30854.492568] ? current_time+0x32/0x70
Jan 11 15:53:43 mintp kernel: [30854.492570] touch_atime+0xb2/0xe0
Jan 11 15:53:43 mintp kernel: [30854.492573] generic_file_read_iter+0x950/0xbb0
Jan 11 15:53:43 mintp kernel: [30854.492576] ? do_filp_open+0xad/0x110
Jan 11 15:53:43 mintp kernel: [30854.492580] ? _copy_to_user+0x2a/0x40
Jan 11 15:53:43 mintp kernel: [30854.492583] ext4_file_read_iter+0x56/0xf0
Jan 11 15:53:43 mintp kernel: [30854.492586] new_sync_read+0xe2/0x130
Jan 11 15:53:43 mintp kernel: [30854.492590] __vfs_read+0x29/0x40
Jan 11 15:53:43 mintp kernel: [30854.492592] vfs_read+0x93/0x130
Jan 11 15:53:43 mintp kernel: [30854.492595] SyS_read+0x5c/0xe0
Jan 11 15:53:43 mintp kernel: [30854.492598] do_syscall_64+0x73/0x130
Jan 11 15:53:43 mintp kernel: [30854.492602] entry_SYSCALL_64_after_hwframe+0x41/0xa6
Jan 11 15:53:43 mintp kernel: [30854.492603] RIP: 0033:0x7fce46fb833d
Jan 11 15:53:43 mintp kernel: [30854.492605] RSP: 002b:00007ffe51ae73a0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
Jan 11 15:53:43 mintp kernel: [30854.492607] RAX: ffffffffffffffda RBX: 0000561882a5e4c0 RCX: 00007fce46fb833d
Jan 11 15:53:43 mintp kernel: [30854.492609] RDX: 0000000000001000 RSI: 0000561882a6b050 RDI: 000000000000001b
Jan 11 15:53:43 mintp kernel: [30854.492611] RBP: 0000000000000000 R08: 00007fce47285c78 R09: 00007ffe51ae7430
Jan 11 15:53:43 mintp kernel: [30854.492612] R10: 00007fce47285b78 R11: 0000000000000293 R12: 0000000000000003
Jan 11 15:53:43 mintp kernel: [30854.492614] R13: 0000000000000003 R14: 00007ffe51ae7430 R15: 00000000ffffffff
Jan 11 15:53:43 mintp kernel: [30854.492617] ls D 0 10021 1 0x00000004
Jan 11 15:53:43 mintp kernel: [30854.492619] Call Trace:
Jan 11 15:53:43 mintp kernel: [30854.492622] __schedule+0x3d6/0x8b0
Jan 11 15:53:43 mintp kernel: [30854.492625] schedule+0x36/0x80
Jan 11 15:53:43 mintp kernel: [30854.492628] wait_transaction_locked+0x8a/0xd0
Jan 11 15:53:43 mintp kernel: [30854.492631] ? wait_woken+0x80/0x80
Jan 11 15:53:43 mintp kernel: [30854.492634] add_transaction_credits+0x1cd/0x2b0
Jan 11 15:53:43 mintp kernel: [30854.492639] ? fscrypt_fname_free_buffer+0x1a/0x30
Jan 11 15:53:43 mintp kernel: [30854.492642] start_this_handle+0x103/0x410
Jan 11 15:53:43 mintp kernel: [30854.492645] ? _cond_resched+0x1a/0x50
Jan 11 15:53:43 mintp kernel: [30854.492648] ? kmem_cache_alloc+0x1ad/0x1c0
Jan 11 15:53:43 mintp kernel: [30854.492651] jbd2__journal_start+0xdb/0x1f0
Jan 11 15:53:43 mintp kernel: [30854.492653] ? ext4_dirty_inode+0x32/0x70
Jan 11 15:53:43 mintp kernel: [30854.492656] __ext4_journal_start_sb+0x6d/0x120
Jan 11 15:53:43 mintp kernel: [30854.492658] ext4_dirty_inode+0x32/0x70
Jan 11 15:53:43 mintp kernel: [30854.492662] __mark_inode_dirty+0x187/0x3b0
Jan 11 15:53:43 mintp kernel: [30854.492665] generic_update_time+0x7b/0xd0
Jan 11 15:53:43 mintp kernel: [30854.492668] ? current_time+0x32/0x70
Jan 11 15:53:43 mintp kernel: [30854.492671] touch_atime+0xb2/0xe0
Jan 11 15:53:43 mintp kernel: [30854.492673] iterate_dir+0x12d/0x1a0
Jan 11 15:53:43 mintp kernel: [30854.492676] SyS_getdents+0x9f/0x120
Jan 11 15:53:43 mintp kernel: [30854.492679] ? compat_filldir+0x150/0x150
Jan 11 15:53:43 mintp kernel: [30854.492682] do_syscall_64+0x73/0x130
Jan 11 15:53:43 mintp kernel: [30854.492685] ? do_syscall_64+0x73/0x130
Jan 11 15:53:43 mintp kernel: [30854.492688] entry_SYSCALL_64_after_hwframe+0x41/0xa6
Jan 11 15:53:43 mintp kernel: [30854.492690] RIP: 0033:0x7f90888ef96b
Jan 11 15:53:43 mintp kernel: [30854.492691] RSP: 002b:00007ffd27fdbed0 EFLAGS: 00000202 ORIG_RAX: 000000000000004e
Jan 11 15:53:43 mintp kernel: [30854.492693] RAX: ffffffffffffffda RBX: 0000000001946ab0 RCX: 00007f90888ef96b
Jan 11 15:53:43 mintp kernel: [30854.492695] RDX: 0000000000008000 RSI: 0000000001946ab0 RDI: 0000000000000003
Jan 11 15:53:43 mintp kernel: [30854.492697] RBP: 0000000001946ab0 R08: 0000000000000000 R09: 0000000000000000
Jan 11 15:53:43 mintp kernel: [30854.492699] R10: 000000000000008f R11: 0000000000000202 R12: fffffffffffffeb8
Jan 11 15:53:43 mintp kernel: [30854.492700] R13: 0000000000000000 R14: 000000000193eff0 R15: 0000000000404ad0
Interestingly, this output also shows the physical device affected by the uninterruptible sleep state, /dev/sdc: even its ext4 journaling thread (jbd2/sdc1-8) is stuck in D state:
Jan 11 15:53:43 mintp kernel: [30854.492001] jbd2/sdc1-8 D 0 31744 2 0x80000000
And yes, you guessed it, sdc is the SSD for the virtual machines.
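For completeness: the mapping between the mount point and the device node can be confirmed with findmnt (not part of the original session, but the device numbers 8,33 in the lsof output above already give it away):
ckadm@mintp ~ $ findmnt -no SOURCE /media/ckadm/414bba3a-e9e2-4d70-a86b-e307428a56aa
/dev/sdc1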
Sadly, there was nothing left to do anymore - except reboot the physical machine.
If a process is locking up your machine because it is stuck in an uninterruptible sleep state, and this is caused by a device (such as a physical hard drive or solid state drive), the only way out is to reboot or even hard-reset the physical machine.
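If even a regular reboot command hangs (it may, as it tries to sync the filesystems on the dead device), the magic SysRq interface used above can force a reboot from an SSH session. This is a last-resort sketch; anything not yet written to the failing drive is lost:
mintp ~ # echo s > /proc/sysrq-trigger   # emergency sync (may not complete on the dead device)
mintp ~ # echo u > /proc/sysrq-trigger   # remount all filesystems read-only
mintp ~ # echo b > /proc/sysrq-trigger   # reboot immediately, without syncing or unmounting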
Yes, I wish I could have written a different conclusion or a fix for this - but nope, unfortunately not.
Jose wrote on Apr 16th, 2021:
Hi,
I had the same issue (while suspending a VMware Windows 10 machine) on Ubuntu 20.04.
I believe it could be a kernel bug.