System freeze

Bug #1677491 reported by Walter Garcia-Fontes
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

I'm getting a system freeze in which all applications slow down and then stop responding. It starts after some hours of normal operation. A reboot recovers the system until the next freeze. No application seems to be exhausting the memory. Here is an excerpt of my syslog. It contains several kernel messages, which is why I am filing this bug against the kernel:

Mar 30 06:48:25 walter kernel: [67016.893820] RIP: 0033:0x7f70ba6dc246
Mar 30 06:48:25 walter kernel: [67016.893821] RSP: 002b:00007f70a92e8d70 EFLAGS: 00010206
Mar 30 06:48:25 walter kernel: [67016.893822] RAX: 00007f709cafffe8 RBX: 00007f709ca00c20 RCX: 0000000000018de8
Mar 30 06:48:25 walter kernel: [67016.893823] RDX: 0000000000000637 RSI: 0000000000000000 RDI: 00007f70a92e8da0
Mar 30 06:48:25 walter kernel: [67016.893824] RBP: 00007f709ca00000 R08: 0000000000000000 R09: 0000000000442c30
Mar 30 06:48:25 walter kernel: [67016.893825] R10: 002cbc87c178d3be R11: 00007f70a92e8df0 R12: 00007f70a92e8da0
Mar 30 06:48:25 walter kernel: [67016.893826] R13: 0000000000000001 R14: 00007f70a3561c80 R15: 00007f70b3890000
Mar 30 06:48:25 walter kernel: [67016.893827] Code: c0 74 e6 4d 85 c9 c6 07 01 74 30 41 c7 41 08 01 00 00 00 e9 52 ff ff ff 83 fa 01 0f 84 b0 fe ff ff 8b 07 84 c0 74 08 f3 90 8b 07 <84> c0 75 f8 b8 01 00 00 00 66 89 07 5d c3 f3 90 4c 8b 09 4d 85
Mar 30 06:48:35 walter fetchmail[3089]: getaddrinfo("imap.gmail.com","imaps") error: Name or service not known
Mar 30 06:48:35 walter fetchmail[3089]: IMAP connection to imap.gmail.com failed: Resource temporarily unavailable
Mar 30 06:48:35 walter fetchmail[3089]: Query status=2 (SOCKET)
Mar 30 06:48:53 walter kernel: [67044.895723] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [JS Helper:19669]
Mar 30 06:48:53 walter kernel: [67044.895724] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c pci_stub xt_tcpudp bridge stp llc iptable_filter vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) snd_hrtimer usblp binfmt_misc nls_iso8859_1 snd_usb_audio snd_usbmidi_lib intel_rapl sb_edac edac_core gspca_zc3xx gspca_main v4l2_common videodev media snd_hda_codec_realtek input_leds x86_pkg_temp_thermal snd_hda_codec_hdmi snd_hda_codec_generic intel_powerclamp snd_hda_intel coretemp snd_hda_codec snd_hda_core snd_hwdep kvm_intel snd_pcm kvm irqbypass snd_seq_midi snd_seq_midi_event crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel snd_rawmidi snd_seq aes_x86_64 crypto_simd glue_helper dcdbas snd_seq_device
Mar 30 06:48:53 walter kernel: [67044.895757] snd_timer cryptd snd intel_cstate soundcore dell_smm_hwmon intel_rapl_perf shpchp mei_me mei lpc_ich mac_hid parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs xor raid6_pq amdgpu hid_generic usbhid hid ums_cypress uas usb_storage amdkfd amd_iommu_v2 radeon i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect e1000e sysimgblt fb_sys_fops drm ptp ahci pps_core pata_acpi libahci wmi fjes
Mar 30 06:48:53 walter kernel: [67044.895775] CPU: 3 PID: 19669 Comm: JS Helper Tainted: G D OEL 4.10.0-14-generic #16-Ubuntu
Mar 30 06:48:53 walter kernel: [67044.895776] Hardware name: Dell Inc. Precision Tower 7810/0GWHMW, BIOS A07 04/14/2015
Mar 30 06:48:53 walter kernel: [67044.895777] task: ffff9fcab54ac380 task.stack: ffffbaf56602c000
Mar 30 06:48:53 walter kernel: [67044.895780] RIP: 0010:native_queued_spin_lock_slowpath+0x17b/0x1a0
Mar 30 06:48:53 walter kernel: [67044.895781] RSP: 0000:ffffbaf56602fd48 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
Mar 30 06:48:53 walter kernel: [67044.895783] RAX: 0000000000000101 RBX: ffffeb799edbe070 RCX: 0000000000000001
Mar 30 06:48:53 walter kernel: [67044.895784] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffffeb799edbe070
Mar 30 06:48:53 walter kernel: [67044.895784] RBP: ffffbaf56602fd48 R08: 0000000000000101 R09: ffff9fcab80d5540
Mar 30 06:48:53 walter kernel: [67044.895785] R10: 002cbc87c178d3be R11: 00007f70a92e8df0 R12: ffff9fca76f817f8
Mar 30 06:48:53 walter kernel: [67044.895786] R13: 3e000000004144ff R14: ffffbaf56602fe30 R15: ffff9fc898e33900
Mar 30 06:48:53 walter kernel: [67044.895788] FS: 00007f70a92e9700(0000) GS:ffff9fcaff2c0000(0000) knlGS:0000000000000000
Mar 30 06:48:53 walter kernel: [67044.895789] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 30 06:48:53 walter kernel: [67044.895790] CR2: 00007f709cafffe8 CR3: 00000007bc274000 CR4: 00000000001406e0
Mar 30 06:48:53 walter kernel: [67044.895791] Call Trace:
Mar 30 06:48:53 walter kernel: [67044.895794] _raw_spin_lock+0x20/0x30
Mar 30 06:48:53 walter kernel: [67044.895797] __migration_entry_wait+0x1c/0x180
Mar 30 06:48:53 walter kernel: [67044.895799] migration_entry_wait+0x74/0x80
Mar 30 06:48:53 walter kernel: [67044.895802] do_swap_page+0x5b3/0x770
Mar 30 06:48:53 walter kernel: [67044.895804] handle_mm_fault+0x873/0x1360
Mar 30 06:48:53 walter kernel: [67044.895807] __do_page_fault+0x23e/0x4e0
Mar 30 06:48:53 walter kernel: [67044.895808] do_page_fault+0x22/0x30
Mar 30 06:48:53 walter kernel: [67044.895811] page_fault+0x28/0x30
Mar 30 06:48:53 walter kernel: [67044.895812] RIP: 0033:0x7f70ba6dc246
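
When comparing this trace against other reports, it can help to reduce it to the bare chain of function names. The following is only a minimal sketch, assuming syslog lines shaped like the excerpt above; the names `KERNEL_LINE`, `FRAME` and `call_trace_functions` are made up for illustration and are not part of any existing tool.

```python
import re

# Captures the kernel message that follows the "[seconds.micros]" timestamp.
KERNEL_LINE = re.compile(r"kernel: \[\s*\d+\.\d+\]\s+(.*)$")
# A stack frame such as "do_swap_page+0x5b3/0x770", optionally marked "? ".
FRAME = re.compile(r"^(?:\? )?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+$")


def call_trace_functions(lines):
    """Return the function names of the first 'Call Trace:' block in lines."""
    names = []
    in_trace = False
    for line in lines:
        match = KERNEL_LINE.search(line)
        if match is None:
            continue  # not a kernel message (e.g. fetchmail), skip it
        message = match.group(1).strip()
        if message == "Call Trace:":
            in_trace = True
            continue
        if in_trace:
            frame = FRAME.match(message)
            if frame is None:
                break  # first non-frame kernel line ends the trace
            names.append(frame.group(1))
    return names


# Lines copied from the excerpt above (shortened to the first four frames).
sample = [
    "Mar 30 06:48:53 walter kernel: [67044.895791] Call Trace:",
    "Mar 30 06:48:53 walter kernel: [67044.895794] _raw_spin_lock+0x20/0x30",
    "Mar 30 06:48:53 walter kernel: [67044.895797] __migration_entry_wait+0x1c/0x180",
    "Mar 30 06:48:53 walter kernel: [67044.895799] migration_entry_wait+0x74/0x80",
    "Mar 30 06:48:53 walter kernel: [67044.895802] do_swap_page+0x5b3/0x770",
    "Mar 30 06:48:53 walter kernel: [67044.895812] RIP: 0033:0x7f70ba6dc246",
]
print(" <- ".join(call_trace_functions(sample)))
```

The resulting chain (spin lock taken from `__migration_entry_wait` during a page fault) is a compact string to search for in other bug reports.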

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: linux-image-4.10.0-14-generic 4.10.0-14.16
ProcVersionSignature: Ubuntu 4.10.0-14.16-generic 4.10.3
Uname: Linux 4.10.0-14-generic x86_64
ApportVersion: 2.20.4-0ubuntu2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC2: wgarcia 4042 F.... pulseaudio
 /dev/snd/controlC0: wgarcia 4042 F.... pulseaudio
 /dev/snd/controlC1: wgarcia 4042 F.... pulseaudio
CurrentDesktop: Unity:Unity7
Date: Thu Mar 30 09:03:48 2017
HibernationDevice: RESUME=UUID=8841b8fe-8f2d-4897-8a67-f9404df2ed83
InstallationDate: Installed on 2015-10-17 (529 days ago)
InstallationMedia: Ubuntu 15.04 "Vivid Vervet" - Release amd64 (20150422)
IwConfig:
 eth0 no wireless extensions.

 lxcbr0 no wireless extensions.

 lo no wireless extensions.
MachineType: Dell Inc. Precision Tower 7810
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.10.0-14-generic root=UUID=63c3c29d-24d9-4ecb-8511-3f2291792bc5 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-4.10.0-14-generic N/A
 linux-backports-modules-4.10.0-14-generic N/A
 linux-firmware 1.164
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to zesty on 2017-03-02 (27 days ago)
dmi.bios.date: 04/14/2015
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A07
dmi.board.name: 0GWHMW
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 7
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA07:bd04/14/2015:svnDellInc.:pnPrecisionTower7810:pvr01:rvnDellInc.:rn0GWHMW:rvrA00:cvnDellInc.:ct7:cvr:
dmi.product.name: Precision Tower 7810
dmi.product.version: 01
dmi.sys.vendor: Dell Inc.

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.11 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc5

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Walter Garcia-Fontes (walter-garcia) wrote :

I'm using the previous kernel now, 4.10.0-13-generic #15-Ubuntu, and after some hours I haven't seen the freeze. I saw it again using the kernel reported in this bug:

linux-image-4.10.0-14-generic 4.10.0-14.16

I will run the 4.10.0-13 kernel a while longer to see if I see the freeze again.

Walter Garcia-Fontes (walter-garcia) wrote :

I did not get these freezes with kernel 4.10.0-13 or with the upstream kernel 4.11.0-041100rc5-generic, which I'm running right now.

tags: added: kernel-fixed-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Walter Garcia-Fontes (walter-garcia) wrote :

I've been running 4.10.0-15 for a couple of days and I don't see this freeze any more, so I'm closing for the time being.

Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in linux (Ubuntu):
status: Invalid → Confirmed
Walter Garcia-Fontes (walter-garcia) wrote :

I have now hit this freeze with 4.10.0-19. I'm seeing this message in syslog:

NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [ksmd:64]

Please see posts #5 and #10 in this forum thread in case they help; there is a patch applied in 4.11 RC 6:

https://ubuntuforums.org/showthread.php?t=2354739

Doug Smythies (dsmythies) wrote :

@Walter: In our forum exchanges, I missed that your "stuck" messages have "[ksmd:64]" at the end. I often get that too, but I also sometimes get "[qemu-system-x86:8549]" and "[qemu-system-x86:8552]".

I do not know which commits have or have not been backported to kernel 4.10. At least for my issue, which may or may not be the same as yours, these are the related commits:

3fe87967c536e828bf1ea14b3ec3827d1101152e mm: convert remove_migration_pte() to use page_vma_mapped_walk()
4b0ece6fa0167b22c004ff69e137dc94ee2e469e mm: migrate: fix remove_migration_pte() for ksm pages
ace71a19cec5eb430207c3269d8a2683f0574306 mm: introduce page_vma_mapped_walk()
d75450ff40df0199bf13dfb19f435519ff947138 mm: fix page_vma_mapped_walk() for ksm pages
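
One way to check whether these changes are present in a given kernel tree is to search its history by commit subject, since a backport normally gets a new hash. The following is only a sketch under assumptions: it presumes a local git clone of the kernel source in question, and the helper names `git_log_command` and `find_backports` are made up for illustration.

```python
import subprocess

# Subjects copied from the commit list above. The upstream hashes are left out
# on purpose: a cherry-picked backport usually carries a different hash.
SUBJECTS = [
    "mm: convert remove_migration_pte() to use page_vma_mapped_walk()",
    "mm: migrate: fix remove_migration_pte() for ksm pages",
    "mm: introduce page_vma_mapped_walk()",
    "mm: fix page_vma_mapped_walk() for ksm pages",
]


def git_log_command(repo, ref, subject):
    """Build a 'git log' call listing commits on ref whose message contains subject."""
    return [
        "git", "-C", repo, "log", "--oneline",
        "--fixed-strings",        # treat the subject as a literal string, not a regex
        "--grep=" + subject,
        ref,
        "--",                     # make sure ref is not mistaken for a path
    ]


def find_backports(repo, ref):
    """Map each subject to the matching one-line log entries on ref (empty if none)."""
    found = {}
    for subject in SUBJECTS:
        result = subprocess.run(
            git_log_command(repo, ref, subject),
            capture_output=True, text=True, check=True,
        )
        found[subject] = result.stdout.splitlines()
    return found
```

Calling `find_backports(repo, ref)` with the path to a clone and a tag or branch name returns, per subject, the matching log lines. An empty list suggests the change is not in that tree. Any hits still need a manual look, because the search matches the subject anywhere in a commit message (a revert would match as well).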

Because these events are so rare, I would suggest trying kernel 4.11-rc6 for an extended time.

Walter Garcia-Fontes (walter-garcia) wrote :

Thanks dsmythies, I don't actually have much idea of what is going on. I will run 4.11 for longer.

Walter Garcia-Fontes (walter-garcia) wrote :

Sorry, but on checking my logs, my "stuck" message does not have "[ksmd:64]" at the end but something different, I think "js" something. Right now I don't have one at hand to quote the exact string.

I keep getting lockups with the 4.10 kernel series. I've now updated to 4.10.0-20, but I got one with 4.10.0-19.

I ran 4.11 RC8 for a week and did not get the freeze.

Walter Garcia-Fontes (walter-garcia) wrote :

I got this freeze again with the 4.10.0-21-generic x86_64 kernel. When it freezes I have the following message in syslog:

May 31 02:32:03 walter kernel: [409916.803508] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [JS Helper:23856]
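
Since there was some confusion in this thread about which task the watchdog names, a small script can pull that out of the logs. This is a minimal sketch, assuming the message format seen in the lines quoted in this report; the names `LOCKUP`, `parse_lockups` and `stuck_tasks` are made up for illustration.

```python
import re
from collections import Counter

# Matches e.g. "BUG: soft lockup - CPU#3 stuck for 22s! [JS Helper:19669]".
LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_lockups(lines):
    """Return one dict per soft-lockup message found in lines."""
    events = []
    for line in lines:
        match = LOCKUP.search(line)
        if match:
            events.append({
                "cpu": int(match["cpu"]),
                "seconds": int(match["secs"]),
                "comm": match["comm"],   # task name, may contain spaces
                "pid": int(match["pid"]),
            })
    return events


def stuck_tasks(lines):
    """Count how often each task name appears in soft-lockup messages."""
    return Counter(event["comm"] for event in parse_lockups(lines))


# Messages quoted in this report.
sample = [
    "Mar 30 06:48:53 walter kernel: [67044.895723] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [JS Helper:19669]",
    "NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [ksmd:64]",
    "May 31 02:32:03 walter kernel: [409916.803508] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [JS Helper:23856]",
]
for task, count in stuck_tasks(sample).items():
    print(task, count)
```

Feeding it the lines of a full syslog file instead of `sample` would show whether the lockups always hit the same task or vary between boots.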
