kernel BUG at /build/linux-7LGLH_/linux-4.10.0/include/linux/swapops.h:129

Bug #1674838 reported by Mathieu Marquer
This bug affects 200 people
Affects                           Status        Importance  Assigned to       Milestone
The Ubuntu-power-systems project  Fix Released  Undecided   Unassigned
linux (Ubuntu)                    Fix Released  High        Joseph Salisbury
  Zesty                           Fix Released  High        Joseph Salisbury
linux-hwe-edge (Ubuntu)           Fix Released  Undecided   Unassigned
  Zesty                           Fix Released  Undecided   Unassigned

Bug Description

At random times, the khugepaged process takes 100% CPU, and the only way to recover is to restart the computer.

Relevant dmesg attached (dmesg_crash.txt).

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: linux-image-4.10.0-14-generic 4.10.0-14.16
ProcVersionSignature: Ubuntu 4.10.0-14.16-generic 4.10.3
Uname: Linux 4.10.0-14-generic x86_64
ApportVersion: 2.20.4-0ubuntu2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: mathieu 2221 F.... pulseaudio
 /dev/snd/pcmC1D0p: mathieu 2221 F...m pulseaudio
 /dev/snd/controlC1: mathieu 2221 F.... pulseaudio
CurrentDesktop: Unity:Unity7
Date: Tue Mar 21 23:03:23 2017
HibernationDevice: RESUME=UUID=67e78e4c-94ee-447c-ae60-4387dae296dd
InstallationDate: Installed on 2016-01-31 (415 days ago)
InstallationMedia: Ubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160131)
MachineType: LENOVO 20344
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.10.0-14-generic root=UUID=b982929e-11d0-4984-885c-6c9daba24836 ro noprompt quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-4.10.0-14-generic N/A
 linux-backports-modules-4.10.0-14-generic N/A
 linux-firmware 1.164
SourcePackage: linux
UpgradeStatus: Upgraded to zesty on 2017-03-02 (19 days ago)
dmi.bios.date: 10/16/2014
dmi.bios.vendor: LENOVO
dmi.bios.version: 96CN29WW(V1.15)
dmi.board.asset.tag: 31900058WIN
dmi.board.name: INVALID
dmi.board.vendor: LENOVO
dmi.board.version: 31900058WIN
dmi.chassis.asset.tag: 31900058WIN
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Lenovo Yoga 2 13
dmi.modalias: dmi:bvnLENOVO:bvr96CN29WW(V1.15):bd10/16/2014:svnLENOVO:pn20344:pvrLenovoYoga213:rvnLENOVO:rnINVALID:rvr31900058WIN:cvnLENOVO:ct10:cvrLenovoYoga213:
dmi.product.name: 20344
dmi.product.version: Lenovo Yoga 2 13
dmi.sys.vendor: LENOVO

Revision history for this message
Mathieu Marquer (slasher-fun) wrote :
description: updated
tags: added: kernel-bug
tags: added: regression-release
Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.11 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc3

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Revision history for this message
Mathieu Marquer (slasher-fun) wrote :

Hi,

I don't remember encountering this bug with Linux 4.8; with Linux 4.10 it happens about every 1-3 hours (although I haven't found a way to reproduce it).

I'll try with Linux 4.11 RC3 and tell you how it goes.

Revision history for this message
Mathieu Marquer (slasher-fun) wrote :

So I *think* it's fixed in 4.11 RC3, although I'm not fully sure: I was hitting bug https://bugs.freedesktop.org/show_bug.cgi?id=100181, which crashed the display about every 30 minutes. Still, after a few hours of testing I didn't encounter this kernel bug, whereas it appeared after ~45 minutes once I went back to 4.10.

tags: added: kernel-fixed-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
JockeTF (jocketf) wrote :

This started happening for me after upgrading to Zesty.

Revision history for this message
Mathieu Marquer (slasher-fun) wrote :

Update from me: the patch for my display crash was included in 4.11 RC5, and I no longer encounter the kernel bug there, so I'm 99% sure it's fixed in the 4.11 branch.

tags: added: kernel-da-key needs-reverse-bisect
Revision history for this message
kiney (jannik-winkel) wrote :

This problem has made my system mostly unusable since upgrading to Zesty (4.10.0-19): one crash every 1-4 hours.
Some workloads tend to trigger it faster (Firefox + YouTube), but the crash is unavoidable.
Desktop usage becomes completely impossible.
Normally I can still SSH in, but only certain things work:
- top works fine; I always see one kernel thread hogging the CPU, sometimes together with a userspace process (mostly Firefox). htop hangs on exit.
- kill -9 on the CPU-hogging userspace process does not work, which seems weird.
- A graceful reboot hangs; SysRq works.

I just switched to the 4.11-rc7 mainline kernel, but it's too soon to draw any conclusions. I will report back tomorrow.

Looking through the duplicates of this bug, it seems to happen on quite different hardware.
My affected system has an AMD X370 chipset with a Ryzen 7 1700X CPU.
The system was perfectly stable with Yakkety (kernel 4.8.0-??).

Revision history for this message
Bryan Quigley (bryanquigley) wrote :

I thought it was related to my system being a brand new Ryzen with ZRAM and 32 GB of memory (no real swap), but apparently not.

A BIOS update bricked that motherboard, so I'm back on my older Phenom(tm) II X4 945 with 12 GB of RAM (now without ZRAM). I just got the issue again, and I have *no* swap enabled at all.

Revision history for this message
kiney (jannik-winkel) wrote :

I also have no swap.

Revision history for this message
JockeTF (jocketf) wrote :

I'm on a laptop with Intel Ivy Bridge.

I had no swap enabled. I haven't experienced this issue since I created a small 1GB swap file. It may be too soon to tell for sure if it's related though.

Revision history for this message
Christian Sarrasin (sxc731) wrote :

My laptop (Kaby Lake) has 16 GB swap configured and I have encountered the issue 5 times so far; see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1682184/comments/27 for exact details, including which kernels were concerned.

Revision history for this message
Brannon C Bowden (bbowden) wrote : Re: [Bug 1674838] Re: kernel BUG at /build/linux-7LGLH_/linux-4.10.0/include/linux/swapops.h:129

Was on multiple i7 2600 machines with 2 GB of swap. It took longer to occur, but it still occurred.

Revision history for this message
Andrea Bernabei (faenil) wrote :

Same bug here after upgrading to the latest Zesty packages as of yesterday (kernel 4.10.0-19-generic).

I use firefox-trunk, the nightly build.
At some point firefox-trunk goes to 100% CPU and can't even be killed.

After a couple of minutes, the whole system freezes.

Revision history for this message
Drascus (enchantedvisionsband) wrote :

I am also having this issue. Every time it occurs, my whole system locks up and I have to hard reset to get things working again.

Revision history for this message
kiney (jannik-winkel) wrote :

ok. With 4.11-rc7 mainline _this_ problem seems to be fixed.
But I had another (probably) unrelated crash/reboot with no useful traces in the logs.

Revision history for this message
Oliver Egginger (lau6chpad) wrote :

Hi,

I came here from:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1682427

because Launchpad told me that 1682427 is a duplicate of 1674838.

I've seen the same behavior with Thunderbird (see the bug description of 1682427), but I think that is a coincidence; the problem seems to be more general. I have no idea at the moment, but I can give you the dmesg output of my system. See the attached file.

I have been using Ubuntu on this system for a year: first 16.04, then 16.10, and for the past few days 17.04.

Before 17.04 I had never seen such a problem; the system was stable. It's a Skylake system with a 6700K CPU. I updated my board to BIOS version 2003 half a year ago, but as I said, I never observed the error before Zesty.

I'm curious now what's going on here.

Regards
Oliver

Revision history for this message
Dennis Sheil (dennis-sheil) wrote :

I upgraded from 16.10 to 17.04 three days ago. I have been hit with this three times in three days. I am using my desktop, and then everything freezes.

This last time I had a little more freedom. I was using firefox when it became unresponsive. I opened up a terminal and ran "ps axu" and it hung halfway through. I did a dmesg and saw "kernel: [52312.170678] kernel BUG at /build/linux-Fk60NP/linux-4.10.0/include/linux/swapops.h:129!"

Then I tried to close Firefox by hitting the close button. It popped up a force quit button which I hit. This froze my desktop GUI, even the cursor.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Can you see if this bug also exists in the latest upstream stable 4.10 kernel? It can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.11/

Revision history for this message
kiney (jannik-winkel) wrote :

That was already tested here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1682184/comments/21

 -> seems to be fixed/not present in upstream 4.10

Revision history for this message
Jeffery Painter (jeff-painter) wrote :

I can report that mainline seems fine. I have not tried 4.10.11; I stopped at 4.10.8, as it has been working well for me for the past couple of days.

painter@merlin:~$ uname -a
Linux merlin 4.10.8-041008-generic #201703310531 SMP Fri Mar 31 09:33:56 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

painter@merlin:~$ uptime
 17:41:20 up 11:09, 1 user, load average: 0.20, 0.06, 0.02

No more crashes!

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

Joseph, kiney;

I was having the problem yesterday with 4.10.0-19-generic whose vmlinuz is dated 8 April.

I installed 4.11.0-rc7 and ran it for a while today with no problems, but I don't have a good feel for how long it needs to run before I can say "I think the problem is solved".

I have now installed 4.10.11 and am running it. We'll see...

Revision history for this message
Oliver Egginger (lau6chpad) wrote :

I also have the problem with 4.10.0-19-generic.

This is the current kernel in Zesty.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Hmm, it sounds like this bug is only happening in Ubuntu kernels and not any of the upstream kernels. That indicates this is due to a SAUCE patch. We next need to identify the last Ubuntu kernel version that did not have the bug and the first that did.

Can those affected test the following early Zesty kernel:
https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/12001523

Note with this test kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Changed in linux (Ubuntu):
importance: Medium → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Zesty):
status: Confirmed → In Progress
tags: added: performing-bisect
removed: needs-reverse-bisect
Revision history for this message
Mitchell Tasman (tasman) wrote :

I am also experiencing the problem with 4.10.0-19-generic.

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

Joseph, for us rookies out here, can you confirm that these are the only files we need to install with the 4.10.0-8 kernel you would like tested?

linux-headers-4.10.0-8_4.10.0-8.10_all.deb
linux-headers-4.10.0-8-generic_4.10.0-8.10_amd64.deb
linux-image-4.10.0-8-generic_4.10.0-8.10_amd64.deb
linux-image-extra-4.10.0-8-generic_4.10.0-8.10_amd64.deb

Thanks in advance!

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@Chris Hermansen, you only need to install the following two:
linux-image-4.10.0-8-generic_4.10.0-8.10_amd64.deb
linux-image-extra-4.10.0-8-generic_4.10.0-8.10_amd64.deb

Revision history for this message
Jeffery Painter (jeff-painter) wrote :

I'm testing as recommended and installed linux-image-4.10.0-8-generic_4.10.0-8.10_amd64.deb and linux-image-extra-4.10.0-8-generic_4.10.0-8.10_amd64.deb.

Note to others, if you are using any proprietary drivers, you should also download and install both:

linux-headers-4.10.0-8_4.10.0-8.10_all.deb
linux-headers-4.10.0-8-generic_4.10.0-8.10_amd64.deb

These are required to keep using VirtualBox, and you will need to run /sbin/vboxconfig after installing them.

I will try this one out for today and post back my results this afternoon.

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

Joseph, I'm running 4.10.0-8-generic #10-Ubuntu SMP Mon Feb 13 14:04:59 UTC 2017 x86_64 x86_64 x86_64 now. I'll keep you posted.

BTW no problems yesterday with 4.10.11. My wife has been using the computer for a photo project on a web-based photo book service and that is what brought about this problem in the first place.

Revision history for this message
Christian Sarrasin (sxc731) wrote :

Hi Joseph,

As previously reported:

4.10.0-15: affected
4.10.0-14: issue not experienced in over a week

Obviously "not experienced" doesn't mean the bug isn't present. All I can say is that it was first experienced with 4.10.0-15.

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

@Joseph Salisbury, my computer, running 4.10.0-8, hung just now (display frozen, etc.). I was running "stress". This is similar to what happened previously with this bug, but this time there seems to be nothing but a run of null bytes in syslog. I'm not sure how to determine whether the same bug was triggered or not... any thoughts?

Thanks in advance.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Thanks for testing, Chris. Let's see if other folks hit this bug running 4.10.0-8. It might be that there are multiple bugs being hit here.

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

@Joseph Salisbury,

Running the same test on 4.10.11-041011-generic did not wedge the system.

This was the test:

for t in 1 2 3 4 5 6 7 8 9 10; do echo iter $t; stress -c 2 -m 2 -t 60; dmesg; done

So I'm leaving this running for a while and will report back.
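A possible variant of this loop, sketched under a few assumptions: the `stress` tool is installed, `dmesg` is readable by the current user (it may need `sudo` if `kernel.dmesg_restrict` is set), and the machine stays responsive long enough to check the log after each run. Instead of printing all of dmesg after every iteration, it stops at the first iteration whose kernel log contains the swapops.h:129 BUG. The function names are hypothetical.

```sh
#!/bin/sh
# Returns success if the kernel log text on stdin contains the swapops.h:129 BUG.
has_swapops_bug() {
    grep -q 'kernel BUG at .*include/linux/swapops\.h:129'
}

# Repeats the stress run from the comment above (default 10 iterations) and
# stops early, with status 1, as soon as the BUG appears in the kernel log.
stress_until_bug() {
    iterations=${1:-10}
    t=1
    while [ "$t" -le "$iterations" ]; do
        echo "iter $t"
        stress -c 2 -m 2 -t 60
        if dmesg | has_swapops_bug; then
            echo "swapops.h:129 BUG hit during iteration $t"
            return 1
        fi
        t=$((t + 1))
    done
    echo "no swapops.h:129 BUG after $iterations iterations"
}
```

Usage would be `stress_until_bug 10`. A BUG already present in the log from earlier in the same boot would also be reported, so a fresh boot before testing keeps the result meaningful.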

Revision history for this message
Jeffery Painter (jeff-painter) wrote :

I've been working pretty heavily throughout the day (Eclipse, Chrome, Thunderbird, MySQL, etc) with the 4.10.0-8 and haven't hit the bug. I will run a couple more days on this version and see if it pops up.

painter@merlin:~$ uname -a
Linux merlin 4.10.0-8-generic #10-Ubuntu SMP Mon Feb 13 14:04:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

painter@merlin:~$ uptime
 16:48:33 up 6:59, 1 user, load average: 0.36, 0.52, 0.64

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

Still no problems with 4.10.11-041011.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Thanks for testing, Chris. So this bug really only seems to be happening with Ubuntu kernels and not upstream ones. We should test some earlier Zesty kernels so we can find a last good version and a first bad version. That will allow us to bisect. Can you next test the following kernel, the last 4.9-based Zesty kernel:

https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/11948001
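Once a last-good and a first-bad version are known, the bisect itself could look roughly like the sketch below. It assumes a local clone of the Ubuntu Zesty kernel git tree whose release tags follow an `Ubuntu-<version>` pattern; the helper name and both tag names in the example are hypothetical. Each step still requires building, installing and booting the checked-out kernel, then trying to reproduce the hang.

```sh
#!/bin/sh
# Sketch only: start a git bisect between two kernel revisions, run from
# inside a clone of the kernel tree. The caller supplies both revisions.
start_kernel_bisect() {
    good=$1   # last version reported as not affected
    bad=$2    # first version reported as affected
    git bisect start "$bad" "$good" || return 1
    # git has now checked out a commit between the two revisions. Build,
    # install and boot it, try to reproduce the hang, then mark the result
    # with `git bisect good` or `git bisect bad`; repeat until git names the
    # first bad commit, and finish with `git bisect reset`.
    git log -1 --format='now testing %h %s'
}

# Hypothetical example (4.10.0-14 reported unaffected, 4.10.0-15 affected):
# start_kernel_bisect Ubuntu-4.10.0-14.16 Ubuntu-4.10.0-15.17
```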

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

@Joseph Salisbury, installed the 4.9.0-16 recommended and running now (continuing to use stress). More later...

Revision history for this message
Colin Ian King (colin-king) wrote :

I hit this same issue today and it seems like a hugepage scanning lockup to me.

Revision history for this message
Thomas M Steenholdt (tmus) wrote :

There must be a better way to do this...

All of these issues seem to arise from a BUG event in swapops.h:129. That particular spot is a section that's only active when the kernel was built with CONFIG_MIGRATION=y. So the first step is probably to verify that CONFIG_MIGRATION is even enabled for the mainline kernel (the configs are not the same, I'm told). For all we know, the bug could still be upstream.
If somebody running the mainline kernel could post the output of the following command, that would be useful:

cat /boot/config-$(uname -r) |grep CONFIG_MIGRATION

If CONFIG_MIGRATION is enabled on mainline (CONFIG_MIGRATION=y in the output above), the next step should be to check whether any of the Ubuntu modifications touch the source in relevant places. The BUG event in swapops.h:129 seems to be hit if migration_entry_to_page() is called with an unlocked page. Grepping through the source, this function is only called from a handful of places, so it should be possible to cross-reference those call sites with the Ubuntu modifications.

Perhaps this will bring us closer to the problem a bit faster?
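To compare configs side by side, a sketch that prints how `CONFIG_MIGRATION` is set in each config file passed to it, plus `CONFIG_TRANSPARENT_HUGEPAGE` (an addition not requested above; khugepaged is part of transparent huge page support). It assumes configs live at `/boot/config-<version>` as in the command above; the function name is hypothetical.

```sh
#!/bin/sh
# Sketch: report how CONFIG_MIGRATION and CONFIG_TRANSPARENT_HUGEPAGE are set
# in each kernel config file given as an argument.
report_config_options() {
    for cfg in "$@"; do
        [ -r "$cfg" ] || continue    # skip missing/unreadable files (e.g. an unmatched glob)
        for opt in CONFIG_MIGRATION CONFIG_TRANSPARENT_HUGEPAGE; do
            # Matches both "CONFIG_X=y" and "# CONFIG_X is not set",
            # but not longer names such as CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS.
            line=$(grep -E "^(# )?${opt}[= ]" "$cfg" | head -n 1)
            printf '%s: %s\n' "$(basename "$cfg")" "${line:-$opt (absent)}"
        done
    done
}

report_config_options /boot/config-*
```

With both an Ubuntu kernel and a mainline build installed, the output lists each option once per config file, which makes a mismatch easy to spot.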

Revision history for this message
Jeffery Painter (jeff-painter) wrote :

I have been running this kernel for 3-4 days now without any problems.

root@merlin:~# uname -a
Linux merlin 4.10.8-041008-generic #201703310531 SMP Fri Mar 31 09:33:56 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

root@merlin:~# cat /boot/config-$(uname -r) |grep CONFIG_MIGRATION
CONFIG_MIGRATION=y

Thanks!

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

@Joseph Salisbury,

No system hangs yet with kernel 4.9.0-16. I am seeing this:

[ 8034.959275] nouveau 0000:01:00.0: gr: TRAP ch 2 [003fa69000 Xorg[1191]]
[ 8034.959288] nouveau 0000:01:00.0: gr: GPC0/TPC2/TEX: 80000009

which I am trying to ignore...

Revision history for this message
zubozrout (zubozrout) wrote :

Hello,
I suspect there is nothing new here, but I was able to watch my system for a few minutes before it completely died. Here is a GIF of the output from top, with the khugepaged process staying at the top of the list and taking 100% of CPU resources most of the time.

Revision history for this message
Ken Haase (kh) wrote :

I'm able to reliably reproduce it, but I haven't been able to generate a small test case (I have to reboot every time I reproduce it). FWIW, the job which reproduces it (after a few minutes) does a lot of SMP and a fair amount of MMAPing. I'm attaching a section of the syslog that could be relevant.

I'm happy to try to run one of the intermediate 4.10.0 kernels if it would help, but I don't see them in the standard repos. Do I need to add a repository to get them? (I haven't messed with the kernel for 15 years or so.)

For actually getting work done, I'll try backing out to 4.8, but I can readily try other kernels if it would help to get this resolved.

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

@Ken Haase, here is what I do:

1. Make sure you aren't running any proprietary drivers; if you are, back them out if you can (use the Software & Updates > Additional Drivers tool for this).

2. Look back at the kernels @Joseph Salisbury has asked us to test:

https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/11948001 (# 36)

files I used

linux-headers-4.9.0-16_4.9.0-16.17_all.deb
linux-headers-4.9.0-16-generic_4.9.0-16.17_amd64.deb
linux-image-4.9.0-16-generic_4.9.0-16.17_amd64.deb
linux-image-extra-4.9.0-16-generic_4.9.0-16.17_amd64.deb

https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/12001523 (# 24)

files I used

linux-headers-4.10.0-8_4.10.0-8.10_all.deb
linux-headers-4.10.0-8-generic_4.10.0-8.10_amd64.deb
linux-image-4.10.0-8-generic_4.10.0-8.10_amd64.deb
linux-image-extra-4.10.0-8-generic_4.10.0-8.10_amd64.deb

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.11/ (# 19)

files I used

linux-headers-4.10.11-041011_4.10.11-041011.201704180310_all.deb
linux-headers-4.10.11-041011-generic_4.10.11-041011.201704180310_amd64.deb
linux-image-4.10.11-041011-generic_4.10.11-041011.201704180310_amd64.deb

3. Basically what I do is:
  a. download the relevant group of kernel files (above)
  b. into a subdirectory that contains nothing else
  c. install them with sudo dpkg -i *.deb
  d. reboot with my finger on the left shift key to bring up the grub menu
  e. select the advanced options to get the kernel I want to test
  f. when booted check with uname -a
  g. run my tests
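Steps a-c and f could be scripted roughly as follows. This is a sketch assuming the directory holds the .deb files of exactly one kernel and nothing else (as in step b); `debs_ready` and `install_test_kernel` are hypothetical helper names, and picking the kernel in GRUB (steps d-e) still has to be done by hand.

```sh
#!/bin/sh
# Succeeds if DIR holds at least one linux-image .deb and nothing but .deb files.
debs_ready() {
    dir=$1
    ls "$dir"/linux-image-*.deb >/dev/null 2>&1 || return 1
    for f in "$dir"/*; do
        case $f in
            *.deb) ;;              # kernel package: fine
            *) return 1 ;;         # anything else would end up in dpkg -i *.deb
        esac
    done
}

# Steps c and f: install every .deb in DIR, then remind the user to reboot.
install_test_kernel() {
    dir=$1
    if ! debs_ready "$dir"; then
        echo "expected only kernel .deb files in $dir" >&2
        return 1
    fi
    sudo dpkg -i "$dir"/*.deb
    echo "Reboot, pick the kernel under 'Advanced options' in GRUB, then run: uname -a"
}
```

For example, `install_test_kernel ~/kernels/4.10.0-8` (a placeholder path) after downloading the files for that kernel into it.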

Revision history for this message
JockeTF (jocketf) wrote :

Just had another encounter with this bug on 4.10.0-19-generic.

Swap was enabled this time through a 1 GB swap file.

Revision history for this message
Thomas M Steenholdt (tmus) wrote :

Okay, so MIGRATION is indeed enabled on the mainline kernel too and that possibility has been ruled out - So we're not chasing ghosts. :)

Revision history for this message
Ken Haase (kh) wrote :

Thanks @Chris Hermansen. As @Joseph Salisbury suggested, I tried 4.9.0-16, 4.10.0-8, 4.10.0-11, and 4.10.0-19 and the problem doesn't crop up until 4.10.0-19 and then does so reliably (so to speak).

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

@Ken Haase congratulations, you are a kernel-testing machine and I gladly stand in your shadow!

Your experience jibes with mine but I feel less certain since I have a hard time provoking evidence in syslog.

Revision history for this message
Harald Hannelius (harald-arcada) wrote :

I'm having the same error. I upgraded from 16.04 -> 16.10 -> 17.04 on the same day, and now my computer freezes during the night. The dhcpd running on the computer no longer replies, and the screen shows the background but no login window appears. I can't ping the computer, though Alt+SysRq S-U-B works.

This computer was perfectly stable up to the upgrade day, running for months at a time without hiccups.

Apr 23 02:54:15 morran kernel: [62573.594338] kernel BUG at /build/linux-Fk60NP/linux-4.10.0/include/linux/swapops.h:129!
Apr 23 02:54:15 morran kernel: [62573.594355] invalid opcode: 0000 [#1] SMP
Apr 23 02:54:15 morran kernel: [62573.594364] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 ebtable_filter ebtables msr bnep nfnetlink_queue nfnetlink_log nfnetlink bluetooth bridge stp llc xt_nat nf_log_ipv4 xt_multiport ipt_REJECT nf_reject_ipv4 nf_log_ipv6 nf_log_common xt_LOG xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter binfmt_misc joydev input_leds nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf snd_soc_rt5640 snd_hda_codec_hdmi snd_hda_codec_realtek snd_soc_rl6231 snd_soc_ssm4567 snd_hda_codec_generic
Apr 23 02:54:15 morran kernel: [62573.594504] snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_seq_midi snd_seq_midi_event snd_hda_intel snd_rawmidi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq ie31200_edac lpc_ich mei_me edac_core mei nuvoton_cir rc_core snd_seq_device shpchp snd_timer snd_soc_sst_acpi snd acpi_pad elan_i2c soundcore dw_dmac dw_dmac_core snd_soc_sst_match mac_hid 8250_dw i2c_designware_platform spi_pxa2xx_platform i2c_designware_core cuse coretemp nct6775 nfsd hwmon_vid auth_rpcgss nfs_acl lockd parport_pc grace ppdev sunrpc lp parport ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear hid_lenovo hid_generic uas usb_storage usbhid raid1 i915 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
Apr 23 02:54:15 morran kernel: [62573.594644] igb e1000e drm ahci dca libahci ptp i2c_algo_bit pps_core sdhci_acpi sdhci video i2c_hid hid fjes
Apr 23 02:54:15 morran kernel: [62573.594667] CPU: 1 PID: 12615 Comm: JS Helper Not tainted 4.10.0-19-generic #21-Ubuntu
Apr 23 02:54:15 morran kernel: [62573.594682] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C226M WS, BIOS P2.00 05/22/2015
Apr 23 02:54:15 morran kernel: [62573.594701] task: ffff9ec485271680 task.stack: ffffb43505ca0000
Apr 23 02:54:15 morran kernel: [62573.594716] RIP: 0010:__migration_entry_wait+0x16a/0x180
Apr 23 02:54:15 morran kernel: [62573.594726] RSP: 0000:ffffb43505ca3d68 EFLAGS: 00010246
Apr 23 02:54:15 morran kernel: [62573.594737] RAX: 0017ffffc0048078 RBX: ffffe067e0204070 RCX: ffffe067e0204070
Apr 23 02:54:15 morran kernel: [62573.594751] RDX: 0000000000000001 RSI: ffff9ec48...

Revision history for this message
Sven (s-v-e-n) wrote :

+1, I've been experiencing this since I upgraded from 16.10 to 17.04; every few hours the system freezes. Really annoying...

kernel: [ 4075.895491] kernel BUG at /build/linux-Fk60NP/linux-4.10.0/include/linux/swapops.h:129!
kernel: [ 4075.895523] invalid opcode: 0000 [#2] SMP

Revision history for this message
Olivier Febwin (febcrash) wrote :

Same issue here!
Thunderbird freezes and, after a few seconds, the whole system freezes.

Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote :

the only big mm changes pulled in from 4.11.x that I could find with a quick look through the history are related to KSM, but those are missing a later fixup (from 4.11.x as well):

d75450ff40df0199bf13dfb19f435519ff947138 which fixes ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")

ace71a19cec5 was first contained in Ubuntu-4.10.0-14.16, which AFAICT fits nicely into the working/non-working kernels reported by various users here?

there are also a few other commits by the same upstream author, which do not explicitly contain any followup/fixes tags but touch similar code, some of which were picked.

since I cannot reproduce the problem at hand, I cannot tell whether including that fixup helps, but it might be worth a shot.
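One way to check whether a given Ubuntu kernel tag already carries `ace71a19cec5` and the `d75450ff40df` fixup is sketched below. It assumes a clone of the Ubuntu Zesty kernel tree and that backported commits record the upstream hash in their message (for example a "cherry picked from commit" line); a backport without such a line would be missed. The helper name and the tag in the example are assumptions.

```sh
#!/bin/sh
# Sketch: succeeds if the history of REV contains a commit whose message
# includes TEXT (matched as a fixed string, not a regex).
has_commit_message() {
    rev=$1
    text=$2
    [ -n "$(git log --format=%h -F --grep="$text" "$rev" 2>/dev/null)" ]
}

# Hypothetical usage inside the Ubuntu kernel tree:
# for c in ace71a19cec5 d75450ff40df0199bf13dfb19f435519ff947138; do
#     if has_commit_message Ubuntu-4.10.0-14.16 "$c"; then echo "$c: present"; else echo "$c: missing"; fi
# done
```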

Revision history for this message
Andrey Arapov (andrey-arapov) wrote :

Hi there.

I have encountered the same issue as Dennis Sheil in this post: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674838/comments/18

Likewise, I was using Firefox, which became unresponsive.

The difference is that I am running Firefox v53.0 in Docker (v17.04.0-ce) with ``--kernel-memory=2G`` to limit it. Once that limit was reached, I increased it to 4G (``docker update --kernel-memory=4G firefox``). Shortly after that, Firefox became unresponsive.

Please find attached dmesg logs in dmesg-firefox-kernel-swap-bug.txt
Linux kernel ver. 4.10.0-19-generic

And below are some outputs while I've hit that problem, hopefully some useful info in there:

```
$ df (hanged)

root@sony:~# ps auxww |grep Z
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
arno 8866 5.0 0.0 0 0 ? Zl 00:18 70:14 [firefox] <defunct>
arno 8908 10.2 0.0 0 0 ? Z 00:18 142:19 [Web Content] <defunct>

arno@sony:~$ mount |grep -i docker
rpool/docker on /var/lib/docker type zfs (rw,nodev,noatime,xattr,noacl)
rpool/docker on /var/lib/docker/zfs type zfs (rw,nodev,noatime,xattr,noacl)
rpool/docker/96f7e9089ac7e330e9868d3ff4530b39364e574a3a32479d7c3ca41f2ad76476 on /var/lib/docker/zfs/graph/96f7e9089ac7e330e9868d3ff4530b39364e574a3a32479d7c3ca41f2ad76476 type zfs (rw,relatime,xattr,noacl)
shm on /var/lib/docker/containers/7e253eea4f68195d1085aeb69d7f0428f0106cb414c66a27e03dab27a814b37f/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=2097152k)
nsfs on /run/docker/netns/c1194198f0d2 type nsfs (rw)
```

The very strange part is that my swap file shrank by itself to 16 MiB:

```
root@sony:~# swapoff -a
hanged ...
though reduced swap usage down to 16M (and Swap Total !!)

arno@sony:~$ free -mh
              total        used        free      shared  buff/cache   available
Mem:           7.7G        1.9G        3.5G        710M        2.3G        3.6G
Swap:           16M         16M          0B

arno@sony:~$ free -b
              total        used        free      shared  buff/cache   available
Mem:     8270503936  2048671744  3625005056   745644032  2596827136  3855400960
Swap:      17055744    17055744           0
```

Whilst, I should normally have 8GiB swap:

```
root@sony:~# free -mh
              total        used        free      shared  buff/cache   available
Mem:           7.7G        2.0G        4.1G        473M        1.6G        4.4G
Swap:          8.0G          0B        8.0G
root@sony:~# free -b
              total        used        free      shared  buff/cache   available
Mem:     8270512128  2095005696  4448485376   496779264  1727021056  4701388800
Swap:    8589930496           0  8589930496
root@sony:~#
```
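Since `free` only shows totals, a per-area view from `/proc/swaps` can help tell whether the swap file itself shrank or a different swap area is active. The sketch assumes the standard `/proc/swaps` layout, where the Size and Used columns are in KiB, and paths without spaces; the function name is hypothetical.

```sh
#!/bin/sh
# Sketch: summarise active swap areas from /proc/swaps (or a saved copy of it).
# Columns: Filename Type Size(KiB) Used(KiB) Priority; the header line is skipped.
swap_summary() {
    awk 'NR > 1 { printf "%s (%s): %d MiB total, %d MiB used\n", $1, $2, $3 / 1024, $4 / 1024 }' "${1:-/proc/swaps}"
}
```

Running `swap_summary` with no argument reads the live `/proc/swaps`; sizes are truncated to whole MiB.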

Revision history for this message
Mathieu Marquer (slasher-fun) wrote :

Also affects linux-hwe-edge per #1685833

Changed in linux-hwe-edge (Ubuntu):
status: New → In Progress
Changed in linux-hwe-edge (Ubuntu Zesty):
status: New → In Progress
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built a Yakkety test kernel with a pick of commit d75450ff40df0199bf13dfb19f435519ff947138 as suggested in comment #52. The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1674838/

Can those affected by this bug test this kernel?

Thanks in advance!

Revision history for this message
Seth Forshee (sforshee) wrote :

Fabian: I've had my eye on ace71a19cec5 too, though I haven't seen the stack trace from that commit in any of the reports. However I'm very suspicious that "mm: introduce page_vma_mapped_walk()" or a related commit is to blame here, as it looks like the problem is related to huge pages and page migration.

I really just wish we had a reliable means to reproduce the bug. I've been trying to find a more reliable way to reproduce it, with no luck so far.

Revision history for this message
pauljohn32 (pauljohn) wrote :

I have this as well. For me, this is always correlated with having Thunderbird or Firefox open.

Sometimes, I also see messages like this

   NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s!

Can you say whether these are superficial or not?

In my kernel log, the freeze happened at Apr 22 13:41:51 and I restarted on Apr 23 14:22:38.

Revision history for this message
pauljohn32 (pauljohn) wrote :

To @jsalisbury.

Ubuntu updates just uploaded 4.10.0.20.22. Do you know whether it includes fixes similar to the ones you offer in http://kernel.ubuntu.com/~jsalisbury/lp1674838?

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

@Joseph Salisbury;

Just getting ready to try out your test kernel. During the dpkg -i run there were some messages I did not care for, for instance:

dpkg: warning: downgrading linux-headers-4.10.0-20 from 4.10.0-20.22 to 4.10.0-20.22~lp1674838

and

Not updating initrd symbolic links since we are being updated/reinstalled
(4.10.0-20.22 was configured last, according to dpkg)

There were others as well. In any case, I'm proceeding with the testing now...

Revision history for this message
Dennis Sheil (dennis-sheil) wrote :

I was hitting the bug with "4.10.0-19-generic #21-Ubuntu". With a standard apt-get update and upgrade (no special kernel install) I am now on "4.10.0-20-generic #22-Ubuntu".

I got hit with the same bug on this new kernel, so the most recent general kernel update did not fix anything for me. Incidentally, like many people here, Firefox usage probably helped trigger the bug.

[21100.999019] ------------[ cut here ]------------
[21100.999090] kernel BUG at /build/linux-2NWldV/linux-4.10.0/include/linux/swapops.h:129!
[21100.999193] invalid opcode: 0000 [#1] SMP
[...]
[21101.000376] CPU: 0 PID: 9914 Comm: firefox Not tainted 4.10.0-20-generic #22-Ubuntu
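Several people in this thread only found the crash afterwards by digging through kern.log or syslog. A minimal Python sketch for that search, assuming log lines look like the excerpts quoted here: it flags the `swapops.h:129` BUG, picks up the task name from the `Comm:` line printed shortly after it, and also records `soft lockup` watchdog messages like the one pauljohn32 mentions. The regexes are modelled on these excerpts only, not on an exhaustive description of kernel log formats.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Patterns modelled on the log excerpts quoted in this bug report.
BUG_RE = re.compile(r"kernel BUG at \S*/include/linux/swapops\.h:129!")
LOCKUP_RE = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!")
# Task line printed in the oops, e.g. "CPU: 0 PID: 9914 Comm: firefox Not tainted ..."
TASK_RE = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (\S+)")
# dmesg-style "[seconds.micros]" timestamp.
STAMP_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


@dataclass
class Hit:
    kind: str                    # "swapops-bug" or "soft-lockup"
    timestamp: Optional[float]   # seconds since boot, if the line has a stamp
    comm: Optional[str] = None   # task name from the following "Comm:" line


def scan(lines: Iterable[str]) -> List[Hit]:
    """Collect swapops.h:129 BUGs and soft-lockup warnings from log lines."""
    hits: List[Hit] = []
    for line in lines:
        stamp = STAMP_RE.search(line)
        ts = float(stamp.group(1)) if stamp else None
        if BUG_RE.search(line):
            hits.append(Hit("swapops-bug", ts))
        elif LOCKUP_RE.search(line):
            hits.append(Hit("soft-lockup", ts))
        else:
            task = TASK_RE.search(line)
            # Attach the task name to the latest swapops BUG still missing one.
            if task and hits and hits[-1].kind == "swapops-bug" and hits[-1].comm is None:
                hits[-1].comm = task.group(3)
    return hits
```

For example, `with open("/var/log/kern.log", errors="replace") as f: print(scan(f))`; that path is the usual Ubuntu location and may differ elsewhere.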

Revision history for this message
Nazar Mokrynskyi (nazar-pc) wrote :

As additional information: this issue is primarily triggered by Firefox when running GPU-related workloads, in my case either video playback or live SVG charts on a stock exchange site. When those are not in use, the issue either doesn't happen at all or happens very rarely, not even every day. Video playback and SVG charts together seem to greatly increase the probability of triggering this issue.

Revision history for this message
Henning Kulander (hennikul) wrote :

To @pauljohn.

I've been running the 4.10.0-20-generic #22~lp1674838 kernel you linked to in post #58 for 6 hours at work today. No crash so far.

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

@Joseph Salisbury,

I've had the 4.10.0-20.22~lp1674838 kernel running overnight with stress and dmesg running in a loop and Firefox open to my gmail account, and no problems (neither evident nor in dmesg output).

Revision history for this message
Colin Ian King (colin-king) wrote :

So, I think the issue occurs because the pte_lockptr lock is being used on a PMD and is in contention with the lock on the same PMD. There are several places where this can happen, for example pages being migrated on NUMA systems (unlikely in most of these bug reports) or even mremap() being used to remap a mapped region to a new location. It may be that a page fault occurs on a page that is being migrated or remapped while it is being accessed and is swapped out; the latter case may introduce a lot of latency if there are many pending I/Os during the swap.

Revision history for this message
Christian Nassau (nassau) wrote :

I'm seeing the same bug with the current kernel "4.10.0-20-generic #22-Ubuntu". I've attached a dmesg output in case this might be helpful.

Revision history for this message
Fighter19 (littlefighter1996) wrote :

I've also just experienced this error when starting a video in Firefox
(right after the ad started to display, the whole desktop froze; only the mouse could be moved, and no Alt+F1 combo etc. worked).
However, I could log in via ssh.
The hard drive LED stayed on for a good while after the freeze occurred.
The error is very intermittent.

Revision history for this message
Strntydog (strntydog) wrote :

@Joseph Salisbury;

I have been testing your kernel all night and not a single problem has occurred. Before this, I couldn't run for two hours without a lockup. So far my uptime on this kernel is 15 hours 30 minutes. With a problem like this it's difficult to say categorically "it's fixed", but it certainly feels that way.

I notice Kernel 4.10.0.20.22 is available for installation from the repos now, I assume this fix isn't in that version?

Revision history for this message
cmeerw (cmeerw) wrote :

4.10.0-20-generic #22~lp1674838 seems to work fine for me as well so far.

Revision history for this message
pauljohn32 (pauljohn) wrote :

To @jsalisbury

So far so good! 28 hours with http://kernel.ubuntu.com/~jsalisbury/lp1674838.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Commit d75450ff40df0199bf13dfb1 is not in any Ubuntu kernel as of yet.

It sounds like this commit fixes the issue. If possible please test with that kernel a little longer to confirm it resolves the bug. If it does, I'll submit an SRU request to include that commit in all affected releases.

Revision history for this message
Daniel (enoch85) wrote :

daniel@XPS-13:~$ uname -r
4.10.0-20-generic

My computer just froze with this kernel. This issue is _not_ fixed in this kernel.

As you can see, I run a Dell XPS 13 with the Intel network card, if that information helps. Like others say, it freezes randomly or under heavy load over a longer period of time. I don't use Firefox, but I always have my mail open in Thunderbird.

Revision history for this message
Daniel (enoch85) wrote :
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@Daniel, can you try testing the kernel posted in comment #55? The uname output should have the bug number in it:

4.10.0-20-generic #22~lp1674838

Revision history for this message
Mitchell Tasman (tasman) wrote :

Joseph,

Hi. I have been running with your test kernel:

$ uname -a
Linux titanic 4.10.0-20-generic #22~lp1674838 SMP Mon Apr 24 18:50:06 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Unfortunately, it appears that this kernel is still subject to the BUG. Please see the attached dmesg snippet.

Revision history for this message
Mitchell Tasman (tasman) wrote :

@Joseph,

Hi again. FYI, the bug never triggered when I tried your suggestion from #24 to test the following early Zesty kernel:

https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/12001523

But as noted in my previous comment, the BUG did alas trigger in my environment while running 4.10.0-20-generic #22~lp1674838.

Revision history for this message
pauljohn32 (pauljohn) wrote :

Bad news. Just had the kernel crash while using #22~lp1674838

I am attaching a kern.log file that shows this morning I woke from suspend, suspended again, woke up again, and then had a freeze. What surprises me is that while the machine was unresponsive from the outside (the cursor moved, but there was no keyboard response and no response to ssh login attempts), the log still appears to have been accumulating entries.

The hard reset happened around 12PM

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I will attempt to reproduce the bug. If I can, I'll bisect down to the commit that introduced this regression.

Revision history for this message
Christian Sarrasin (sxc731) wrote :

BTW for those who encounter this issue (as I've just had the pleasure today), it might be useful to reboot your box using the "magic SysRq" sequence since it's supposed to be more gentle than the power key: https://en.wikipedia.org/wiki/Magic_SysRq_key

In order to enable this feature, I have this in my /etc/rc.local:

echo 1 > /proc/sys/kernel/sysrq
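Writing `1` to `/proc/sys/kernel/sysrq` enables every SysRq function; any other non-zero value is a bitmask that enables only some groups, so a machine that "has SysRq" may still refuse the key you need. A small Python sketch that decodes the current value, with the bit meanings taken from the kernel's SysRq admin documentation. As an aside, a drop-in file under `/etc/sysctl.d/` containing `kernel.sysrq = 1` is the more conventional way to make this persistent than `rc.local`.

```python
from typing import List

# Bit meanings as listed in the kernel's SysRq admin guide (sysrq.rst).
SYSRQ_BITS = {
    0x2: "console logging level",
    0x4: "keyboard control (SAK, unraw)",
    0x8: "debugging dumps of processes",
    0x10: "sync",
    0x20: "remount read-only",
    0x40: "signalling processes (term, kill, oom-kill)",
    0x80: "reboot/poweroff",
    0x100: "nicing of all RT tasks",
}


def decode_sysrq(value: int) -> List[str]:
    """Return the SysRq function groups enabled by a kernel.sysrq value."""
    if value == 0:
        return []                 # SysRq disabled entirely
    if value == 1:
        return ["all functions"]  # 1 is special: everything enabled
    # Any other value is treated as a bitmask of allowed groups.
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current setting (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    print(decode_sysrq(read_sysrq()))
```

For instance, a value of 176, which some distributions use as a default, decodes to sync, remount read-only and reboot/poweroff: enough for a clean-ish reboot, but not for killing processes.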

Revision history for this message
Oxedions (oxedions) wrote :

Hello,

I encounter the bug as well. It happens nearly once a day.
Full trace is at: https://ubuntuforums.org/showthread.php?t=2358975&page=2&p=13638621#post13638621

I tested configurations both with and without swap; it occurs in both cases.
Also, I nearly always hit the bug when using Firefox (but it also happens when launching some other memory-intensive tasks). I often run out of memory with VMs, but that just kills a VM; it never crashes the system.

If I can help you in any way, please ask; this bug forces me to use screens everywhere and I want my scrolling back.

Ox

Revision history for this message
kiney (jannik-winkel) wrote :

>If I can help you in any way please ask, this bug force me to use screens everywhere and I want my scrolling back.

As a workaround, you can just run the mainline kernel.
I'm running mainline 4.11.0-rc7 and my uptime is 9 days now. With the Ubuntu kernel I had crashes every few hours.

Revision history for this message
Oxedions (oxedions) wrote :

@kiney

Thanks for the tip. I installed 4.11.0-rc8 and will try this one.
I will keep you informed if I encounter the crash again.

Ox

Revision history for this message
Strntydog (strntydog) wrote :

@Joseph Salisbury;

Touch wood: using your test kernel, I have been running for 2 days, 20 hours without a problem (I haven't shut down once in that time). I've been using Firefox and Thunderbird the whole time. The stock kernels wouldn't run for me for more than an hour or two. I see other people still get the fault, but it would appear that your kernel at least partially fixes it. Perhaps there are multiple paths that trigger it.

Revision history for this message
Oxedions (oxedions) wrote :

I can confirm it works with this kernel. I put the system through some terrible situations, but it survived. What I find strange, however, is that I have no swap on disk, yet when I reach maximum memory the system writes A LOT to disk while lagging.

Revision history for this message
Edwin (edwin-v) wrote :

Same here. Interesting is that I can run games for hours at high cpu/gpu/memory load without problems, yet I can sometimes lock firefox in a couple of minutes.

I had a quick look at the commit, and it fixed a problem in a file that was not introduced until 4.11-rc1. It looks like the kernel team has been a little too eager to pull in new patches.

I'll test the updated kernel and see if it helps.

Revision history for this message
Dennis Sheil (dennis-sheil) wrote :

So, I have been getting hit with this every day or so on my rather old HP Pavilion desktop. I thought I might be getting hit with this problem because the desktop was a few years old.

But now I just got hit with it on a new Dell Inspiron laptop which I bought this year. Same error in my kern.log - "kernel BUG at /build/linux-Fk60NP/linux-4.10.0/include/linux/swapops.h:129!"

Both machines currently running the most up to date kernel, 4.10.0-20-generic #22-Ubuntu.

Revision history for this message
Christian Nassau (nassau) wrote :

I also had a crash with the #22~lp1674838 kernel. This happened after 2-3 days, overnight, when the machine was supposedly mostly idle.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

It seems I can reproduce this bug now. I'm in the process of "Reverse" bisecting now, which will identify the commit(s) in mainline that fix this and are needed in Zesty.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Can folks affected by this bug test the 4.10.0-21 kernel? It can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1674838/

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

@Joseph Salisbury, I'm running the 4.10.0-21 kernel now. Let's see what happens.

Revision history for this message
hackel (hackel) wrote :

FYI, I switched to the mainline build of 4.10.11 and did *not* experience this issue. Today I tried to switch back to linux-image-4.10.0-20-generic (4.10.0-20.22) to see if it had been fixed, and hit the bug again within a few minutes. Trying out jsalisbury's -21 build now.

Running the mainline kernel is borking up my LXCs and snaps. :(

Revision history for this message
David Bierce (cppe-david) wrote :

Using Joseph's kernel, I'm still running into the same issue.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@David Bierce, can you give the 4.10.13 kernel a test? It is available from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.13/

Revision history for this message
David Bierce (cppe-david) wrote :

Looks like those builds are failing.

Revision history for this message
Chris Hermansen (c-hermansen) wrote :

@Joseph Salisbury, the 4.10.0-21 kernel crapped out on me but with a different error:

May 1 18:42:25 vancouver kernel: [28120.190864] BUG: unable to handle kernel NULL pointer dereference at 0000000000000021
May 1 18:42:25 vancouver kernel: [28120.190895] IP: dma_fence_wait_timeout+0x36/0xf0
May 1 18:42:25 vancouver kernel: [28120.190901] PGD 0
May 1 18:42:25 vancouver kernel: [28120.190902]
May 1 18:42:25 vancouver kernel: [28120.190909] Oops: 0000 [#1] SMP

etc.

Did you want the rest of the bug info from my syslog?

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@David, maybe test 4.10.12 while I review that build failure:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.12/

Others have reported this bug is fixed in mainline, so I was just curious if the fix was also cc'd to upstream stable.

Revision history for this message
David Bierce (cppe-david) wrote :

Will try mainline kernel 4.10.12; with this load, the failure usually occurs in under 45 minutes. Will let you know. Thanks!

Revision history for this message
Mitchell Tasman (tasman) wrote :

@Joseph Salisbury,

Hi.

It looks as if the Ubuntu kernel source has diverged from stable in a significant way. In particular, as has already been noted, on 3/10/2017 Canonical cherry-picked several patches from 4.11-pre, apparently relating to:

http://bugs.launchpad.net/bugs/1671613

Somewhere in there, I suspect, is the underlying cause of the present BUG.

You already tried overlaying an essential fix-up, d75450ff40df0199bf13dfb19f435519ff947138, as suggested in comment #52, and that did appear to improve stability, but it did not completely resolve the issue for me and others.

Looking at mainline, I see a rather large number of memory management-related patches from Kirill A. Shutemov and others that follow ace71a19cec5eb430207c3269d8a2683f0574306 "mm: introduce page_vma_mapped_walk()" from 2/24/2017. It could be that adding one or more of those patches would stabilize the backported series relating to http://bugs.launchpad.net/bugs/1671613.

The reason for this note is to observe that the absence of the present BUG in 4.10.x stable may well be due to the absence of the Canonical backported patches, rather than due to some additive fix.

Reverting the http://bugs.launchpad.net/bugs/1671613 patch series might be another path to isolating the source of the BUG.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@Mitchell Tasman, thanks for the feedback. I think we now know for sure that this bug does not happen with the upstream 4.10 or 4.11 kernels. Any commits that are in upstream 4.10 are also in the Ubuntu Zesty kernel. The Ubuntu 4.10.0-21 kernel I requested in comment #88 had all the upstream 4.10 commits in it, up to 4.10.11. The Ubuntu 4.10.0-21 kernel was confirmed to have this bug. However, folks have tested several of the upstream 4.10 kernels and never hit the bug. This is leading me to believe the bug is due to an Ubuntu-specific SAUCE patch or patches, so I think you're correct.

It would be very helpful if we could identify the last good Ubuntu kernel. However, there have been varying test results.

Comment #30 had the following results:
4.10.0-15: affected
4.10.0-14: issue not experienced in over a week

However, comment #47 disagreed with this.

I think it would be good to have 4.10.0-14 tested again by as many as possible, to see if that is in fact a version that does not have the bug. If it is good, it will give us a good starting point for a bisect. It can be downloaded from:

https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/12139078

Revision history for this message
Loïc (opensource-loic) wrote :

Hi, I'm having the same random freezes using kernel 4.10.0-20.22 (trace follows).
I'm running Linux without swap space, if that's relevant information.

It mostly happens when I'm listening to music or YouTube in Chromium or Amarok while doing something else. The computer starts heating up, the fans spin up, and it becomes slow, though windows can still be switched for a very short time. Then it's gone and I have to either power down or use AltGr+SysRq commands.

If I can help in any way please ask.
Loïc

[182948.603342] kernel BUG at /build/linux-2NWldV/linux-4.10.0/include/linux/swapops.h:129!
[182948.603366] invalid opcode: 0000 [#1] SMP
[182948.603380] Modules linked in: rfcomm veth ccm xt_nat xfrm_user aufs ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_recent xt_comment ipt_REJECT nf_reject_ipv4 xt_physdev br_netfilter xt_mark iptable_mangle xt_addrtype xt_tcpudp xt_CT iptable_raw xt_conntrack xt_NFLOG nfnetlink_log xt_LOG nf_log_ipv4 nf_log_common nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_multiport ebtable_filter ebtables pci_stub vboxpci(OE) ip6table_filter ip6_tables vboxnetadp(OE) vboxnetflt(OE)
[182948.603570] vboxdrv(OE) deflate twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic lrw blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common ablk_helper des_generic cmac xcbc rmd160 af_key xfrm_algo bnep iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c bbswitch(OE) nls_iso8859_1 asus_nb_wmi asus_wmi sparse_keymap mxm_wmi bridge arc4 stp llc btusb btrtl btbcm uvcvideo btintel videobuf2_vmalloc videobuf2_memops bluetooth videobuf2_v4l2 videobuf2_core videodev media intel_rapl x86_pkg_temp_thermal nvidia_uvm(POE) snd_soc_rt5640 intel_powerclamp
[182948.603761] coretemp kvm_intel snd_soc_rl6231 snd_soc_core snd_compress ac97_bus kvm iwlmvm snd_pcm_dmaengine irqbypass mac80211 intel_cstate snd_seq_midi snd_seq_midi_event intel_rapl_perf snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic iwlwifi snd_rawmidi snd_hda_intel input_leds joydev intel_pch_thermal serio_raw snd_hda_codec cfg80211 snd_seq mei_me snd_hda_core snd_hwdep snd_seq_device lpc_ich mei snd_pcm snd_timer shpchp snd wmi dw_dmac dw_dmac_core int3406_thermal int3402_thermal elan_i2c snd_soc_sst_acpi snd_soc_sst_match soundcore 8250_dw int3400_thermal processor_thermal_device acpi_als spi_pxa2xx_platform i2c_designware_platform kfifo_buf i2c_designware_core int340x_thermal_zone industrialio intel_soc_dts_iosf acpi_thermal_rel mac_hid intel_smartconnect asus_wireless dummy
[182948.603964] parport_pc ppdev lp parport ip_tables x_tables autofs4 algif_skcipher af_alg dm_crypt hid_generic usb...


Revision history for this message
Patrik Lundquist (patrik-lundquist) wrote :

I've had 4.10.0-14 crash before.

[1217206.617030] ------------[ cut here ]------------
[1217206.617055] kernel BUG at /build/linux-7LGLH_/linux-4.10.0/include/linux/swapops.h:129!
[1217206.617077] invalid opcode: 0000 [#1] SMP
[1217206.617089] Modules linked in: macvtap macvlan ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs cpuid xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ipmi_devintf ipmi_msghandler binfmt_misc nls_iso8859_1 hp_wmi sparse_keymap snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel intel_rapl snd_hda_codec x86_pkg_temp_thermal snd_hda_core intel_powerclamp snd_hwdep coretemp snd_pcm kvm_intel kvm snd_seq_midi input_leds snd_seq_midi_event irqbypass snd_rawmidi intel_cstate intel_rapl_perf snd_seq snd_seq_device snd_timer snd soundcore ie31200_edac lpc_ich edac_core
[1217206.617273] tpm_infineon mac_hid parport_pc ppdev lp parport ip_tables x_tables autofs4 algif_skcipher af_alg dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log btrfs xor raid6_pq hid_generic usbhid hid uas usb_storage i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc video i2c_algo_bit drm_kms_helper aesni_intel syscopyarea aes_x86_64 sysfillrect crypto_simd sysimgblt glue_helper fb_sys_fops cryptd e1000e drm ahci libahci ptp pps_core wmi fjes
[1217206.617408] CPU: 4 PID: 19974 Comm: chrome Not tainted 4.10.0-14-generic #16-Ubuntu
[1217206.617428] Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.83 10/21/2016
[1217206.617453] task: ffff8a41f99b4380 task.stack: ffffb76a0b444000
[1217206.617473] RIP: 0010:__migration_entry_wait+0x16a/0x180
[1217206.617489] RSP: 0000:ffffb76a0b447d68 EFLAGS: 00010246
[1217206.617504] RAX: 0017ffffc0048078 RBX: fffff73853184d70 RCX: fffff73853184d70
[1217206.617523] RDX: 0000000000000001 RSI: ffff8a4046135480 RDI: fffff73846582400
[1217206.617541] RBP: ffffb76a0b447d80 R08: ffff8a426e9ba6c0 R09: ffff8a426e9ba6c0
[1217206.617561] R10: 0000000000000000 R11: 000000007fffffe0 R12: fffff73846582400
[1217206.617581] R13: 3e00000000196090 R14: ffffb76a0b447e30 R15: ffff8a3edf9774b0
[1217206.617601] FS: 00007f750d777480(0000) GS:ffff8a439eb00000(0000) knlGS:0000000000000000
[1217206.617623] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1217206.617638] CR2: 000032988cc906c0 CR3: 000000069d1f8000 CR4: 00000000001406e0
[1217206.617657] Call Trace:
[1217206.617669] migration_entry_wait+0x74/0x80
[1217206.617684] do_swap_page+0x5b3/0x770
[1217206.617697] handle_mm_fault+0x873/0x1360
[1217206.617712] __do_page_fault+0x23e/0x4e0
[1217206.617726] do_page_fault+0x22/0x30
[1217206.617740] page_fault+0x28/0x30
[1217206.617751] RIP: 0033:0x55dc93afa9df
[1217206.617763] RSP: 002b:00007ffddb24e940 EFLAGS: 00010206
[1217206.617779] RAX: ffffcd65dddb65a0 RBX: 00000002aee49d7f RCX: 000000032988cc90
[1217206.617798] RDX: 0000000000000c90 RSI: 000032988cc906c0...


Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Thanks, Patrik. We should test older kernels then. -8 was requested in the past, but it would be good to confirm those results:

https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/12001523

Revision history for this message
Mitchell Tasman (tasman) wrote :

@Joseph Salisbury,

Hi. I have yet to experience the BUG with 4.10.0-8:

$ uname -a
Linux titanic 4.10.0-8-generic #10-Ubuntu SMP Mon Feb 13 14:04:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
$ uptime
17:54:47 up 6 days, 2:57, 1 user, load average: 0.24, 0.45, 0.61

I had run this kernel for some days earlier as well, but rebooted to try one of your test kernels.

Regards,
Mitch

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Thanks for the update. It might be best to next start testing 4.10.0-12 then:
https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/11837051

Revision history for this message
Mitchell Tasman (tasman) wrote :

@Joseph Salisbury,

As a P.S., if you are able to confirm that Ubuntu 4.10.0-14 is the first version affected, that would be consistent with a hypothesis that the patch series associated with http://bugs.launchpad.net/bugs/1671613 may be a factor in the regression:

  * POWER9: Additional power9 patches (LP: #1671613)
    - mm/autonuma: don't use set_pte_at when updating protnone ptes
    - mm/autonuma: let architecture override how the write bit should be stashed in a protnone pte.
    - powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write
    - mm/gup: check for protnone only if it is a PTE entry
    - mm/thp/autonuma: use TNF flag instead of vm fault
    - SAUCE: powerpc/mm: handle protnone ptes on fork
    - SAUCE: power/mm: update pte_write and pte_wrprotect to handle savedwrite
    - mm/ksm: improve deduplication of zero pages with colouring
    - mm: introduce page_vma_mapped_walk()
    - mm, ksm: convert write_protect_page() to use page_vma_mapped_walk()
    - mm/ksm: handle protnone saved writes when making page write protect

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

For the sake of time, if 4.10.0-12 is good, here are the links for -13 and -14, which are the suspected last good and first bad kernels:

4.10.0-13:
https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/12106252

and

4.10.0-14:
https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/12102155

If we can confirm these are the last good and first bad kernels, I'll start a bisect to narrow down the exact commit.
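The plan here is a two-stage bisect: first pin down the last good and first bad packaged kernels from tester reports, then bisect the commits between them. A minimal Python sketch of the first stage as a plain binary search over an ordered list of ABI builds. The `is_bad` callback is a hypothetical stand-in for a tester's verdict, and that is the weak point in practice: as this thread shows, a build can run cleanly for days before hitting the BUG, so a "good" verdict is never certain.

```python
from typing import Callable, Sequence, Tuple


def last_good_first_bad(builds: Sequence[str],
                        is_bad: Callable[[str], bool]) -> Tuple[str, str]:
    """Binary-search an ordered build list for the good/bad boundary.

    Assumes builds[0] is known good, builds[-1] is known bad, and that
    once the regression appears it is present in every later build.
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] good, builds[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid   # regression is at or before mid
        else:
            lo = mid   # regression is after mid
    return builds[lo], builds[hi]
```

The search tests roughly log2(n) builds in between; once a pair is confirmed, `git bisect good` / `git bisect bad` apply the same idea to the commits separating the two releases.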

Revision history for this message
Lorrin Nelson (lhn-5) wrote :

I can also confirm the Ubuntu build of 4.10.0-8 does not have the bug.

$ uname -a
Linux gooseberry 4.10.0-8-generic #10-Ubuntu SMP Mon Feb 13 14:04:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ uptime
 21:14:04 up 6 days, 2 min, 1 user, load average: 0.79, 0.62, 0.43

I was crashing daily on the Ubuntu builds of 4.10.0-19 and 4.10.0-20

I will try 4.10.0-13.

Revision history for this message
pauljohn32 (pauljohn) wrote :

OK, I will install 4.10.0-13. Are these the correct ones:

linux-headers-4.10.0-13_4.10.0-13.15_all.deb
linux-headers-4.10.0-13-generic_4.10.0-13.15_amd64.deb
linux-image-4.10.0-13-generic_4.10.0-13.15_amd64.deb
linux-image-extra-4.10.0-13-generic_4.10.0-13.15_amd64.deb
linux-tools-common_4.10.0-13.15_all.deb

For what it is worth, I've not seen this happen in 2 days using

Linux delllap-16 4.10.0-20-generic #22-Ubuntu SMP Thu Apr 20 09:22:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

It is not nearly so frequent as it was.

Revision history for this message
Seth Forshee (sforshee) wrote :

I'm now able to reproduce this pretty reliably with 4.10.0-20.22. My testing shows that the "POWER9: Additional power9 patches" patches are responsible, two of them in particular:

 - mm: introduce page_vma_mapped_walk()
 - mm, ksm: convert write_protect_page() to use page_vma_mapped_walk()

These patches don't appear to be included for any functionality they provide, but rather to make "mm/ksm: handle protnone saved writes when making page write protect" a clean cherry pick instead of a backport. But the backport isn't that difficult, so as far as I can tell we can do away with the other two patches.

I've built 4.10.0-20.22 with just those changes - reverting all three of the above patches then backporting the one which is actually needed - and I'm no longer able to reproduce this bug. Everyone, please give it a try and let me know whether or not you still see problems.

http://people.canonical.com/~sforshee/lp1674838/

Revision history for this message
Colin Ian King (colin-king) wrote :

Thanks Seth, this successfully fixes a reproducer that was able to trigger this bug for me.

Revision history for this message
kiney (jannik-winkel) wrote :

I'm testing 4.10.0-20-generic #22+lp1674838v201705030839 right now. But not enough time has passed to draw any conclusions. I will report back tomorrow or so.

Revision history for this message
Jordao (carlosjordao) wrote :

I've been experiencing the same kind of freeze/lockup with kernel 4.10.0, from these packages:
linux-image-4.10.0-19-generic
linux-image-4.10.0-20-lowlatency
linux-image-4.10.0-20-generic

I use Ubuntu at home and at work, and I experienced it in both places after upgrading to 17.04.
I've kept only the 4.8 kernel installed until I'm confident a stable version is available.

Sometimes it freezes very quickly; other times it becomes very unstable and starts to freeze one application after another. In the latter case I noticed Firefox/Flash becoming a zombie I was unable to kill; one CPU core got locked at 100% usage, then the system froze. SysRq works for rebooting.

If there is any way to collect data for debugging and diagnosis, please tell me.

Revision history for this message
David Bierce (cppe-david) wrote :

Sorry, was out for a few days.

Just catching up, I'm running the kernel from http://people.canonical.com/~sforshee/lp1674838/

Since updating to 17.04, the box I'm testing has been hitting the panic in less than an hour under its high workload. I will report back either way.

Revision history for this message
David Bierce (cppe-david) wrote :

After 4 hours under the usual high load, which previously caused a hang within 1 hour, the issue hasn't appeared yet using 4.10.0-20-generic #22+lp1674838v201705030839.

Revision history for this message
kiney (jannik-winkel) wrote :

No crash after more than 23 hours uptime with 4.10.0-20-generic #22+lp1674838v201705030839.

With the normal Ubuntu 4.10 kernel the system crashes every few hours, so I can confirm the bug is probably fixed in the kernel from ~sforshee.

Revision history for this message
David Bierce (cppe-david) wrote :

4.10.0-20-generic #22+lp1674838v201705030839
fry:~$ uptime
 11:27:04 up 16:20, 2 users, load average: 13.48, 14.59, 16.62

No crash, previous crashes would appear within an hour with lighter load and memory pressure. Seth's thesis on the back port seems to be correct.

Revision history for this message
Mathieu Pellerin (nirvn-asia) wrote :

I've also been suffering from this issue. The kernel here (http://people.canonical.com/~sforshee/lp1674838/) appears to have dealt with the system hang, happily viewing videos on Firefox now.

Revision history for this message
pauljohn32 (pauljohn) wrote :

Good news. I have no crashes in 2 days with http://people.canonical.com/~sforshee/lp1674838.

Revision history for this message
Lorrin Nelson (lhn-5) wrote :

It sounds like this has been narrowed down to more recent changes, but I have tried 4.10.0-13. So far it is stable.

$ uname -a
Linux gooseberry 4.10.0-13-generic #15-Ubuntu SMP Thu Mar 9 20:28:34 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ uptime
 22:59:30 up 3 days, 1:30, 1 user, load average: 0.80, 0.77, 0.64

Revision history for this message
Tim Ritberg (xpert-reactos) wrote :

In contrast to kernel 4.8.x, this 4.10 kernel is very unstable.
I have this problem too. I am using 4.10.0-20-generic.

Revision history for this message
litjens (jhcl) wrote :

No more problems since I installed version 4.10.0-20-generic (sforshee@gloin) #22+lp1674838v201705030839 two days ago, while the regular 4.10.0-20-generic crashed within minutes after some heavy VirtualBox/Firefox-induced memory usage. Xenial, 16.04.2.

Revision history for this message
throwaway45224 (throwaway45224) wrote :
Revision history for this message
Kelvin Ma (taylorswift) wrote :

This happened to me just now; computer randomly froze and I had to do a hard reboot. Later found out what happened by digging through syslog.

Revision history for this message
throwaway45224 (throwaway45224) wrote :
Revision history for this message
Allan Mertner (amertner) wrote :

Yes, I think the conclusion is that those are secondary effects of the bug. Fingers crossed that the actual fix has been found and will appear in an official release soon...

Revision history for this message
Andrey Arapov (andrey-arapov) wrote :

Occurred again to me with Linux 4.10.0-20-generic #22-Ubuntu.
I attached some logs.

Revision history for this message
Edwin (edwin-v) wrote :

I have been running "4.10.0-20-generic #22~lp1674838" since 28 April, and today it finally gave me the swapops error again. This illustrates the need for reproducible test cases.

Revision history for this message
Emilio (emilio-moretti) wrote :

@Joseph Salisbury Your changes fixed the problem:
http://people.canonical.com/~sforshee/lp1674838

I was not able to use the computer for more than 10 minutes before that, and it's been working OK for a few hours now. I can't wait for the official release.
Thank you

Revision history for this message
throwaway45224 (throwaway45224) wrote :

Emilio, Edwin mentioned above that lp1674838 is still susceptible to this bug.

Revision history for this message
Emilio (emilio-moretti) wrote :

Oh, it's sad to hear that. At least I'm not getting them so often. Thanks for letting me know.

Revision history for this message
Jan Claeys (janc) wrote :

edwin-v was testing an older test kernel from jsalisbury, not the newer one from sforshee

Revision history for this message
Mitchell Tasman (tasman) wrote :

I'm seeing stable behavior with @Seth Forshee's kernel:

$ uname -a
Linux titanic 4.10.0-20-generic #22+lp1674838v201705030839 SMP Wed May 3 13:41:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
$ uptime
 14:41:08 up 4 days, 2 min, 1 user, load average: 0.12, 0.25, 0.13

What is the next step in getting Seth's patch series into the official Ubuntu kernels?

Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote :

@tasman: it's already slated for inclusion into one of the next kernel packages: https://lists.ubuntu.com/archives/kernel-team/2017-May/083976.html

Revision history for this message
Vidar Braut Haarr (vhaarr+launchpad) wrote :

Will it be included in the next artful kernel, or only zesty? I ask because the email subject says "[PATCH 0/4][Zesty SRU]".

And when will it be, roughly? 2 weeks? 3 months? Tomorrow?

Revision history for this message
Seth Forshee (sforshee) wrote :

On Wed, May 10, 2017 at 02:19:26PM -0000, Vidar Braut Haarr wrote:
> Will it be included in the next artful kernel, or only zesty? I ask
> because the email subject says "[PATCH 0/4][Zesty SRU]".
>
> And when will it be, roughly? 2 weeks? 3 months? Tomorrow?

Currently artful is using the same kernel as zesty. As per previous
testing on this bug, kernels after 4.10 should not be affected.

Given the current SRU schedule (and barring any unforeseen
circumstances), the kernel with this fix should release on June 5.

Revision history for this message
Mathieu Pellerin (nirvn-asia) wrote :

Hmm, a June 5 release date for this fix seems way too late for the severity of this system freeze bug. A large number of people who run Ubuntu now (either long-time users or newcomers testing the desktop platform) have their computers crash every hour or so. On that basis, shouldn't Canonical speed up delivery of the fix?

Revision history for this message
Nazar Mokrynskyi (nazar-pc) wrote :

I agree that this should be released much faster. For instance, a bug I found in btrfs-progs was fixed in less than a week and reached Ubuntu two weeks after I reported it, even though it only affected a small subset of btrfs users relying on additional features and didn't actually corrupt or hang anything. Waiting almost 3 weeks to release a fix for something that causes many systems to hang randomly is way too long. Please consider releasing this sooner.

Revision history for this message
flux242 (flux242) wrote :

Ah, c'mon, how severe could this bug be? You, losing hours of work because your system freezes? Who cares? They have more important things to care about, like preparing for an IPO and stuff. And don't forget to buy Canonical stock as soon as it comes out.

Canonical, you really should mark 17.04 as testing and not recommended for installation.

Revision history for this message
Christian Sarrasin (sxc731) wrote :

Not a comment on the slow patching schedule but while we're waiting for a proper release, those who prefer to use a released kernel might want to switch to 4.10.0-13-generic. Several people here (myself included) haven't had a crash for days using it and there is strong suspicion (see #104) that 4.10.0-14 is the one that introduced the bug.

sudo apt install linux-image-4.10.0-13-generic
sudo apt install linux-headers-4.10.0-13-generic
sudo apt install linux-image-extra-4.10.0-13-generic

To change your default grub boot option, see https://askubuntu.com/a/216420/145568

I'm not sure what regressions using 4.10.0-13 would entail compared to later kernels.

Revision history for this message
Seth Forshee (sforshee) wrote :

> Hmm, a June 5 release date for this fix seems to be wait too late
> for the severity of this system freeze bug.

This is our normal SRU cycle. Out-of-cycle updates are for the most part
reserved for critical security issues.

> Not sure what regressions using 4.10.0-13 would entailed compared to
> later kernels?

Probably better to use the kernel I posted, as this is more up-to-date
and thus has more fixes (including security fixes). A kernel will also
be available in -proposed much sooner than that (sometime next week in
all likelihood), and this bug will be updated when that is available, so
I'd suggest updating to that when it is available.

Revision history for this message
Kwang Moo Yi (kwang-m-yi) wrote :

Hope this fix lands soon, since without it, the 4.10 kernel is basically unusable. Even for 16.04, this bug makes the hwe-edge series pretty much unusable.

Revision history for this message
Sean Tobin (seantobin) wrote :

If it helps at all in the calculus of determining when to release this update, we've got a production RethinkDB cluster that was affected by this bug under Zesty server. Installing the http://people.canonical.com/~sforshee/lp1674838/ kernel set resolved the issue for us, but I'd greatly prefer running a kernel from the official repo.

Revision history for this message
Mitchell Tasman (tasman) wrote :

@Seth Forshee,

Thanks again for your patch series and test kernel, which has resulted in my system running stably for over a week now.

Although your patch series was ACK'd on the Ubuntu kernel mailing list, I thought it worth mentioning that it doesn't yet appear to have been applied to master-next of http://kernel.ubuntu.com/git/ubuntu/ubuntu-zesty.git/.

Revision history for this message
MURAT ATES (digasi) wrote :

murat@Murat-Laptop:~$ uname -r
4.10.0-20-generic
murat@Murat-Laptop:~$ uname -a
Linux Murat-Laptop 4.10.0-20-generic #22+lp1674838v201705030839 SMP Wed May 3 13:41:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

I just booted into Seth Forshee's 4.10.0-20 kernel and encountered this bug within a few hours.
It happened when I clicked on the "Share This Page" icon in Firefox while viewing a page on nike.com and then clicked on more options to load the page at https://activations.cdn.mozilla.net. As soon as I clicked on that, Firefox stopped working and the bug was hit. Alt+SysRq+E did not reboot properly.

Are we back to the drawing board?

Revision history for this message
throwaway45224 (throwaway45224) wrote :

Murat, are you sure it's the same bug? I experienced a random freeze on
4.8.0-51-generic, but there was nothing in syslog and symptoms were
quite different. What's in your syslog?

Revision history for this message
MURAT ATES (digasi) wrote :

You are right. This is NOT the same bug, but strangely it happens with this kernel, while "share + more actions" in Firefox worked fine with the Ubuntu kernel.

Here is the relevant entry in the syslog:

May 11 16:40:30 Murat-Laptop dbus-daemon[1834]: Activating service name='org.gnome.GConf'
May 11 16:40:30 Murat-Laptop dbus-daemon[1834]: Successfully activated service 'org.gnome.GConf'
May 11 16:40:32 Murat-Laptop unity-panel-ser[2370]: Already have a menu for window ID 65011728 with path /com/canonical/menu/3E00010 from :1.106, unregistering that one
May 11 16:40:32 Murat-Laptop unity-panel-ser[2370]: Already have a menu for window ID 65011728 with path /com/canonical/menu/3E00010 from :1.106, unregistering that one
May 11 16:40:33 Murat-Laptop compiz[2359]: 1494535233840#011FirefoxAccounts#011ERROR#011Background refresh of profile failed, bumping _cachedAt: {"name":"FxAccountsProfileClientError","code":304,"errno":997,"error":"PARSE_ERROR","message":null}
May 11 16:40:34 Murat-Laptop systemd[1]: Starting Stop ureadahead data collection...
May 11 16:40:34 Murat-Laptop systemd[1]: Started Stop ureadahead data collection.
May 11 16:40:56 Murat-Laptop compiz[2359]: Vector smash protection is enabled.
May 11 16:48:54 Murat-Laptop unity-panel-ser[2370]: Already have a menu for window ID 67109067 with path /com/canonical/menu/40000CB from :1.113, unregistering that one

There are 8 minutes of silence after the crash, between 16:40 and 16:48, during which I instinctively pressed Alt+SysRq+E, which failed to reboot the machine!

I CAN reproduce this particular error, which clearly is NOT the bug we are dealing with in this thread, by following the same steps. You can try it yourself by clicking on the "Share this page" icon in Firefox and then the + icon to install more social actions.

You CAN kill the stalled Firefox process from a terminal.

Revision history for this message
throwaway45224 (throwaway45224) wrote :

> I CAN reproduce this particular error, which clearly is NOT the bug we
> are dealing with on this thread, following the same steps. Actually you
> can try this by clicking on "Share this page" icon on firefox and then
> the + icon to install more social actions.

If you can reproduce the error, file a separate bug report please.

Revision history for this message
David Ordenes D. (radioboy-2) wrote :

This happens to me with 4.10.0-20 in Zesty KDE. It happens to me with a fresh install and only when using a fresh firefox profile, with and without flash enabled (and having a lot of tabs open); it leaves a zombie process after manually killing it, but then the system locks in about a minute.
If I ask too much of chrome it just crashes on its own and the system keeps working fine.
I set up an 8 GB partition to be mounted as /swap when installing, so I'm not sure how it works now that swap is supposed to be a file.
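Whether the system is actually using a swap partition or a swap file can be read from /proc/swaps (or shown with `swapon --show` from util-linux). A minimal sketch, assuming a POSIX shell and awk; `swap_types` is a hypothetical helper name, not an Ubuntu tool:

```shell
#!/bin/sh
# Hypothetical helper: list each active swap area with its type
# ("partition" or "file"), as reported by the kernel in /proc/swaps.
# Another path can be passed as $1, e.g. for testing.
swap_types() {
    swaps_file="${1:-/proc/swaps}"
    # Line 1 is the column header; column 1 is the device or file name,
    # column 2 is its type. (Names containing spaces are not handled.)
    awk 'NR > 1 { print $1, $2 }' "$swaps_file"
}
```

Running `swap_types` with no argument prints one line per active swap area, for example a device path followed by `partition`, or a path such as /swapfile followed by `file`.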

Revision history for this message
Jan Claeys (janc) wrote :

One issue with Seth Forshee's kernel is that it doesn't work with Secure Boot, so people might have to disable that temporarily to be able to boot...

Revision history for this message
Seth Forshee (sforshee) wrote :

> One issue with Seth Forshee's kernel is that it doesn't work with secure
> boot, so people might have to disable that temporarily) to be able to
> boot...

Yes, however the kernel which should be available in -proposed sometime
next week will have a signed counterpart.

Also, while I'm commenting, I might as well let you know that the patches
are now committed for zesty. Not sure why the status hasn't been updated,
but I'll do so now.

Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Changed in linux-hwe-edge (Ubuntu Zesty):
status: In Progress → Fix Committed
Revision history for this message
Adrian (adrian-b) wrote :

Just to add another confirmation regarding Seth Forshee's kernel: I've been up for 11 days now with it, whereas I was crashing every 1 or 2 days (since upgrading to Zesty).

Uptime
16:39:28 up 11 days, 6:54, 2 users, load average: 0.99, 0.91, 0.99

uname -r
4.10.0-20-generic

uname -a
Linux adrian-AMD 4.10.0-20-generic #22+lp1674838v201705030839 SMP Wed May 3 13:41:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
Alex Garel (alex-garel) wrote :

Same as Adrian here: I've been testing Seth Forshee's kernel for 4+ days and have experienced no more crashes (versus 2-3 crashes a day).

$ uname -a
Linux tignasse 4.10.0-20-generic #22+lp1674838v201705030839 SMP Wed May 3 13:41:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
Jonas Slivka (jonas-slivka) wrote :

I'm still experiencing freezes with Seth Forshee's kernel (5-6 times a day on relatively heavy load)...

➜ ~ uname -a
Linux dell 4.10.0-20-generic #22+lp1674838v201705030839 SMP Wed May 3 13:41:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Attaching kern.log with stack trace.

Revision history for this message
3axis (3axis) wrote :

Just hung again; only REISUB worked.

On Wed, 17 May 2017, 13:41 Jonas Slivka, <email address hidden> wrote:

> I'm still experiencing freezes with Seth Forshee's kernel (5-6 times a
> day on relatively heavy load)...
>
> ➜ ~ uname -a
> Linux dell 4.10.0-20-generic #22+lp1674838v201705030839 SMP Wed May 3
> 13:41:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>
> Attaching kern.log with stack trace.
>
> ** Attachment added: "kern.log (Linux 4.10.0-20 (Forshee's kernel))"
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674838/+attachment/4878220/+files/kern_ubuntu_17.04_x64_vmlinuz-4.10.0-20.log
>


Revision history for this message
Emilio (emilio-moretti) wrote :

@jonas-slivka that's not related to this ticket.

You have a different bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1677673

Revision history for this message
tkb608 (tkb608) wrote :

Ran into this bug during high load on 4.10.0-21

Followed @sxc731's advice from...

https://bugs.launchpad.net/ubuntu/zesty/+source/linux/+bug/1674838/comments/138

Things seem good now.

Thanks for the good work here. Looking forward to the fix landing.

Revision history for this message
Ehsan Akhgari (ehsan) wrote :

I am also experiencing the hang still with the lp1674838 kernel:

$ cat /proc/version
Linux version 4.10.0-21-generic (root@gomeisa) (gcc version 6.3.0 20170406 (Ubuntu 6.3.0-12ubuntu2) ) #23~lp1674838 SMP Mon May 1 16:13:17 UTC 2017

Please see the attached kern.log, for example around May 17 14:25:47.

Revision history for this message
Hans van den Bogert (hbogert) wrote :

@ehsan, what do you mean? I can't see any error in the kern.log that corresponds to the ones we would expect in this thread. Not every kernel panic is caused by the issue in this ticket.

Revision history for this message
Mitchell Tasman (tasman) wrote :

@Ehsan,

Hi. You are running the original test kernel for this BUG, which turned out not to resolve the problem.

Please switch to @Seth Forshee's kernel as linked in #108:

http://people.canonical.com/~sforshee/lp1674838/

That kernel identifies itself as: 4.10.0-20-generic #22+lp1674838v201705030839

Revision history for this message
Ehsan Akhgari (ehsan) wrote :

My apologies, I think I was confused before. I saw another hang with similar symptoms and I think confirmation bias made me assume I'm seeing this bug again without double checking everything carefully. Sorry about that!

Now that I've verified I'm running the right kernel, I'll report if I see the original hang I first reported in Bug #1687267:

$ cat /proc/version
Linux version 4.10.0-20-generic (sforshee@gloin) (gcc version 6.3.0 20170406 (Ubuntu 6.3.0-12ubuntu2) ) #22+lp1674838v201705030839 SMP Wed May 3 13:41:02 UTC 2017

Revision history for this message
Peter Selinger (selinger) wrote :

I believe that a kernel that crashes every 1-4 hours is a critical security bug, so an update should be released immediately, rather than some time in June.

I was unable to install linux-image-4.10.0-13, because it no longer seems to be in the repository.

I am still experiencing this bug with vmlinuz-4.10.0-20-generic.efi.signed:

May 18 08:42:05 puffin kernel: [171667.771712] ------------[ cut here ]------------
May 18 08:42:05 puffin kernel: [171667.771735] kernel BUG at /build/linux-2NWldV/linux-4.10.0/include/linux/swapops.h:129!
May 18 08:42:05 puffin kernel: [171667.771756] invalid opcode: 0000 [#1] SMP
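Since several hangs reported here turned out to be different bugs, it can help to check a log for this exact signature before posting. A minimal sketch, assuming GNU grep and the default Ubuntu log location; `has_swapops_bug` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the given kernel log
# contains the swapops.h:129 BUG discussed in this report.
# The build directory in the path (linux-2NWldV, linux-lz1RHE, ...) changes
# from build to build, so only the header path and line number are matched.
has_swapops_bug() {
    log_file="${1:-/var/log/kern.log}"
    grep -q 'kernel BUG at .*/include/linux/swapops\.h:129' "$log_file"
}
```

For example, `has_swapops_bug /var/log/kern.log.1 && echo "swapops BUG found"` checks the previous, already-rotated log. A freeze that leaves no such line (for instance only a soft lockup message) may well be a different bug.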

Revision history for this message
throwaway45224 (throwaway45224) wrote :

> critical [...] bug

Totally agree.

> security bug

It's a critical bug, but not a security issue.

Revision history for this message
tkb608 (tkb608) wrote :
Revision history for this message
Tim Passingham (tim-8aw3u04umo) wrote :

A tale of ordinary folk - just users.

We have 4 systems running 17.04: 3 Xubuntu and 1 Ubuntu. One of them now regularly locks up and has to be manually rebooted; the other 3 are fine. I initially reported against #1687267 (now marked as a duplicate of this), including a crash log.

The desktop that crashes is a 5-year-old Dell used by my non-IT partner. This situation is quite impossible for her to tolerate. We have never, ever had a problem like this before. She is now suggesting reverting to Windows, having never had such failures with it (nor have I in the last 10 or more years).

Fortunately there was a 16.10 kernel left after upgrading - 4.8.0-51 and by using Grub-customizer I managed to make this the default kernel. Whether it is truly OK with 17.04 I don't know, but so far so good.

Having always told my partner to do updates, I now have to tell her not to, because 4.10.0-21 just landed and changed the boot order.

If ubuntu (etc) is ever to be a normal desktop of choice for normal users, regular crashes of this nature have to be fixed ASAP. This one is now almost 2 months old. I've tripped over similar crash reports not (yet) marked as a duplicate (eg #1686727) and this problem may be even more widespread than people realise.

Revision history for this message
throwaway45224 (throwaway45224) wrote :

Tim Passingham, thanks for telling us your story. Ordinary folk are not
very well represented here, in the bug tracker.

Does anybody know any news websites that might be interested in covering
this issue? I emailed the guy who writes for OMG! Ubuntu!, but he didn't
respond so far. It might be a good idea if other people contacted him
too: http://www.omgubuntu.co.uk/tip. Maybe you can phrase your email
better than I did and it may draw the editor's attention. Besides,
creating buzz about the issue increases the chance that it gets noticed.

Revision history for this message
tkb608 (tkb608) wrote :

I'm supportive of the idea that this is a showstopper bug. But you know, it's free software, and not an LTS version. So maybe take it down a notch.

Revision history for this message
pauljohn32 (pauljohn) wrote :

One more happy week of success with http://kernel.ubuntu.com/~jsalisbury/lp1674838.

I understand Tim Passingham's frustration, but can I turn the question a different way? I want to know: why didn't this problem affect all Ubuntu users? Why just us? Is a particular motherboard to blame?

Also, is there any word on whether this fix will go into Ubuntu updates? I ask because, after the Ubuntu firmware update within the last few days, I'm no longer able to get HDMI devices to work, just a black screen of death on the HDMI output of the Intel video. I fear I'm in for another hard-to-diagnose problem, and I am sure everybody will say that using the lp1674838 kernel is complicating it. If I use a stock kernel, they might be better able to figure it out.

Revision history for this message
Tim Passingham (tim-8aw3u04umo) wrote :

Thanks for the responses to my comments. I appreciate that this is not the LTS version, and I didn't have to pay to get it.

There are two approaches to updating: either stay fully up to date on the most recent versions and risk issues with ever-changing versions, or never update at all. I take the view that being really up to date is best, so I keep us moving forward on non-LTS versions and take the risk. If I were asked to pay, say £50 or so, I would.

Others have suggested selecting the appropriate version every time I boot. That's a nuisance, and not something my non-IT partner could handle at all ("just switch it on and log in; why do I have to do anything else?"). I suspect some who spend a lot of time in IT have little idea how people who have not grown up with it regard it (I started in 1970, so I still have some feel for it, but am by no means an expert in any area).

I assume that the recent release of 4.10.0-21 would override the patched 4.10.0-20. Is that correct? If so, it would still cause ordinary folk some problems.

When I go away for a while I'm still going to be forced to tell my partner not to take any updates at all.

I'll see if I can contact OMG! Ubuntu!.

Revision history for this message
Patrik Lundquist (patrik-lundquist) wrote :

@Tim

Add/set in /etc/default/grub:
GRUB_DEFAULT=saved
GRUB_SAVEDEFAULT=true

and run "sudo update-grub" to have GRUB remember the kernel you booted last time.
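The same two settings can be applied with a small script instead of a text editor. A minimal sketch, assuming GNU sed (for `sed -i`) as shipped with Ubuntu and the stock /etc/default/grub layout; `set_grub_key` and `enable_grub_savedefault` are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helpers: set GRUB_DEFAULT=saved and GRUB_SAVEDEFAULT=true in
# a GRUB defaults file (default /etc/default/grub), keeping a .bak copy.
# "sudo update-grub" still has to be run afterwards to regenerate grub.cfg.
set_grub_key() {
    # $1 = file, $2 = key, $3 = value; replace the key if set, else append it.
    if grep -q "^$2=" "$1"; then
        sed -i "s/^$2=.*/$2=$3/" "$1"
    else
        echo "$2=$3" >> "$1"
    fi
}

enable_grub_savedefault() {
    cfg="${1:-/etc/default/grub}"
    cp "$cfg" "$cfg.bak"
    set_grub_key "$cfg" GRUB_DEFAULT saved
    set_grub_key "$cfg" GRUB_SAVEDEFAULT true
}
```

Run it as root, then `sudo update-grub`, and boot the kernel you want once from the "Advanced options for Ubuntu" menu; later boots should then default to that entry.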

Revision history for this message
Thomas M Steenholdt (tmus) wrote :

@Tim - What Patrik said... Or prevent the kernel package from being updated until the fix is included:
https://askubuntu.com/questions/18654/how-to-prevent-updating-of-a-specific-package

Revision history for this message
Kwang Moo Yi (kwang-m-yi) wrote :

Just to add, even 16.04 LTS is affected by this bug, as if you install linux-hwe-edge, you install the same bugged kernel. For me, the next time I wipe my machine, it won't be ubuntu, simply because this type of bug is unacceptable for a daily driver.

Revision history for this message
Loïc (opensource-loic) wrote :

I'm also grateful for the free service provided by Ubuntu here, although I must object! Whether the release is LTS or not is irrelevant to this problem, as we are obviously within the support lifecycle of the zesty release. A non-LTS release is not an indicator of instability, or at least it should not be.

I understand this is not a security issue, but it's highly affecting user experience for at least 144 people here in this thread. And of course, it's only the tip of the iceberg, most users - and particularly non-tech users who are afaik one of Ubuntu's main targets - don't want the hassle of reporting bugs. They'll eventually change the piece of software causing them trouble.

Also, the culprit of these freezes seems to be a patch the Ubuntu team decided to backport. I would assume they'd be swift to fix the issues they've introduced into a widely used kernel.

If you know how to do so, please hasten the process, for our sake and Ubuntu's!

Revision history for this message
Tim Passingham (tim-8aw3u04umo) wrote :

Thanks to all for the advice. I'll try saving the last used kernel using GRUB.

Is there likely to be any problem running a 4.8 kernel with 17.04? It seems OK so far.

Revision history for this message
tkb608 (tkb608) wrote :

@kwang-m-yi Good point, I had forgotten that new installs of 16.04.2 would automatically get HWE and therefore this bug.

Revision history for this message
flux242 (flux242) wrote :

Hm, my computer hung even with kernel -21:
Linux chrome 4.10.0-21-generic #23-Ubuntu SMP Fri Apr 28 16:14:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

The symptoms were very similar (Firefox in the foreground and the computer locked up), but this time there was no usual swapops oops record in the syslog.

Revision history for this message
Vincas Dargis (talkless) wrote :

I have experienced freeze with Linux vinco 4.10.0-21-generic #23~16.04.1-Ubuntu SMP Tue May 2 12:57:17 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux while browsing with Firefox.

In addition to "invalid opcode: 0000 [#1] SMP", there are:
"NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [Timer:8949]"

At first, Firefox froze. I could not kill it, and when I tried to log out of the KDE session, everything froze and I had to use REISUB.

Attaching syslog-cut.txt.

Revision history for this message
Strntydog (strntydog) wrote :

So, how will we know when this patch drops into a distribution kernel?

Revision history for this message
Sven Hartrumpf (hartrumpf) wrote :

Hi.

The relevant line in the kern.log of the affected server (4.10.0-21-generic) was:

kernel BUG at /build/linux-lz1RHE/linux-4.10.0/include/linux/swapops.h:129!

I switched to the 4.11 mainline kernel:
Linux 4.11.2-041102-generic #201705201036 SMP Sat May 20 14:38:21 UTC 2017 x86_64 x86_64 x86_64

I hope this is a valid work-around ...

Sven

Revision history for this message
q8374gf (q8374gf) wrote :

Mint Cinnamon 18.1 user here. Just mentioning that so as to make it easier for other people doing a Google search to find this thread, since it was pretty hard for me to find. Same problem as everyone else, swapops.h:129!. For me particularly, khugepaged goes to Uninterruptible Sleep (D) state when writing a bunch of files to usb drive, then a lot of stuff quits working, like Firefox, then a complete lockup soon after. Have to reset computer via power button. Switching kernel away from any 4.10.x available in Mint to 4.8.0-52 seems to fix it.

Revision history for this message
pauljohn32 (pauljohn) wrote :

This should be marked "SOLVED!".

It appears that newcomers are arriving at this ticket to report the same old problem without realizing it has been fixed in a replacement kernel offered by Seth Forshee. The problem is now understood, and there is no need to guess about installing alternative kernels from whatever repository. Seth's kernel fixed it. See comment 108 above:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674838/comments/108

I've been running this alternative for weeks and the problem has not reappeared.

Revision history for this message
Nazar Mokrynskyi (nazar-pc) wrote :

This issue is not solved until fixed kernel appears in stable repositories. This is exactly why new people are arriving and will do so for a few more weeks.

Revision history for this message
Tim Passingham (tim-8aw3u04umo) wrote :

And even if the patched kernel is installed by the few users who manage to find this bug report, or one of its many duplicates, it gets replaced by 4.10.0-21, which reinstates the bug. Only installing 4.11 keeps the problem away.

Revision history for this message
Rocko (rockorequin) wrote :

I find it absolutely astonishing that a bug that is easy to trigger, can cause data loss, and requires a hard reboot is treated with a lower priority than security bugs.

Surely Ubuntu 17.04 should be flagged "not suitable for production machines" until this issue is fixed?

Revision history for this message
Steve (greatape) wrote :

Another vote for this needs releasing now!

Before I tracked down the cause and found this bug report, I'd spent hours trying to diagnose the reason for my machine's instability. I reinstalled Ubuntu over the previous version; when that didn't work, I got a new hard disk, did a totally new clean install, and was shocked to find the machine still freezing regularly. At that point I was almost sure it was a hardware problem, even though numerous memory scans had shown nothing.

If I hadn't managed to set up netconsole to catch the incriminating logs and then google this report, I'd have bought a whole new machine by now, and been very annoyed to find the identical problem occurring on that as well.

The whole handling of this problem says to me Ubuntu simply can't be looked upon as anything other than a toy for technical experts who are prepared to get their hands dirty tracking the cause of problems like this, and prepared to put their machine on ice for a few weeks while they wait for a fix to be released.

You have to understand that while the problem is annoying enough in itself for those of us who know what it is, for those who don't know why their machines are freezing multiple times per day it's going to cause huge amounts of grief, wasted time and expense trying to diagnose and fix it.

Revision history for this message
Hans van den Bogert (hbogert) wrote :

> tkb608 (tkb608) wrote on 2017-05-19:
> @kwang-m-yi Good point, I had forgotten that new installs of 16.04.2 would automatically get HWE and therefore this bug.

Although that's true, @kwang-m-yi talked about hwe-edge; and the "edge" variant is not installed by default on 16.04.2.
For example, I hit this bug in precisely that scenario, though I fully realized that installing hwe-edge *manually* might have unforeseen consequences.

Revision history for this message
Vincas Dargis (talkless) wrote :

On 2017.05.21 22:53, JOSHUA CRUNK wrote:
> Have to reset computer via power button.

You can use REISUB sequence to reboot more safely.

https://en.wikipedia.org/wiki/Magic_SysRq_key

Revision history for this message
Tim Passingham (tim-8aw3u04umo) wrote :

I don't understand that article. It says to press Alt+SysRq and another key, and that SysRq is the PrintScreen key.

If I press Alt + PrintScreen it asks to print the screen - I have no chance to enter another key. Can someone clarify this? I'm sure it's obvious.....

Revision history for this message
Tim Passingham (tim-8aw3u04umo) wrote :

Sorry - I now understand that I have to press all 3 keys at the same time.

Revision history for this message
munbi (gabriele) wrote :

@Tim, this is what works for me on a Dell Latitude E6540:

1. Left hand: Press and hold “Fn” key (between Ctrl and the Windows key)
2. Right hand: Press and hold “Alt” + “SysRq” keys (Stamp)
3. Left hand: Release “Fn” key
4. Left hand: Press and release “r” key. (Screenshot dialogs may start popping up. Ignore them)
5. Left hand: Press and release “e” key. (Your GUI should collapse to a tty, most processes terminated)
6. Left hand: Press and release “i” key. (Progress of key shown in the tty, most processes killed)
7. Left hand: Press and release “s” key. (Progress of key shown in the tty, syncs filesystems)
8. Left hand: Press and release “u” key. (Progress of key shown in the tty, unmounts filesystems)
9. Left hand: Press and release “b” key. (Progress of key shown in the tty, starts reboot)
10. Right hand: Release all keys

Revision history for this message
David Jung (djung) wrote :

For those of us that just want to use Ubuntu without having to care what a "kernel" is, could someone kindly explain how to install the fix from comment #108? Just download all the .deb files from that web-page and install them with the software updater or software center?
Thank you.

Revision history for this message
David Jung (djung) wrote :

For anyone else wondering, here's what seemed to work (don't really know if it is the correct approach):

1. Download select .deb files from the page mentioned in comment #180: http://people.canonical.com/~sforshee/lp1674838/

Specifically:
linux-cloud-tools-4.10.0-20-generic_4.10.0-20.22+lp1674838v201705030839_amd64.deb
linux-cloud-tools-common_4.10.0-20.22+lp1674838v201705030839_all.deb
linux-doc_4.10.0-20.22+lp1674838v201705030839_all.deb
linux-headers-4.10.0-20-generic_4.10.0-20.22+lp1674838v201705030839_amd64.deb
linux-image-4.10.0-20-generic_4.10.0-20.22+lp1674838v201705030839_amd64.deb
linux-image-extra-4.10.0-20-generic_4.10.0-20.22+lp1674838v201705030839_amd64.deb
linux-libc-dev_4.10.0-20.22+lp1674838v201705030839_amd64.deb
linux-tools-4.10.0-20-generic_4.10.0-20.22+lp1674838v201705030839_amd64.deb
linux-tools-common_4.10.0-20.22+lp1674838v201705030839_all.deb

(I guessed that the files other than the 'generic' ones, such as 'lowlatency', were variants)
(I don't know whether the cloud-tools package is needed; it seemed to report some kind of dependency error during installation)

2. Install them with: sudo dpkg -i *.deb

3. Reboot and select "Advanced Options for Ubuntu" from the boot menu, then select the 4.10.0-20 entry (unfortunately, there doesn't seem to be a way to see which is actually the lp1674838 kernel just installed)

4. Once booted, run "uname -a" to check that you are running the bugfix kernel (it has lp1674838 in the name).
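The package selection in step 1 and the check in step 4 can be sketched in Python. This is a hypothetical helper, not part of any Ubuntu tool: it assumes flavour-specific packages carry a `-<version>-<abi>-<flavour>_` fragment in their file names (as in the list above) and that packages without one, such as linux-libc-dev or linux-tools-common, are shared between flavours. The marker check assumes, as described above, that the test kernel's `uname -a` output contains lp1674838.

```python
import os
import re

# Matches the "-<version>-<abi>-<flavour>_" fragment of flavour-specific
# package names, e.g. "-4.10.0-20-generic_" or "-4.10.0-20-lowlatency_".
FLAVOUR_RE = re.compile(r"-\d+\.\d+\.\d+-\d+-([a-z0-9]+)_")


def select_debs(filenames, flavour="generic", arch="amd64"):
    """Keep packages for one kernel flavour and architecture, plus shared ones."""
    selected = []
    for name in filenames:
        # Only this architecture's packages or architecture-independent ("all") ones.
        if not (name.endswith(f"_{arch}.deb") or name.endswith("_all.deb")):
            continue
        match = FLAVOUR_RE.search(name)
        # No flavour in the name (e.g. linux-libc-dev): treat it as shared.
        if match is None or match.group(1) == flavour:
            selected.append(name)
    return sorted(selected)


def running_kernel_has_marker(marker="lp1674838"):
    """True if the running kernel's release or version string contains marker."""
    info = os.uname()
    return marker in info.release or marker in info.version


if __name__ == "__main__":
    # Run from the download directory; prints the dpkg command to use.
    print("sudo dpkg -i " + " ".join(select_debs(os.listdir("."))))
    print("running test kernel:", running_kernel_has_marker())
```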

Cheers.

Revision history for this message
Tim Passingham (tim-8aw3u04umo) wrote :

@munbi - thanks. That doesn't seem to work for me on a different system, but pressing and holding Alt, PrintScreen and r, releasing all, then all of Alt, PrintScreen and e, then the same for b, etc., seemed to work. I didn't see the GUI collapse, but it did reboot. I guess the rest happened, although I saw no visual evidence.

Revision history for this message
geez (geez) wrote :

This bug affects my laptop running 17.04, which is an upgrade from an earlier installation. I appear to have no other kernels available in the repository that do not have this bug, and there's no way I'm installing an untrusted kernel.

Considering how long this bug has been open, this is getting ridiculous.

Revision history for this message
stupid user (mc6312) wrote :

Very strange bug - occurred several times with the 4.10.0-20 and 4.10.0-21 kernels on the Intel Core i5-2400, but did not appear with the same kernels on the Intel Core 2 Duo E6700.

Revision history for this message
Dan Streetman (ddstreet) wrote :

Just as an FYI to everyone still commenting in this bug, this is fixed in kernel 4.10.0-22.24:
https://launchpad.net/ubuntu/+source/linux/4.10.0-22.24

which is in the -proposed repository:
https://wiki.ubuntu.com/Testing/EnableProposed

and as Seth said in comments above, is scheduled for general release to the -updates repository on June 5:
https://lists.ubuntu.com/archives/kernel-sru-announce/2017-May/000096.html
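To tell whether a running kernel is already at or past the fixed build, one rough check is to compare the ABI number in `uname -r` against 22. A minimal sketch, assuming Ubuntu-style release strings such as 4.10.0-22-generic; the function name is hypothetical. It uses only the ABI number as a proxy for the package version 4.10.0-22.24, and it would report the earlier lp1674838 test builds (4.10.0-20) as unfixed even though they carried the patch.

```python
import os
import re

# Assumption: Ubuntu release strings look like "4.10.0-22-generic"
# (upstream version, ABI number, flavour), as in the uname output in this thread.
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)-")

FIXED_ABI = 22  # the fix ships in linux 4.10.0-22.24


def has_lp1674838_fix(release):
    """True/False for 4.10.0 Ubuntu kernels; None when the check does not apply."""
    match = RELEASE_RE.match(release)
    if match is None:
        return None  # not an Ubuntu-style release string
    major, minor, patch, abi = map(int, match.groups())
    if (major, minor, patch) != (4, 10, 0):
        return None  # a different kernel series; this ABI check says nothing
    return abi >= FIXED_ABI


if __name__ == "__main__":
    print(has_lp1674838_fix(os.uname().release))
```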

Revision history for this message
Tim Passingham (tim-8aw3u04umo) wrote :

Thanks. Installed and being used on a system that didn't display the fault, to check before inflicting on my other half.

Revision history for this message
Kwang Moo Yi (kwang-m-yi) wrote :

@ddstreet: It seems it's not in the xenial-proposed repo yet, but it will probably be there soon-ish, I guess?

Revision history for this message
tkb608 (tkb608) wrote :

I followed the instructions at https://wiki.ubuntu.com/Testing/EnableProposed . I did the "Selective upgrading from -proposed" section before "enabling proposed", then installed 4.10.0-22.24 via synaptic.
This went smoothly for me. Thanks @ddstreet for the link.

@kwang-m-yi I can't speak for xenial yet, this was for zesty.
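The "Selective upgrading from -proposed" approach relies on an APT pin that ranks -proposed below the regular archive (APT's default priority is 500), so its packages are only installed when requested explicitly. The sketch below renders such a stanza; it reflects my reading of the wiki page linked above (a file /etc/apt/preferences.d/proposed-updates with priority 400), so check the page before relying on it. The function name is hypothetical.

```python
def proposed_pin(series, priority=400):
    """Render an APT preferences stanza pinning <series>-proposed at a low priority."""
    # A priority below 500 means regular upgrades keep preferring the normal
    # archive; -proposed versions are used only when asked for explicitly.
    return (
        "Package: *\n"
        f"Pin: release a={series}-proposed\n"
        f"Pin-Priority: {priority}\n"
    )


if __name__ == "__main__":
    # Save this text (as root) to /etc/apt/preferences.d/proposed-updates.
    print(proposed_pin("zesty"), end="")
```

After enabling -proposed as the wiki describes and running `sudo apt update`, a single package can then be pulled in with `sudo apt install <package>/zesty-proposed`.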

Revision history for this message
Dan Streetman (ddstreet) wrote :

> It seems it's not yet in the xenial-proposed repo yet, but probably will be there
> soon-ish I guess?

Yes, there was a technical issue with the build, but it seems minor and so looks like it should be ready soon-ish. It will be listed here when it's in the -proposed repository:
https://launchpad.net/ubuntu/+source/linux-hwe-edge/+publishinghistory

Revision history for this message
tkb608 (tkb608) wrote :

Not that I can recommend it to anyone else, but I did update to 4.10.0-22.24 on xenial 16.04.2 by pointing it to the zesty 17.04 proposed repo. It hasn't crashed in the 5 minutes I've been running it. Caveat emptor.

Revision history for this message
Brendan Murray (brendanpmurray) wrote :

Just had something similar to Patrick's in #100 above, but this time with 9 different GPFs on kernel 4.10.0-21. The locations are all quite different, but I wonder if there is something common there:

May 23 14:34:02 thornback kernel: [ 1371.525043] general protection fault: 0000 [#1] SMP
May 23 14:34:02 thornback kernel: [ 1371.525053] Modules linked in: rfcomm bnep nls_utf8 hfsplus intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass btusb btrtl btbcm btintel crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc uvcvideo bluetooth aesni_intel aes_x86_64 crypto_simd glue_helper cryptd joydev input_leds snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm intel_cstate intel_rapl_perf ie31200_edac snd_seq_midi edac_core snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd mei_me mei soundcore lpc_ich shpchp tpm_infineon serio_raw mac_hid parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_logitech_hidpp uas usb_storage hid_logitech_dj
May 23 14:34:02 thornback kernel: [ 1371.525101] usbhid hid nouveau ahci libahci mxm_wmi wmi i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops psmouse drm alx mdio fjes video
May 23 14:34:02 thornback kernel: [ 1371.525115] CPU: 3 PID: 2208 Comm: compiz Not tainted 4.10.0-21-generic #23-Ubuntu
May 23 14:34:02 thornback kernel: [ 1371.525120] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77-DS3H, BIOS F11a 11/13/2013
May 23 14:34:02 thornback kernel: [ 1371.525125] task: ffff96b8cb718000 task.stack: ffffb2b043c74000
May 23 14:34:02 thornback kernel: [ 1371.525132] RIP: 0010:kmem_cache_alloc_trace+0x7b/0x190
May 23 14:34:02 thornback kernel: [ 1371.525135] RSP: 0018:ffffb2b043c77ba8 EFLAGS: 00010286
May 23 14:34:02 thornback kernel: [ 1371.525139] RAX: 0000000000000000 RBX: 00000000014080c0 RCX: 00000000000161e2
May 23 14:34:02 thornback kernel: [ 1371.525143] RDX: 00000000000161e1 RSI: 00000000014080c0 RDI: 000000000001c6e0
May 23 14:34:02 thornback kernel: [ 1371.525147] RBP: ffffb2b043c77bd8 R08: ffff96b8ded9c6e0 R09: 0000000000000000
May 23 14:34:02 thornback kernel: [ 1371.525151] R10: 96b8306b23c00000 R11: 0000000000000000 R12: 00000000014080c0
May 23 14:34:02 thornback kernel: [ 1371.525155] R13: ffff96b8ce003540 R14: ffffffffc0434a62 R15: ffff96b8ce003540
May 23 14:34:02 thornback kernel: [ 1371.525168] FS: 00007f677a6b5780(0000) GS:ffff96b8ded80000(0000) knlGS:0000000000000000
May 23 14:34:02 thornback kernel: [ 1371.525172] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 23 14:34:02 thornback kernel: [ 1371.525175] CR2: 00007f677a582510 CR3: 00000004062c0000 CR4: 00000000001406e0
May 23 14:34:02 thornback kernel: [ 1371.525179] Call Trace:
May 23 14:34:02 thornback kernel: [ 1371.525214] nouveau_fence_new+0x42/0xc0 [nouveau]
May 23 14:34:02 thornback kernel: [ 1371.525240] nouveau_gem_ioctl_pushbuf+0xe80/0x1610 [nouveau]
May 23 14:34:02 thornback kernel: [ 1371.525254] drm_ioctl+0x21b/0x4c0 [drm]
May 23 14:34:02 thornback kernel: [ 1371.525258] ? ___sys_recv...
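One way to look for something common across several oopses is to tally the faulting function from each `RIP:` line and the task from each `Comm:` line. A minimal sketch, assuming the 4.10-style log format shown above (for example `RIP: 0010:kmem_cache_alloc_trace+0x7b/0x190`); the function name is hypothetical, and it only counts occurrences without interpreting the traces.

```python
import re
from collections import Counter

# "RIP: 0010:kmem_cache_alloc_trace+0x7b/0x190" -> "kmem_cache_alloc_trace"
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")
# "CPU: 3 PID: 2208 Comm: compiz Not tainted ..." -> "compiz"
COMM_RE = re.compile(r"CPU: \d+ PID: \d+ Comm: (.+?) (?:Not tainted|Tainted:)")


def summarize_faults(lines):
    """Count faulting functions (RIP symbol) and the tasks (Comm) involved."""
    rips, comms = Counter(), Counter()
    for line in lines:
        match = RIP_RE.search(line)
        if match:
            rips[match.group(1)] += 1
        match = COMM_RE.search(line)
        if match:
            comms[match.group(1)] += 1
    return rips, comms
```

For example: `with open("/var/log/kern.log", errors="replace") as f: rips, comms = summarize_faults(f)`, then `print(rips.most_common(10))`.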

Revision history for this message
geez (geez) wrote :

@ddstreet: Installed from zesty-proposed, testing it now. Question: Given that this is quite a critical bug, is there any reason it hasn't been put into the main archives sooner? It is quite a bad user experience, especially for novice users who won't find this bug or know how to do a selective upgrade.

Revision history for this message
Dan Streetman (ddstreet) wrote :

> Question: Given that this is quite a critical bug, is there any reason this
> hasn't been put into the main archives sooner?

I'm not the right person to ask that, but you can see Seth's comment 139 above.

"This is our normal SRU cycle. Out-of-cycle updates are for the most part
reserved for critical security issues."

Revision history for this message
Nazar Mokrynskyi (nazar-pc) wrote :

I'm wondering how it reached zesty-proposed earlier than artful-proposed. On Artful I still do not see an update.

Revision history for this message
Seth Forshee (sforshee) wrote :

> I'm wondering how did it reach zesty-proposed earlier that artful-
> proposed. On Artful I still do not see an update.

In artful the kernels are just copied forward from zesty once they reach
-updates. Soon artful will switch to 4.11, at which point its kernels
will start going into proposed. You can manually download the deb
package files from zesty-proposed and install them in artful.

Revision history for this message
Tim Passingham (tim-8aw3u04umo) wrote :

This may not be a security issue, but it certainly is critical to normal users who don't have access to information here, let alone temporary patches or proposed changes.

Maybe Ubuntu only wants techies using its systems, and would prefer normal users went elsewhere? I'll certainly have to start looking for an alternative, more stable system for my partner's desktop.

Revision history for this message
Patrick McManus (mcmanus-ducksong) wrote :

I now have 23 hours of uptime thanks to 4.10.0-22 #24 from zesty proposed. That's a zesty record for me :)

thanks.

Revision history for this message
geez (geez) wrote :

@ddstreet, sforshee: Thanks for the replies. Given that this also affects LTS installations that use the HWE stack, shouldn't this have as high a priority as critical security issues? I'd consider this a critical usability issue; there are tons of people running LTS, particularly on servers, for exactly the reason that it is "always" stable. Needless to say this is a bit of an outlier considering Ubuntu's overall good track record.

I understand that testing 4.10.0-22 takes time; one could conceivably apply a hotfix to 4.10.0-21?

As far as my own testing is concerned, I've been using my personal laptop for work today (instead of my 16.04 work laptop) with the kernel from zesty-proposed, and no issues so far.

Revision history for this message
Daniel Holbert (dholbert) wrote :

I installed the kernel with the fix from zesty-proposed (4.10.0-22-generic #24-Ubuntu), but after ~4 hours of uptime on that kernel, I hit what felt like the same system lockup again. (Or perhaps a new version of this lockup that the patch introduces / leaves unfixed?)

Here's the kern.log from that lockup. Hope this is helpful; otherwise, sorry for adding noise.

Revision history for this message
Seth Forshee (sforshee) wrote :

I appreciate that this bug has a significant impact for many. However we
have a QA process to test kernels before they get pushed out to
everyone, and it is always risky to skip this testing which is why we
rarely do it. In the case of the fix for this bug the changes required
are fairly substantial and should go through testing.

The kernel in zesty-proposed is exactly the same kernel that will be
released to -updates in a couple of weeks (assuming it passes QA, etc.)
so please do not hesitate to run this kernel. There is also a signed
kernel available in -proposed.

Revision history for this message
Seth Forshee (sforshee) wrote :

> I installed the kernel with the fix from zesty-proposed
> (4.10.0-22-generic #24-Ubuntu), but after ~4 hours of uptime on that
> kernel, I hit what felt like the same system lockup again. (Or perhaps
> a new version of this lockup that the patch introduces / leaves
> unfixed?)
>
> Here's the kern.log from that lockup. Hope this is helpful; otherwise,
> sorry for adding noise.
>
> ** Attachment added: "kern.log snippet for lockup on 4.10.0-22-generic #24-Ubuntu"
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674838/+attachment/4882848/+files/kern-log-snippet.txt

That is a different issue, please file a new bug. Thanks!

Revision history for this message
Rocko (rockorequin) wrote :

@dholbert: your lockup looks like https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1680904 (and the upstream bug is https://bugs.freedesktop.org/show_bug.cgi?id=100516). It's a bug in the Intel graphics drivers that unfortunately is present in both kernels 4.10 and 4.11, but should be fixed in 4.12.

Revision history for this message
Daniel Holbert (dholbert) wrote :

(Thanks @Rocko - I'd filed https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1693357 but I've now marked that as a duplicate of the bug you mentioned.)

Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-zesty
Revision history for this message
flux242 (flux242) wrote :

> If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

never go full retard, dude

Revision history for this message
Christian Nassau (nassau) wrote :

I've been running "-proposed" for quite a while now without the kernel oops, hence changed the tag to "verification-done-zesty".

tags: added: verification-done-zesty
removed: verification-needed-zesty
Revision history for this message
flux242 (flux242) wrote :

For those of you who, like me, were reluctant to add the testing repository to the sources list (because of course it could break something else) and couldn't find the .deb packages because no direct link was provided, here is a little script I wrote that downloads and installs kernel 4.10.0-22.24. Adjust the architecture and the download directory.

https://gist.github.com/635e1dad33c335fe9592bb1b7c28cd3c
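For anyone writing a similar script, the core package file names follow the pattern visible in this thread (for example linux-headers-4.10.0-22_4.10.0-22.24_all.deb). The hypothetical helper below only builds those names for a given version, flavour and architecture; which packages you actually need, and where to download them from, are assumptions to verify against the linked script or the archive.

```python
def kernel_deb_names(version="4.10.0-22.24", flavour="generic", arch="amd64"):
    """Build file names for the core packages of one Ubuntu kernel build.

    Assumes the naming pattern seen in this thread, e.g.
    linux-image-4.10.0-22-generic_4.10.0-22.24_amd64.deb.
    """
    abi = version.rsplit(".", 1)[0]  # "4.10.0-22.24" -> "4.10.0-22"
    return [
        f"linux-headers-{abi}_{version}_all.deb",  # flavour-independent headers
        f"linux-headers-{abi}-{flavour}_{version}_{arch}.deb",
        f"linux-image-{abi}-{flavour}_{version}_{arch}.deb",
        f"linux-image-extra-{abi}-{flavour}_{version}_{arch}.deb",
    ]
```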

Revision history for this message
Kwang Moo Yi (kwang-m-yi) wrote :

@Tim - To be fair, Ubuntu LTS without the hwe-edge is perfectly stable. However, it is true that all this process was a bit disappointing, and I ended up moving to debian sid, which seems to be a quite stable experience so far.

Revision history for this message
Frode Nordahl (fnordahl) wrote :

As much as I want the -22 to fix all problems, it does not.

However, my crashes currently do not leave a trace in the logs. Sticking to -13 keeps my workhorse running free of any crash or freeze problems.

The simplest way to describe what is going on is that all I/O gets stuck and that my displays show an interesting change/distortion of the background image. (I can provide screenshots upon request.)

Any advice on how best to collect useful data from a crashed/frozen machine is welcome.

The issue is highly reproducible on my system under heavy load.

Example: Deploy some big software with your favorite deployment tool on LXD containers and libvirt virtual machines through MAAS, at the same time do some backups with duplicity and playback of (YouTube) video in your favorite browser (Firefox/Chrome).

Revision history for this message
Tim Passingham (tim-8aw3u04umo) wrote :

I tried to install 4.10.0-22 on the one system we have that was regularly crashing, but it gets installation errors (being unable to access files in /usr/src/linux-headers-4.10.0-22), so I cannot verify if it would fix the run-time crashes. I was unable to work round this in a short enough time to make it worthwhile.

Given that this system is stable on 4.8, I'll have to wait for the normal release process rather than check the proposed version.

Revision history for this message
Axy (joshi-a) wrote :

Been affecting me as well -- will try the alternate kernels.

Kernel that's crashing:
axyjo@frost:~$ uname -a
Linux frost 4.10.0-21-generic #23-Ubuntu SMP Fri Apr 28 16:14:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
Walter Garcia-Fontes (walter-garcia) wrote :

I'm getting a freeze with the same symptoms described in this bug report, but with this message:

May 31 02:37:27 walter kernel: [410240.819651] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [JS Helper:23856]

I'm running 4.10.0-21-generic.

I will duplicate my bug report (bug #1677491) to this one, feel free to reverse if it is not the same bug.
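The soft-lockup line names the stuck CPU, how long it was stuck, and the task (name and PID) that was running, which helps compare reports such as the Firefox "JS Helper" freezes in this thread. A minimal parser, assuming the message format quoted above; the function name is hypothetical. Task names can contain spaces, so it captures everything up to the last colon inside the brackets.

```python
import re

# Matches e.g. "BUG: soft lockup - CPU#14 stuck for 22s! [JS Helper:23856]".
# The greedy (.+) backtracks to the last ":" before the PID, so task names
# containing spaces are kept whole.
LOCKUP_RE = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]")


def parse_soft_lockup(line):
    """Return cpu, seconds, task name and pid from a soft-lockup line, or None."""
    match = LOCKUP_RE.search(line)
    if match is None:
        return None
    cpu, seconds, comm, pid = match.groups()
    return {"cpu": int(cpu), "seconds": int(seconds), "comm": comm, "pid": int(pid)}
```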

Revision history for this message
Hans van den Bogert (hbogert) wrote :

It's strange that I hit this at least once every two days, yet my server has been running for weeks without it.
Is there already a description of what common scenarios are when this bug is hit, i.e., is it already reproducible?

Revision history for this message
Walter Garcia-Fontes (walter-garcia) wrote :

I've seen this bug in the scenario described in comment #218. In my case, there was always a frozen Firefox around, but all the other processes running on the system were also reacting slowly or frozen.

Revision history for this message
Nikolaj Løbner Sheller (nikolaj-l) wrote :

I just experienced this bug on 4.10.0-21-generic #23-Ubuntu.

My Firefox stopped responding, and when I tried to kill it, the Firefox process became unkillable, using 25% CPU time and showing N/A memory.

CPU: 3 PID: 2558 Comm: firefox Tainted: G OE 4.10.0-21-generic #23-Ubuntu

This is the first time I have seen the issue. I have been running 17.04 for two or three weeks.

Revision history for this message
Michael Thayer (michael-thayer) wrote :

I was also experiencing this issue with the official 4.10.0-21-generic kernel; I ran the ~lp1674838 kernel for several days, and have been running the -22-generic test kernel for a couple without problems.

Revision history for this message
vvhk (vvhk-deactivatedaccount-deactivatedaccount) wrote :

Got bitten by this again today. Ubuntu 17.04, 4.10.0-21-generic #23-Ubuntu SMP Fri Apr 28 16:14:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux. Firefox again.

I think this issue is critical enough to expedite the release of the fixed kernel, posthaste.

Revision history for this message
Colan Schwartz (colan) wrote :

Folks, just enable the Proposed channel for the next four days (until this is released into stable on the 5th). You can disable it again afterwards. This is what I've been doing, and haven't run into this issue again (or any other problems with Proposed).

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-22.24

---------------
linux (4.10.0-22.24) zesty; urgency=low

  * linux: 4.10.0-22.24 -proposed tracker (LP: #1691146)

  * Fix NVLINK2 TCE route (LP: #1690155)
    - powerpc/powernv: Fix TCE kill on NVLink2

  * CVE-2017-0605
    - tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()

  * perf: qcom: Add L3 cache PMU driver (LP: #1689856)
    - [Config] CONFIG_QCOM_L3_PMU=y
    - perf: qcom: Add L3 cache PMU driver

  * No PMU support for ACPI-based arm64 systems (LP: #1689661)
    - drivers/perf: arm_pmu: rework per-cpu allocation
    - drivers/perf: arm_pmu: manage interrupts per-cpu
    - drivers/perf: arm_pmu: split irq request from enable
    - drivers/perf: arm_pmu: remove pointless PMU disabling
    - drivers/perf: arm_pmu: define armpmu_init_fn
    - drivers/perf: arm_pmu: fold init into alloc
    - drivers/perf: arm_pmu: factor out pmu registration
    - drivers/perf: arm_pmu: simplify cpu_pmu_request_irqs()
    - drivers/perf: arm_pmu: handle no platform_device
    - drivers/perf: arm_pmu: rename irq request/free functions
    - drivers/perf: arm_pmu: split cpu-local irq request/free
    - drivers/perf: arm_pmu: move irq request/free into probe
    - drivers/perf: arm_pmu: split out platform device probe logic
    - arm64: add function to get a cpu's MADT GICC table
    - [Config] CONFIG_ARM_PMU_ACPI=y
    - drivers/perf: arm_pmu: add ACPI framework
    - arm64: pmuv3: handle !PMUv3 when probing
    - arm64: pmuv3: use arm_pmu ACPI framework

  * [SRU][Zesty]QDF2400 kernel oops on ipmitool fru write 0 fru.bin
    (LP: #1689886)
    - ipmi: Fix kernel panic at ipmi_ssif_thread()

  * tty: pl011: fix earlycon work-around for QDF2400 erratum 44 (LP: #1689818)
    - tty: pl011: fix earlycon work-around for QDF2400 erratum 44
    - tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44

  * kernel-wedge fails in artful due to leftover squashfs-modules d-i files
    (LP: #1688259)
    - Remove squashfs-modules files from d-i
    - [Config] as squashfs-modules is builtin kernel-image must Provides: it

  * arm64/ACPI support for SBSA watchdog (LP: #1688114)
    - clocksource: arm_arch_timer: clean up printk usage
    - clocksource: arm_arch_timer: rename type macros
    - clocksource: arm_arch_timer: rename the PPI enum
    - clocksource: arm_arch_timer: move enums and defines to header file
    - clocksource: arm_arch_timer: add a new enum for spi type
    - clocksource: arm_arch_timer: rework PPI selection
    - clocksource: arm_arch_timer: split dt-only rate handling
    - clocksource: arm_arch_timer: refactor arch_timer_needs_probing
    - clocksource: arm_arch_timer: move arch_timer_needs_of_probing into DT init
      call
    - clocksource: arm_arch_timer: add structs to describe MMIO timer
    - clocksource: arm_arch_timer: split MMIO timer probing.
    - [Config] CONFIG_ACPI_GTDT=y
    - acpi/arm64: Add GTDT table parse driver
    - clocksource: arm_arch_timer: simplify ACPI support code.
    - acpi/arm64: Add memory-mapped timer support in GTDT driver
    - clocksource: arm_arch_timer: add GTDT support for memory-mapped timer
    - acpi/arm64: Add SBS...

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Vincas Dargis (talkless) wrote :

When it should be available for 16.04?

Revision history for this message
Tim Passingham (tim-8aw3u04umo) wrote :

Is there a delay in getting 4.10.0-22 to stable release? I had previously understood it was expected yesterday (June 5th).

user722 (user722)
information type: Public → Private
information type: Private → Public
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-22.24

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
Revision history for this message
Tim Passingham (tim-8aw3u04umo) wrote :

There's a problem with this new version - it won't install on the one system I have that is affected by the bug. The installation error is:

dpkg: error processing archive /var/cache/apt/archives/linux-headers-4.10.0-22_4.10.0-22.24_all.deb (--unpack):
 unable to open '/usr/src/linux-headers-4.10.0-22/arch/ia64/sn/Makefile.dpkg-new': Operation not permitted

The directory contains:
.../usr/src/linux-headers-4.10.0-22/arch/ia64$ cd sn
total 20
drwxr-xr-x 5 root root 4096 Jun 6 15:01 .
drwxr-xr-x 13 root root 4096 Jun 6 15:01 ..
drwxr-xr-x 3 root root 4096 Jun 6 15:01 include
drwxr-xr-x 3 root root 4096 Jun 6 15:01 kernel
drwxr-xr-x 3 root root 4096 Jun 6 15:01 pci
..../usr/src/linux-headers-4.10.0-22/arch/ia64/sn$

I've done a clean, update and install -f, but the headers still failed to install, giving a similar but different error code.

Revision history for this message
Tim Passingham (tim-8aw3u04umo) wrote :

I have now reported the installation error using the automated report system - #1696132

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-hwe-edge - 4.10.0-22.24~16.04.1

---------------
linux-hwe-edge (4.10.0-22.24~16.04.1) xenial; urgency=low

  * linux-hwe-edge: 4.10.0-22.24~16.04.1 -proposed tracker (LP: #1691149)

  * linux: 4.10.0-22.24 -proposed tracker (LP: #1691146)

  * Fix NVLINK2 TCE route (LP: #1690155)
    - powerpc/powernv: Fix TCE kill on NVLink2

  * CVE-2017-0605
    - tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()

  * perf: qcom: Add L3 cache PMU driver (LP: #1689856)
    - [Config] CONFIG_QCOM_L3_PMU=y
    - perf: qcom: Add L3 cache PMU driver

  * No PMU support for ACPI-based arm64 systems (LP: #1689661)
    - drivers/perf: arm_pmu: rework per-cpu allocation
    - drivers/perf: arm_pmu: manage interrupts per-cpu
    - drivers/perf: arm_pmu: split irq request from enable
    - drivers/perf: arm_pmu: remove pointless PMU disabling
    - drivers/perf: arm_pmu: define armpmu_init_fn
    - drivers/perf: arm_pmu: fold init into alloc
    - drivers/perf: arm_pmu: factor out pmu registration
    - drivers/perf: arm_pmu: simplify cpu_pmu_request_irqs()
    - drivers/perf: arm_pmu: handle no platform_device
    - drivers/perf: arm_pmu: rename irq request/free functions
    - drivers/perf: arm_pmu: split cpu-local irq request/free
    - drivers/perf: arm_pmu: move irq request/free into probe
    - drivers/perf: arm_pmu: split out platform device probe logic
    - arm64: add function to get a cpu's MADT GICC table
    - [Config] CONFIG_ARM_PMU_ACPI=y
    - drivers/perf: arm_pmu: add ACPI framework
    - arm64: pmuv3: handle !PMUv3 when probing
    - arm64: pmuv3: use arm_pmu ACPI framework

  * [SRU][Zesty]QDF2400 kernel oops on ipmitool fru write 0 fru.bin
    (LP: #1689886)
    - ipmi: Fix kernel panic at ipmi_ssif_thread()

  * tty: pl011: fix earlycon work-around for QDF2400 erratum 44 (LP: #1689818)
    - tty: pl011: fix earlycon work-around for QDF2400 erratum 44
    - tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44

  * kernel-wedge fails in artful due to leftover squashfs-modules d-i files
    (LP: #1688259)
    - Remove squashfs-modules files from d-i
    - [Config] as squashfs-modules is builtin kernel-image must Provides: it

  * arm64/ACPI support for SBSA watchdog (LP: #1688114)
    - clocksource: arm_arch_timer: clean up printk usage
    - clocksource: arm_arch_timer: rename type macros
    - clocksource: arm_arch_timer: rename the PPI enum
    - clocksource: arm_arch_timer: move enums and defines to header file
    - clocksource: arm_arch_timer: add a new enum for spi type
    - clocksource: arm_arch_timer: rework PPI selection
    - clocksource: arm_arch_timer: split dt-only rate handling
    - clocksource: arm_arch_timer: refactor arch_timer_needs_probing
    - clocksource: arm_arch_timer: move arch_timer_needs_of_probing into DT init
      call
    - clocksource: arm_arch_timer: add structs to describe MMIO timer
    - clocksource: arm_arch_timer: split MMIO timer probing.
    - [Config] CONFIG_ACPI_GTDT=y
    - acpi/arm64: Add GTDT table parse driver
    - clocksource: arm_arch_timer: simplify ACPI support code.
    - acpi/arm64: Add memory-mapped timer support in GTD...

Changed in linux-hwe-edge (Ubuntu):
status: In Progress → Fix Released
status: In Progress → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → Fix Released
Revision history for this message
Christian Sarrasin (sxc731) wrote :

@tim-8aw3u04umo no installation issue here. I used: 'sudo apt update && sudo apt upgrade'.

uname -r reports "4.10.0-22-generic"

Revision history for this message
Tim Passingham (tim-8aw3u04umo) wrote :

I have 4 17.04 systems. The one that had this bug is the only one which has an installation problem with 4.10.0-22. Maybe a coincidence, maybe not?

Revision history for this message
Tim Passingham (tim-8aw3u04umo) wrote :

Seth has fixed my installation problem, so all four of my 17.04 systems are now running 4.10.0-22. I'll report if I get any further crashes (but I don't expect any).

See #1696132 if you are interested in what the problem was.

Changed in linux-hwe-edge (Ubuntu Zesty):
status: Fix Committed → Fix Released
Revision history for this message
Robbie Crash (sardonic-smiles) wrote :

I believe I may still be encountering this bug, this seems to happen any time the system is under significant load. Attached is my kern.log from right after the NMI watchdog soft lockups start. Please let me know if I should submit additional logs from the next time this happens, or submit a new bug.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

@Robbie

It's not the same, please file a new bug.

Revision history for this message
Richard Hainsworth (rnhainsworth) wrote :

uname -a
Linux merlin 4.10.0-28-generic #32-Ubuntu SMP Fri Jun 30 05:32:18 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Same symptoms as reported above. Firefox freezes and causes the system to freeze.

Cannot get Firefox to work. I have had to install Chrome in order to access web.

Have sent automated crash reports to Mozilla and Ubuntu.

If it is not the same problem as reported here, it looks the same judging from all the comments here.
