kernel instability with BTRFS filesystem as rootfs

Bug #1776005 reported by Sylvain Calador
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
linux (Ubuntu) | Incomplete | High | Joseph Salisbury |
Xenial | Incomplete | High | Joseph Salisbury |

Bug Description

Hello,

As explained yesterday on #ubuntu-kernel, several laptops fail to boot (kernel panic) with the latest kernel 4.4.0-127-generic (x86_64) and later (proposed 4.4.0-128-generic) on Xenial 16.04 LTS.

I have run many tests to verify that the bootloader (GRUB) is correctly configured (I also reinstalled it, with the same result).

I have also verified that the BTRFS filesystem is clean and not corrupted (from a live CD).

The behavior is unpredictable: when I do several consecutive reboots to test the stability of the boot process, the kernel often panics during boot, but not every time. I see no such behavior with previous kernels (such as 4.4.0-124-generic).

I suspect (although I'm quite puzzled by this behavior, and I may be wrong) that something changed in the latest kernel:
- in the BTRFS subsystem: I have been using "aggregated" BTRFS partitions as rootfs (for a long time)
- something related to the latest mitigations for the recent processor security vulnerabilities

But my skills stop here, and I'm not able to find the origin of this behavior despite my efforts.
I can continue my investigation if I have some additional pointers.

Thanks for your time,

Sylvain

Tags: cscc xenial
Sylvain Calador (sylvain-calador) wrote :

The apport log, collected after a successful boot on 4.4.0-127-generic.

description: updated
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1776005

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Sylvain Calador (sylvain-calador) wrote :

Hello,

It seems that the IO-APIC [1] is the main origin of this instability.
So booting the kernel with the option "noapic" seems to be a good workaround for this issue, even if it is difficult for me to identify the commit that caused this problem in a reasonable time.

Best regards,

[1]: https://fr.wikipedia.org/wiki/IO-APIC

Sylvain
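
A minimal sketch of how the "noapic" workaround above could be made persistent, assuming a stock Ubuntu setup where kernel options live on the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub. The helper name add_kernel_param is hypothetical and was not posted by the reporter.

```shell
# Sketch: append a kernel parameter (e.g. "noapic") to GRUB_CMDLINE_LINUX_DEFAULT.
# add_kernel_param is a hypothetical helper, not part of GRUB or Ubuntu.
# Usage: add_kernel_param PARAM [FILE]   (FILE defaults to /etc/default/grub)
add_kernel_param() {
    param=$1
    file=${2:-/etc/default/grub}
    # Do nothing if the parameter is already on the default command line.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0
    fi
    # Keep a backup, then insert the parameter just before the closing quote.
    # If the file has no GRUB_CMDLINE_LINUX_DEFAULT line, it is left unchanged.
    cp "$file" "$file.bak"
    sed -E "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"/\1 ${param}\"/" "$file.bak" > "$file"
}
```

To apply it, one would define the function in a root shell, run add_kernel_param noapic, then run update-grub and reboot. For a one-off test, the option can instead be typed onto the "linux" line from the GRUB menu (press "e"), which leaves the configuration untouched.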

Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
status: New → In Progress
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Changed in linux (Ubuntu Xenial):
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with a revert of commit 2f500a9d7ff6. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1776005

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is prior to 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.
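
The rule in the note above can be sketched as a small shell function. This is only an illustration: test_kernel_packages is a hypothetical helper, and it assumes the version string starts with "MAJOR.MINOR" as in "4.4.0-127-generic".

```shell
# Sketch: print the .deb package families to install for a test kernel version,
# following the 4.15 (Bionic) cutoff described in the note.
# test_kernel_packages is a hypothetical helper, not an Ubuntu tool.
test_kernel_packages() {
    version=$1               # e.g. "4.4.0-127-generic"
    major=${version%%.*}     # text before the first dot, e.g. "4"
    rest=${version#*.}       # text after the first dot, e.g. "4.0-127-generic"
    minor=${rest%%[!0-9]*}   # leading digits of the remainder, e.g. "4"
    if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 15 ]; }; then
        echo "linux-modules linux-modules-extra linux-image-unsigned"
    else
        echo "linux-image linux-image-extra"
    fi
}

# Example: test_kernel_packages 4.4.0-127-generic
# prints "linux-image linux-image-extra"
```

Once the matching .deb files are downloaded into one directory, they would typically be installed together, for example with "sudo dpkg -i *.deb".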

Thanks in advance!

Sylvain Calador (sylvain-calador) wrote :

Hello,

After many contradictory tests (the problem is not reliably reproducible), the only solution that works on all the laptops where the problem appeared is to install the latest kernel 4.4.0.130.136 (proposed), uninstall the "intel-microcode" package, and replace it with a dummy package to prevent the Intel microcode from being installed, since the microcode seems not to work in some cases. This workaround is far from optimal in terms of updates, but it seems to solve these weird errors.

Best regards,

Sylvain

Sylvain Calador (sylvain-calador) wrote :

Hello Joseph,

I have tested the kernel you kindly provided in #5, but it does not resolve the issue on all the laptops that have trouble.

What I said in #4 is only partial and does not resolve the issue in all test cases. Sorry for this wrong interpretation, but the cases are quite complicated.

I have hidden some useless comments of my own that could disrupt the reading of this bug report.

Thanks for your time,

Sylvain

Changed in linux (Ubuntu Xenial):
status: In Progress → Incomplete
Changed in linux (Ubuntu):
status: In Progress → Incomplete
Brad Figg (brad-figg)
tags: added: cscc