Linux 3.2 freezes system on FUJITSU ESPRIMO P7935

Bug #914161 reported by Reinhard Tartler
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

The system was upgraded from Ubuntu oneiric to precise.

After booting kernel 3.2, the system locks up completely after a couple of minutes of uptime. That is long enough to log in graphically and inspect the system log. Once it locks up, the magic SysRq keys no longer work, the system fan runs at full speed, and the system does not react to network traffic. The system log does not indicate anything related to the lock-up.

Booting kernel 3.0 from oneiric does not exhibit these symptoms.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-8-generic 3.2.0-8.14
ProcVersionSignature: Ubuntu 3.0.0-14.23-generic 3.0.9
Uname: Linux 3.0.0-14-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
AplayDevices: aplay: device_list:242: no soundcards found...
ApportVersion: 1.90-0ubuntu1
Architecture: amd64
ArecordDevices: arecord: device_list:242: no soundcards found...
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D2', '/dev/snd/hwC0D3', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D3p', '/dev/snd/pcmC0D7p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
CurrentDmesg:
 Error: command ['sh', '-c', 'dmesg | comm -13 --nocheck-order /var/log/dmesg -'] failed with exit code 1: comm: /var/log/dmesg: Permission denied
 dmesg: write failed: Broken pipe
Date: Tue Jan 10 09:30:58 2012
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
MachineType: FUJITSU ESPRIMO P7935
ProcEnviron:
 LANGUAGE=C.UTF-8
 PATH=(custom, user)
 LANG=C.UTF-8
 LC_MESSAGES=C.UTF-8
 SHELL=/usr/bin/zsh
ProcKernelCmdLine: root=UUID=0b0e2b49-6556-4b44-b100-794a02511abf ro quiet splash
PulseSinks: Error: command ['pacmd', 'list-sinks'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
PulseSources: Error: command ['pacmd', 'list-sources'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RfKill:

SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
WifiSyslog:

dmi.bios.date: 05/22/2009
dmi.bios.vendor: FUJITSU // Phoenix Technologies Ltd.
dmi.bios.version: 6.00 R1.15.2812.A2
dmi.board.name: D2812-A2
dmi.board.vendor: FUJITSU
dmi.board.version: S26361-D2812-A2
dmi.chassis.type: 6
dmi.chassis.vendor: FUJITSU
dmi.modalias: dmi:bvnFUJITSU//PhoenixTechnologiesLtd.:bvr6.00R1.15.2812.A2:bd05/22/2009:svnFUJITSU:pnESPRIMOP7935:pvr:rvnFUJITSU:rnD2812-A2:rvrS26361-D2812-A2:cvnFUJITSU:ct6:cvr:
dmi.product.name: ESPRIMO P7935
dmi.sys.vendor: FUJITSU
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 1.90-0ubuntu1
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC663 Analog [ALC663 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: tartler 2584 F.... pulseaudio
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xfc520000 irq 48'
   Mixer name : 'Intel Eaglelake HDMI'
   Components : 'HDA:10ec0663,17341157,00100001 HDA:80862803,80860101,00100000'
   Controls : 38
   Simple ctrls : 22
CurrentDmesg:
 Error: command ['sh', '-c', 'dmesg | comm -13 --nocheck-order /var/log/dmesg -'] failed with exit code 1: comm: /var/log/dmesg: Permission denied
 dmesg: write failed: Broken pipe
DistroRelease: Ubuntu 12.04
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
MachineType: FUJITSU ESPRIMO P7935
Package: linux (not installed)
ProcEnviron:
 LANGUAGE=C.UTF-8
 PATH=(custom, user)
 LANG=C.UTF-8
 LC_MESSAGES=C.UTF-8
 SHELL=/usr/bin/zsh
ProcKernelCmdLine: root=UUID=0b0e2b49-6556-4b44-b100-794a02511abf ro quiet splash
ProcVersionSignature: Ubuntu 3.0.0-14.23-generic 3.0.9
RfKill:

Tags: precise
Uname: Linux 3.0.0-14-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: netgrp netgrp
WifiSyslog:

dmi.bios.date: 05/22/2009
dmi.bios.vendor: FUJITSU // Phoenix Technologies Ltd.
dmi.bios.version: 6.00 R1.15.2812.A2
dmi.board.name: D2812-A2
dmi.board.vendor: FUJITSU
dmi.board.version: S26361-D2812-A2
dmi.chassis.type: 6
dmi.chassis.vendor: FUJITSU
dmi.modalias: dmi:bvnFUJITSU//PhoenixTechnologiesLtd.:bvr6.00R1.15.2812.A2:bd05/22/2009:svnFUJITSU:pnESPRIMOP7935:pvr:rvnFUJITSU:rnD2812-A2:rvrS26361-D2812-A2:cvnFUJITSU:ct6:cvr:
dmi.product.name: ESPRIMO P7935
dmi.sys.vendor: FUJITSU

Reinhard Tartler (siretart) wrote :

This bug may be related to, or a duplicate of, bugs #891830 and #911210.

summary: - Linux 3.2 freezes system
+ Linux 3.2 freezes system on FUJITSU ESPRIMO P7935
Reinhard Tartler (siretart) wrote :

The problem still persists with mainline kernels downloaded from http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2012-01-09-precise/

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 914161

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
status: Incomplete → New
status: New → Incomplete
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.2.0-8.14)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

tags: added: kernel-request-3.2.0-8.14
Reinhard Tartler (siretart) wrote : AcpiTables.txt

apport information

tags: added: apport-collected
description: updated
Reinhard Tartler (siretart) wrote : AlsaDevices.txt

apport information

Reinhard Tartler (siretart) wrote : AplayDevices.txt

apport information

Reinhard Tartler (siretart) wrote : Card0.Amixer.values.txt

apport information

Reinhard Tartler (siretart) wrote : Card0.Codecs.codec.2.txt

apport information

Reinhard Tartler (siretart) wrote : Card0.Codecs.codec.3.txt

apport information

Reinhard Tartler (siretart) wrote : Lspci.txt

apport information

Reinhard Tartler (siretart) wrote : Lsusb.txt

apport information

Reinhard Tartler (siretart) wrote : PciMultimedia.txt

apport information

Reinhard Tartler (siretart) wrote : ProcCpuinfo.txt

apport information

Reinhard Tartler (siretart) wrote : ProcInterrupts.txt

apport information

Reinhard Tartler (siretart) wrote : ProcModules.txt

apport information

Reinhard Tartler (siretart) wrote : PulseSinks.txt

apport information

Reinhard Tartler (siretart) wrote : PulseSources.txt

apport information

Reinhard Tartler (siretart) wrote : UdevDb.txt

apport information

Reinhard Tartler (siretart) wrote : UdevLog.txt

apport information

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Confirmed
tags: added: kernel-da-key regression-release
Reinhard Tartler (siretart) wrote :

I reinstalled the same machine with Debian wheezy and was unable to reproduce this with the kernel from experimental:

Linux faui49i 3.2.0-rc7-amd64 #1 SMP Wed Dec 28 14:29:59 UTC 2011 x86_64 GNU/Linux

Reinhard Tartler (siretart) wrote :

Installing the kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2012-01-09-precise/ does trigger this bug again.

Maybe the bug was introduced after 3.2.0-rc7?

I'm reinstalling ubuntu oneiric again and will test with kernels from
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-rc6-precise/

Joseph Salisbury (jsalisbury) wrote :

Thanks for the update. It would be great to know if running 3.2.0-rc6 makes the bug go away. If it does not, it would be great if you can test some prior rc kernels, to identify when the regression was introduced.

Brad Figg (brad-figg) wrote : Test with newer development kernel (3.2.0-8.15)


Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-8.15
Reinhard Tartler (siretart) wrote :

Oneiric with the kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-rc6-precise/ does trigger this issue.

Reinhard Tartler (siretart) wrote :

Oneiric with the kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.1.8-precise/ does not seem to trigger this issue.

Reinhard Tartler (siretart) wrote :

Oneiric with the kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2012-01-12-precise/ does trigger this issue.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Reinhard Tartler (siretart) wrote :

Oneiric with kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-rc1-oneiric/ does not seem to trigger this issue.

Reinhard Tartler (siretart) wrote :

Oneiric with kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-rc2-oneiric/ does not seem to trigger this issue.

Reinhard Tartler (siretart) wrote :

Oneiric with kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-rc4-oneiric/ does not seem to trigger this issue.

Note: I didn't test rc3 because that kernel failed to build and no packages are available.

Reinhard Tartler (siretart) wrote :

Oneiric with the kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-rc5-precise/ does trigger this issue.

Reinhard Tartler (siretart) wrote :

Interestingly this issue *is* reproducible with http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-rc4-precise/

Does this indicate that the issue is toolchain related?

Brad Figg (brad-figg) wrote : Test with newer development kernel (3.2.0-9.16)


Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-9.16
Reinhard Tartler (siretart) wrote :

Sorry, the symptoms still occur with 3.2.0-9.16.

As indicated above, I suspect a toolchain issue.

tags: added: bot-stop-nagging
removed: kernel-request-3.2.0-8.14 kernel-request-3.2.0-8.15 kernel-request-3.2.0-9.16 precise
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Reinhard, thanks for all the testing.

Can you confirm the following:
v3.2-rc2-oneiric GOOD
v3.2-rc4-oneiric GOOD
v3.2-rc4-precise BAD
v3.2-rc5-precise BAD

I'll see what changes there are between v3.2-rc4-oneiric and v3.2-rc4-precise. I'll also perform a bisect if needed and generate some test kernels, if you have time to test them?

Reinhard Tartler (siretart) wrote :

Hi Joseph,

I have tested the following two kernels just today:

v3.2-rc4-oneiric GOOD
v3.2-rc4-precise BAD

I'm happy to test additional test kernels on my test machine. Thank you so much for investigating this issue, since we have a pool with a number of machines of this type.

Joseph Salisbury (jsalisbury) wrote :

Reinhard,

This would indicate the issue is caused in the kernel config files, since that is the only difference between these two kernels. I'll do a diff of the config files to see what all the differences are.

One additional test would be to test 3.2.0-3.7 vs 3.2.0-2.6, since 3.2.0-3.7 was when the kernel was re-based to v3.2-rc4. The kernels are available at:

3.2.0-2.6:
https://launchpad.net/ubuntu/+source/linux/3.2.0-2.6/+build/2976027

3.2.0-3.7:
https://launchpad.net/ubuntu/+source/linux/3.2.0-3.7/+build/2981087
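
A config comparison like the one described in this comment could be sketched roughly as follows. This is only an illustration in Python, not the tooling the Ubuntu kernel team used. The function names parse_config and diff_configs and the two sample excerpts are made up for the example; on an installed Ubuntu system the inputs would typically be the /boot/config-<version> files.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value} for a kernel config; unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that differs.

    Options missing from one side are reported with the value None.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Illustrative excerpts only, not the real oneiric/precise config files.
OLD = """\
CONFIG_USB_XHCI_HCD=m
# CONFIG_INTEL_IOMMU is not set
CONFIG_EXT2_FS=m
"""
NEW = """\
CONFIG_USB_XHCI_HCD=y
CONFIG_INTEL_IOMMU=y
CONFIG_EXT2_FS=m
"""

if __name__ == "__main__":
    for name, (before, after) in diff_configs(OLD, NEW).items():
        print(f"{name}: {before} -> {after}")
```

Treating "is not set" as the value n keeps an option that was switched on from looking like a brand-new option in the diff.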

Reinhard Tartler (siretart) wrote : Re: [Bug 914161] Re: Linux 3.2 freezes system on FUJITSU ESPRIMO P7935

On Thu, Jan 26, 2012 at 18:34:59 (CET), Joseph Salisbury wrote:

> This would indicate the issue is caused in the kernel config files,
> since that is the only difference between these two kernels. I'll do a
> diff of the config files to see what all the differences are.

Right, that also seems plausible to me.

> One additional test would be to test 3.2.0-3.7 vs 3.2.0-2.6, since
> 3.2.0-3.7 was when the kernel was re-based to v3.2-rc4. The kernels are
> available at:
>
> 3.2.0-2.6:
> https://launchpad.net/ubuntu/+source/linux/3.2.0-2.6/+build/2976027
>
> 3.2.0-3.7:
> https://launchpad.net/ubuntu/+source/linux/3.2.0-3.7/+build/2981087

I can confirm that both these kernels exhibit the behavior described in
the description of this bug.

IOW: both are BAD.

--
Gruesse/greetings,
Reinhard Tartler, KeyID 945348A4

Reinhard Tartler (siretart) wrote :

To make things more interesting, I've just tested https://launchpad.net/ubuntu/precise/amd64/linux-image-3.2.0-1-generic/3.2.0-1.3 on this machine and that kernel also exhibits the symptoms described above.

Joseph Salisbury (jsalisbury) wrote :

That is interesting. Kernel 3.2.0-1.3 was re-based to upstream v3.2-rc2. I recall you did not hit this bug when testing mainline v3.2-rc4, in comment #36.

All of the kernels for Precise are listed on this page:
https://launchpad.net/ubuntu/precise/+source/linux

Would it be possible for you to test some of these kernels? We already know 3.2.0-1.3 is bad, so maybe test 3.2.0-1.1:
https://launchpad.net/ubuntu/+source/linux/3.2.0-1.1

The goal is to find the first kernel version that exhibited this bug, so we can bisect and identify the exact commit.

Reinhard Tartler (siretart) wrote :

Oneiric with the kernel from https://launchpad.net/ubuntu/+source/linux/3.2.0-1.1 does trigger this issue

Reinhard Tartler (siretart) wrote :

Oneiric with the kernel from
https://launchpad.net/ubuntu/precise/amd64/linux-image-3.1.0-2-generic/3.1.0-2.3
does not seem to trigger this issue.
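
The Ubuntu package results reported up to this point can be tabulated to narrow down the regression window. The following is a rough Python sketch, not something used in this bug. The helper names version_key and regression_window are invented for the illustration, and it assumes the regression is monotonic (once a version is BAD, every later one is BAD too).

```python
def version_key(version):
    """Turn an Ubuntu kernel version such as '3.2.0-1.3' into a sortable tuple."""
    upstream, _, abi_upload = version.partition("-")
    return tuple(int(p) for p in upstream.split(".")) + tuple(
        int(p) for p in abi_upload.split(".")
    )


def regression_window(results):
    """Return (last_good, first_bad) from a {version: 'GOOD' | 'BAD'} mapping.

    Walks the versions in ascending order and stops at the first BAD one.
    Either element is None if no such version was tested.
    """
    last_good = None
    for version in sorted(results, key=version_key):
        if results[version] == "BAD":
            return last_good, version
        last_good = version
    return last_good, None


# Verdicts for the Ubuntu package versions as reported in this bug so far.
RESULTS = {
    "3.1.0-2.3": "GOOD",
    "3.2.0-1.1": "BAD",
    "3.2.0-1.3": "BAD",
    "3.2.0-2.6": "BAD",
    "3.2.0-3.7": "BAD",
}

if __name__ == "__main__":
    good, bad = regression_window(RESULTS)
    print(f"last good: {good}, first bad: {bad}")
```

Sorting on the parsed tuple rather than on the raw string avoids misordering once an ABI or upload number reaches two digits.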

Reinhard Tartler (siretart) wrote :

Is there anything else I could collect data-wise?

Joseph Salisbury (jsalisbury) wrote :

Hi Reinhard,

Sorry for the delay. I will build some additional test kernels to continue the bisect. I'll post a link to the test kernel shortly.

Joseph Salisbury (jsalisbury) wrote :

It looks like the following config file changes were made between v3.1.0-2.3 and v3.2.0-1.1:

  * [Config] enforcer -- ensure CONFIG_FAT_FS is built-in on arm
  * [Config] Enable PCI_IOV on powerpc
  * [Config] Temporarily disable CONFIG_PASEMI_MAC on powerpc
  * [Config] updateconfigs after select ARM_AMBA
  * [Config] Temporarily disable CONFIG_KVM_BOOK3S_32 on powerpc
  * [Config] Enable CONFIG_EXT2_FS=m
  * [Config] Build in CONFIG_SATA_AHCI=y
  * [Config] Enable EVENT_POWER_TRACING_DEPRECATED=y for powertop
  * [Config] Built-in xen-netfront and xen-blkfront
  * [Config] CONFIG_USB_XHCI_HCD=y
  * [Config] CONFIG_R6040=m
  * [Config] Consolidated amd64 server flavour into generic
  * [Config] updateconfigs after rebase to 3.2-rc1
  * [Config] Disabled dm-raid4-5
  * [Config] Disabled ndiswrapper
  * [Config] Disable vt6656
  * [Config] exclude ppp-modules for virtual flavour
  * [Config] CONFIG_MEMSTICK_R592=m

I'll investigate further and will post a test kernel. I may be able to build a v3.2.0-1.1 test kernel using the v3.1.0-2.3 config file. That will confirm that one of the config options above causes this bug.

Joseph Salisbury (jsalisbury) wrote :

I created a test kernel, which is available at:
http://people.canonical.com/~jsalisbury/lp914161

Can you test that kernel and report back if the bug still exists or not?

Reinhard Tartler (siretart) wrote :

I've been running the kernel from http://people.canonical.com/~jsalisbury/lp914161/ for over 3h now, and it does not exhibit the bug:

>> uname -a
Linux faui49i 3.2.0-1-generic #1~lp914161v1 SMP Thu Feb 16 21:16:57 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>> uptime
  15:26:09 up 3:19, 1 user, load average: 0.00, 0.01, 0.05

Reinhard Tartler (siretart) wrote :

As an extra safety check, I've tried kernel v3.2.0-1.1 again, and it exposed the bug after a couple of minutes of uptime.

Joseph Salisbury (jsalisbury) wrote :

That means that a config file change between 3.1.0-2.3 and 3.2.0-1.1 caused this bug.

When the bug happens, do you get any output on the screen? It would be very helpful if we could capture a trace or a screen shot of the panic.

Reinhard Tartler (siretart) wrote :

Unfortunately, there isn't any output on the screen. Whether in X or on the console, the screen just freezes in the sense that it is no longer updated and the content at the time of the freeze becomes static. There is no visual corruption in X11 either.

I'm not even sure if the kernel actually panics, or if it "just" hangs in some ISR or something.

Joseph Salisbury (jsalisbury) wrote :

Thanks for the update, Reinhard.

I will bisect through the kernel config options to identify which specific one is causing this bug. I will have another test kernel posted shortly.
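
Bisecting through config options follows the same halving idea as a commit bisect. The sketch below, in Python, is only an illustration of that idea and not the procedure actually used here: the function name bisect_options, the option names CONFIG_A to CONFIG_E, and the simulated test are all hypothetical. In practice each call to the test would mean building a kernel and letting the reporter boot it. The sketch assumes exactly one changed option is responsible and that options do not interact.

```python
def bisect_options(candidates, is_bad):
    """Find the single option whose change triggers the bug.

    candidates: option names that differ between the good and the bad config.
    is_bad: callable taking a list of options applied on top of the good
            config; returns True if the resulting kernel shows the bug.
    """
    remaining = list(candidates)
    if not remaining:
        raise ValueError("no candidate options to bisect")
    while len(remaining) > 1:
        # Apply only the first half of the remaining changes and test.
        half = remaining[: len(remaining) // 2]
        if is_bad(half):
            remaining = half
        else:
            remaining = remaining[len(half):]
    return remaining[0]


# Hypothetical option names; CONFIG_D plays the culprit in this simulation.
OPTIONS = ["CONFIG_A", "CONFIG_B", "CONFIG_C", "CONFIG_D", "CONFIG_E"]


def simulated_test(enabled):
    """Stand-in for 'build a test kernel, boot it, wait for a freeze'."""
    return "CONFIG_D" in enabled


if __name__ == "__main__":
    print(bisect_options(OPTIONS, simulated_test))
```

With n changed options this needs about log2(n) test kernels, which matters when every test is a manual boot that has to run for a while before it can be called GOOD.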

Joseph Salisbury (jsalisbury) wrote :

I created another test kernel, which is available at:
http://people.canonical.com/~jsalisbury/lp914161

Can you test that kernel and report back if the bug still exists or not?

Reinhard Tartler (siretart) wrote :

The first of the two test kernels seems to be GOOD:

>> uname -a
Linux faui49i 3.2.0-1-generic #1~lp914161v1 SMP Thu Feb 16 21:16:57 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
tartler-@faui49i:~
>> uptime
 12:44:06 up 2:52, 1 user, load average: 0.00, 0.01, 0.05

I'm going to test the 'v2' one now.

Reinhard Tartler (siretart) wrote :

Also the 'v2' kernel seems to be GOOD:

root@faui49i:~# uname -a
Linux faui49i 3.2.0-1-generic #1~lp914161v2 SMP Fri Feb 17 20:45:25 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
root@faui49i:~# uptime
 13:35:44 up 49 min, 1 user, load average: 0.00, 0.01, 0.05

Reinhard Tartler (siretart) wrote :

hm. If I had to guess, I would pick CONFIG_USB_XHCI_HCD. Is there a command-line option to disable this driver at boot time?

Joseph Salisbury (jsalisbury) wrote :

There is no way to disable that specific option. However, I can build a test kernel with just that option disabled. I'll post a link to it shortly.

Joseph Salisbury (jsalisbury) wrote :

I created another test kernel, which is available at:
http://people.canonical.com/~jsalisbury/lp914161

This test kernel has the CONFIG_USB_XHCI_HCD option set to m, which is how it is set in v3.1.0-2.3.

Can you test that kernel and report back if the bug still exists or not?

Reinhard Tartler (siretart) wrote :

Hmrpf. Wrong pick: this kernel again triggers the symptom after a couple of minutes:

root@faui49i:~# uname -a
Linux faui49i 3.2.0-1-generic #1~lp914161v3 SMP Tue Feb 21 16:26:36 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Joseph Salisbury (jsalisbury) wrote :

Thanks for the update, Reinhard. It was definitely worth a test. I'll continue to bisect the kernel config options and have another test kernel built shortly.

Joseph Salisbury (jsalisbury) wrote :

I created another test kernel, which is available at:
http://people.canonical.com/~jsalisbury/lp914161

Can you test that kernel and report back if the bug still exists or not?

Reinhard Tartler (siretart) wrote :

Still BAD:

root@faui49i:~# uname -a
Linux faui49i 3.2.0-1-generic #1~lp914161v4 SMP Tue Feb 21 22:42:16 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

This kernel again hangs after a couple of minutes of uptime.

Joseph Salisbury (jsalisbury) wrote :

Thanks for the update.

That's a bit of good news that the v4 kernel is bad, since the v2 kernel was good. I enabled the following config parameters in the v4 kernel:

CONFIG_IRQ_REMAP=y
CONFIG_INTEGRITY=y
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_DEFAULT_ON=y
CONFIG_INTEL_IOMMU_FLOPPY_WA=y

I'll build a test kernel with the IOMMU options disabled like they are in 3.1.0-2.3.

Joseph Salisbury (jsalisbury) wrote :

I created another test kernel, which is available at:
http://people.canonical.com/~jsalisbury/lp914161

Can you test that kernel and report back if the bug still exists or not?

Reinhard Tartler (siretart) wrote :

This kernel appears to be GOOD:

root@faui49i:~# uname -a
Linux faui49i 3.2.0-1-generic #1~lp914161v5 SMP Thu Feb 23 16:45:58 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
root@faui49i:~ $ uptime
 09:49:46 up 43 min, 1 user, load average: 0.00, 0.01, 0.05

I'll continue running this kernel for the rest of today, just to be sure, but other than that, I'd say 45 min already classifies as good.

Joseph Salisbury (jsalisbury) wrote :

Thanks, Reinhard.

I'm building one additional kernel. This v6 kernel will have the following re-enabled:
CONFIG_IRQ_REMAP=y
CONFIG_INTEGRITY=y

I will leave the following options disabled still to confirm they are the cause of this bug:
CONFIG_INTEL_IOMMU
CONFIG_INTEL_IOMMU_DEFAULT_ON
CONFIG_INTEL_IOMMU_FLOPPY_WA

Joseph Salisbury (jsalisbury) wrote :

Actually a really quick test would be to boot back with the latest precise kernel, but use the following boot option:
intel_iommu=off

Reinhard Tartler (siretart) wrote :

Unfortunately, I have to leave now for the weekend and will be able to restart testing on monday.

Nevertheless, I've booted the latest precise kernel without intel_iommu=off to verify that the bug still persists:

root@faui49i:~# uname -a
Linux faui49i 3.2.0-17-generic #26-Ubuntu SMP Fri Feb 17 21:35:49 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
root@faui49i:~# uptime
 19:27:55 up 24 min, 1 user, load average: 0.00, 0.01, 0.06

So it *seems* the bug no longer happens, but I really need to keep the machine running for a longer amount of time to be sure. Nevertheless, the conclusion of bug #907377 was to disable INTEL_IOMMU by default, as far as I understand the changelog:

  * Revert "SAUCE: dmar: disable if ricoh multifunction detected"
  * [Config] Disable CONFIG_INTEL_IOMMU_DEFAULT_ON
    - LP: #907377, #911236
  * [Config] Enable CONFIG_IRQ_REMAP

Therefore, I believe that the option intel_iommu=off is now used by default. Will this option be kept for precise release?

Reinhard Tartler (siretart) wrote :

I've been running the latest precise kernel now for over 6h without problems.

However, passing 'intel_iommu=on' to the kernel command line does make the problem re-appear. So this indeed seems to be a bug in the intel iommu driver.
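
How the build-time default and the boot option combine can be modelled roughly as below. This Python sketch is a simplified model written for illustration, not kernel code: the function name intel_iommu_enabled is invented, only the on and off values of intel_iommu= are handled, and whether the hardware and firmware actually provide an IOMMU is not modelled.

```python
def intel_iommu_enabled(cmdline, default_on):
    """Model whether the Intel IOMMU driver would be asked to turn on.

    cmdline: kernel command line, e.g. the contents of /proc/cmdline.
    default_on: True if the kernel was built with
                CONFIG_INTEL_IOMMU_DEFAULT_ON=y.
    Only 'on' and 'off' are modelled; any other comma-separated value of
    intel_iommu= is ignored. The last matching setting wins.
    """
    enabled = default_on
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        if key != "intel_iommu" or not sep:
            continue
        for item in value.split(","):
            if item == "on":
                enabled = True
            elif item == "off":
                enabled = False
    return enabled


# Command line taken from the ProcKernelCmdLine field of this report.
CMDLINE = "root=UUID=0b0e2b49-6556-4b44-b100-794a02511abf ro quiet splash"

if __name__ == "__main__":
    # Kernel built with the default switched off, no boot option given.
    print(intel_iommu_enabled(CMDLINE, default_on=False))
    # Same kernel, but with the IOMMU explicitly requested at boot.
    print(intel_iommu_enabled(CMDLINE + " intel_iommu=on", default_on=False))
```

In this model, disabling CONFIG_INTEL_IOMMU_DEFAULT_ON behaves like an implicit intel_iommu=off that an explicit intel_iommu=on on the command line still overrides, which matches the behaviour reported in this comment.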

Joseph Salisbury (jsalisbury) wrote :

@Reinhard

We intend to keep iommu disabled by default. So it appears this bug is resolved since you have to purposely enable intel_iommu to make the bug come back.

I will mark the bug as "Fix Released" since it is resolved in the latest Precise kernel.

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Reinhard Tartler (siretart) wrote :

On Mon, Feb 27, 2012 at 19:02:19 (CET), Joseph Salisbury wrote:

> We intend to keep iommu disabled by default. So it appears this bug is
> resolved since you have to purposely enable intel_iommu to make the bug
> come back.

I'm glad to hear that!

> I will mark the bug as "Fix Released" since it is resolved in the latest
> Precise kernel.

Nevertheless, this bug was big fun to work on. It has been a pleasure to
work with you here :-)

--
Gruesse/greetings,
Reinhard Tartler, KeyID 945348A4

Joseph Salisbury (jsalisbury) wrote :

I enjoyed working with you as well, Reinhard!
