[intrepid] becomes unresponsive after some time; unexpected IRQ trap? (Dell Latitude D430)

Bug #253089 reported by Martin Pitt
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

With the intrepid kernel (2.6.26-4), my system becomes very sluggish after some hours of working with it. Typing in terminals feels like tar, and every keystroke takes about 0.5 seconds to register. Also, shutdown becomes very slow and most of the time just hangs. OTOH, firefox works just fine.

I noticed that after a while dmesg has several dozen repeats of the following messages:

[20563.433058] ACPI: PCI Interrupt 0000:0c:00.0[A] -> GSI 17 (level, low) -> IRQ 17
[20563.433058] PM: Writing back config space on device 0000:0c:00.0 at offset 1 (was 100002, writing 100006)
[20563.433058] iwl3945: Radio disabled by HW RF Kill switch
[20563.433058] ACPI: PCI interrupt for device 0000:0c:00.0 disabled
[20714.959047] ACPI: PCI Interrupt 0000:0c:00.0[A] -> GSI 17 (level, low) -> IRQ 17
[20714.959047] PM: Writing back config space on device 0000:0c:00.0 at offset 1 (was 100002, writing 100006)
[20714.959047] iwl3945: Radio disabled by HW RF Kill switch
[20714.959047] ACPI: PCI interrupt for device 0000:0c:00.0 disabled
[20725.262003] irq 219, desc: c0478c00, depth: 0, count: 0, unhandled: 0
[20725.262003] ->handle_irq(): c016aef0, handle_bad_irq+0x0/0x2a0
[20725.262003] ->chip(): c044d8c0, no_irq_chip+0x0/0x40
[20725.262003] ->action(): 00000000
[20725.262003] IRQ_DISABLED set
[20725.262003] IRQ_MASKED set
[20725.262003] unexpected IRQ trap at vector db

I cannot yet confirm whether these messages start at the moment the system becomes sluggish. I'll pay closer attention to that.
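One way to check whether the trap messages line up with the onset of the sluggishness is to convert their dmesg timestamps (seconds since boot) into uptime and compare that with the time the slowdown was noticed. A minimal Python sketch, assuming the bracketed timestamp format shown in the log above; the function names are made up for this illustration:

```python
import re

# Matches lines such as "[20725.262003] unexpected IRQ trap at vector db".
# The pattern is not anchored, so syslog-prefixed kernel lines match as well.
TRAP_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+unexpected IRQ trap at vector ([0-9a-f]+)")


def trap_times(dmesg_text):
    """Return the timestamps (seconds since boot) of every trap message."""
    return [float(m.group(1)) for m in TRAP_RE.finditer(dmesg_text)]


def as_uptime(seconds):
    """Format seconds since boot as H:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    sample = (
        "[20725.262003] IRQ_MASKED set\n"
        "[20725.262003] unexpected IRQ trap at vector db\n"
    )
    print(as_uptime(trap_times(sample)[0]))  # 5:45:25
```

Feeding it the full output of dmesg instead of the two sample lines would list every trap, which can then be compared against the time the terminal started lagging.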

This does not happen at all when running Intrepid with the Hardy kernel; the system runs happily for days, through suspends, hibernates, etc.

Martin Pitt (pitti) wrote :

Oh, for the record, this is a Dell Latitude D430, running i386 Intrepid. The system is docked, with the wifi turned off with the killswitch (as dmesg says correctly).

top etc. show that the CPU is at 0% while that happens, so there is no specific process blocking CPU or I/O.

Chris Coulson (chrisccoulson) wrote :

Hmmm, a quick Google search turns up somebody else suffering from sporadic system hangs in Debian Lenny, with the same messages in their dmesg output as yours. The page is http://www.linuxforen.de/forums/showthread.php?p=1659644 (German)

I couldn't find anything on the Debian bug-tracker though.

Martin Pitt (pitti) wrote :

Thank you for that hint! I tried with "pci=nomsi" and haven't noticed any slowdown so far.

Now I just get this repeatedly:

[10781.427102] ACPI: PCI Interrupt 0000:0c:00.0[A] -> GSI 17 (level, low) -> IRQ 17
[10781.427294] PM: Writing back config space on device 0000:0c:00.0 at offset 1 (was 100002, writing 100006)
[10781.427384] iwl3945: Radio disabled by HW RF Kill switch
[10781.427411] ACPI: PCI interrupt for device 0000:0c:00.0 disabled

The device in the first line is my Intel 3945 wifi card. So this blurb just seems to happen from time to time without any noticeable harm.

So it seems the third block of messages is the actual offender here; it also starts much later than the iwl3945 spewage:

[20725.262003] irq 219, desc: c0478c00, depth: 0, count: 0, unhandled: 0
[20725.262003] ->handle_irq(): c016aef0, handle_bad_irq+0x0/0x2a0
[20725.262003] ->chip(): c044d8c0, no_irq_chip+0x0/0x40
[20725.262003] ->action(): 00000000
[20725.262003] IRQ_DISABLED set
[20725.262003] IRQ_MASKED set
[20725.262003] unexpected IRQ trap at vector db

So I can live with the "pci=nomsi" workaround, but ideally we can fix this by default?
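
A note on reading the block above: the "vector db" in its last line is hexadecimal, and 0xdb is 219 in decimal, the same number as the "irq 219" on its first line. A small Python sketch that checks this for a captured block, assuming the message format shown above (irq_matches_vector is a hypothetical helper name, not part of any tool):

```python
import re

IRQ_RE = re.compile(r"irq (\d+), desc:")
VECTOR_RE = re.compile(r"unexpected IRQ trap at vector ([0-9a-f]+)")


def irq_matches_vector(block):
    """True if the decimal 'irq N' equals the hexadecimal 'vector XX'."""
    irq = IRQ_RE.search(block)
    vector = VECTOR_RE.search(block)
    if irq is None or vector is None:
        raise ValueError("block does not contain both lines")
    return int(irq.group(1)) == int(vector.group(1), 16)


if __name__ == "__main__":
    block = (
        "[20725.262003] irq 219, desc: c0478c00, depth: 0, count: 0, unhandled: 0\n"
        "[20725.262003] unexpected IRQ trap at vector db\n"
    )
    print(int("db", 16))              # 219
    print(irq_matches_vector(block))  # True
```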

Changed in linux:
status: New → Confirmed
Chris Coulson (chrisccoulson) wrote :

I agree! Could you also please attach the information requested in https://wiki.ubuntu.com/KernelTeamBugPolicies

Thanks

Martin Pitt (pitti) wrote :

Please note that this is the dmesg *with* pci=nomsi, where the slowdown doesn't occur. I already gave the relevant dmesg snippet for the default case.

Please let me know if you need anything else. Thanks!

Changed in linux:
assignee: nobody → ubuntu-kernel-team
Chris Coulson (chrisccoulson) wrote :

Martin,

I've just had a thought: it might also be useful to attach the output of "cat /proc/interrupts" and "sudo dmidecode".

Thanks

Martin Pitt (pitti) wrote :

This is /proc/interrupts right after a fresh boot, without a pci= parameter.

Martin Pitt (pitti) wrote :

Now I got it again, first time after 5 hours 40 minutes uptime:

[19486.431597] irq 219, desc: c0481c00, depth: 0, count: 0, unhandled: 0
[19486.431597] ->handle_irq(): c016f730, handle_bad_irq+0x0/0x2a0
[19486.431597] ->chip(): c04562c0, no_irq_chip+0x0/0x40
[19486.431597] ->action(): 00000000
[19486.431597] IRQ_DISABLED set
[19486.431597] IRQ_MASKED set
[19486.431597] unexpected IRQ trap at vector db

I have done archive admin all day, i.e. just worked with ssh, firefox, and gnome-terminal. The only thing I changed over the lunch break was to switch off my monitor.

I attach my current /proc/interrupts now. Mostly the diff is just higher counts (which is to be expected), but there's one additional line:

  219: 8 0 none-edge

which seems to correspond to the "unexpected IRQ trap":

$ dmesg |grep "unexpected IRQ trap"|wc -l
8
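
The same cross-check can be scripted. The Python sketch below counts the trap messages in a dmesg dump, sums the per-CPU counters on the matching /proc/interrupts row, and lists rows that appear only in a later snapshot. It assumes the row layout shown above ("219: 8 0 none-edge"), and the helper names are invented for this example:

```python
def count_traps(dmesg_text):
    """Number of 'unexpected IRQ trap' lines (like grep | wc -l)."""
    return sum(1 for line in dmesg_text.splitlines()
               if "unexpected IRQ trap" in line)


def irq_total(interrupts_text, irq):
    """Sum of the per-CPU counters on the row for `irq`, or None if absent."""
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() == str(irq):
            total = 0
            for field in rest.split():
                if not field.isdigit():
                    break  # counters end where the chip/handler name begins
                total += int(field)
            return total
    return None


def new_irq_rows(before_text, after_text):
    """Row labels present in the later snapshot but not in the earlier one."""
    def labels(text):
        return {line.partition(":")[0].strip()
                for line in text.splitlines() if ":" in line}
    return sorted(labels(after_text) - labels(before_text))


if __name__ == "__main__":
    row = "  219: 8 0 none-edge"
    print(irq_total(row, 219))  # 8
```

If count_traps() on the dmesg output and irq_total() on the current /proc/interrupts give the same number, as they do here (8 and 8), the extra row is most likely just the counter for those traps.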

Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you can test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Martin Pitt (pitti) wrote :

Seems to be fixed in 2.6.27.

Changed in linux:
status: Confirmed → Fix Released
richiek (rickh746) wrote :

Brilliant! Thanks for letting us know Martin.

I've been battling with this problem recently (running Debian Lenny on my Dell Precision M90 laptop) and I'm just about at my wits' end dealing with my computer always hanging after a few hours of uptime. All the problems I've been having are pretty much as described in this thread (a different IRQ number, 217, but otherwise exactly the same). It's so nice to know I'm not the only one experiencing them.

Does anyone know (approx.) how soon 2.6.27 will be released? I can't wait to try it out. I see on kernel.org that it's at rc5, but I'm not sure how many more candidates to expect before an official release.

richiek (rickh746) wrote :

I forgot to mention...
Interestingly, I first noticed this problem after running a recently compiled 2.6.26 kernel. Thinking it was a badly configured kernel (I did go a bit crazy cutting back the options), I went back to my previously stable kernel build, version 2.6.25.9, and the problems continued!
This makes me wonder: is it possible that this problem was triggered by some recently updated package?

Jamie Lokier (jamie-shareable) wrote :

Just want to comment that I'm using kernel 2.6.27-2.3 (latest on 2008-09-09) and I am still seeing these sorts of symptoms. Specifically, these messages occur before it goes strange:

Sep 9 01:28:46 amilo kernel: [20443.532290] irq 219, desc: c04bbb00, depth: 0, count: 0, unhandled: 0
Sep 9 01:28:46 amilo kernel: [20443.532302] ->handle_irq(): c0176a10, handle_bad_irq+0x0/0x2a0
Sep 9 01:28:46 amilo kernel: [20443.532316] ->chip(): c048f0a0, no_irq_chip+0x0/0x40
Sep 9 01:28:46 amilo kernel: [20443.532325] ->action(): 00000000
Sep 9 01:28:46 amilo kernel: [20443.532329] IRQ_DISABLED set
Sep 9 01:28:46 amilo kernel: [20443.532332] IRQ_MASKED set
Sep 9 01:28:46 amilo kernel: [20443.532336] unexpected IRQ trap at vector db

I am not running Intrepid, but Hardy with an Intrepid kernel, libc and a couple of other packages. The reason is that several bugs I'm subscribed to suggest I try the Intrepid kernel and report back if it solves other problems. (So far the new kernel causes quite a lot of new problems...)

My laptop is a Fujitsu-Siemens Amilo Si-1520, Core Duo (32-bit) 2.0GHz, 2.5GB RAM.

I don't know if it's relevant, but on booting I always get this message with the new kernel:

    ACPI: EC: GPE storm detected, disabling EC GPE

Some other places suggest it's due to a flaw in Acer laptops, but this is not an Acer laptop.

I've attached the output of /proc/interrupts, dmidecode, lspci -v and lsusb -v in case they are useful.
They were made about 10 minutes after the strange behaviour and kernel message about the interrupt started.

Jamie Lokier (jamie-shareable) wrote :

After the message about "unexpected IRQ trap at vector db" (it's always this number), things go oddly sluggish. In particular, when typing, the last few characters typed often won't appear until _another_ key is pressed. When doing "cat some_file.txt", the output sometimes won't scroll to the end until a key is pressed or a second or two have passed (this is in Gnome Terminal). Auto-repeat has entirely stopped working. It behaves as though some interrupts aren't being serviced until something else occurs to trigger their servicing.

I have also noticed that about 50% of boots lock up when the progress bar is about 20% (I haven't looked at it with nosplash), and occasionally the machine locks up, and sometimes SysRq-B fails to reboot from a normal working state.

When none of these things happen, normal operation seems much the same as Hardy's 2.6.24-19 kernel, i.e. fine.

(Except sound recording stopped working, compared with the Hardy 2.6.24-19 kernel where it's fine, but that's probably unrelated).

I'm not using any special boot options, just "root=/dev/mapper/vg0-ubuntu ro quiet splash".
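
For anyone comparing setups, a quick way to see whether the pci=nomsi workaround mentioned earlier in this report is active is to look at the kernel command line. A minimal Python sketch, assuming a Linux system that exposes /proc/cmdline; has_param and running_cmdline are names chosen for this example:

```python
def has_param(cmdline, param):
    """True if `param` appears as a whole word on the kernel command line."""
    return param in cmdline.split()


def running_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    options = "root=/dev/mapper/vg0-ubuntu ro quiet splash"
    print(has_param(options, "pci=nomsi"))                 # False
    print(has_param(options + " pci=nomsi", "pci=nomsi"))  # True
```

On a live system, has_param(running_cmdline(), "pci=nomsi") would report whether the current boot used the workaround.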

Martin Pitt (pitti) wrote :

Reopening then. But as I said, it doesn't happen to me any more, so I can't help with further debugging.

Changed in linux:
status: Fix Released → Confirmed
Chris Coulson (chrisccoulson) wrote :

Jamie,

You say that you are running the Intrepid kernel on Hardy. As this is something that should be reproducible from the live CD environment, I would appreciate it if you could try running the Intrepid Alpha 5 live CD, to see if you still get the same problem (I suspect you probably will).

Thanks

Changed in linux:
status: Confirmed → Incomplete
Jamie Lokier (jamie-shareable) wrote : Re: [Bug 253089] Re: [intrepid] becomes unresponsive after some time; unexpected IRQ trap? (Dell Latitude D430)

Chris Coulson wrote:
> You say that you are running the Intrepid kernel on Hardy. As this is
> something that should be reproducible from the live CD environment, I
> would appreciate it if you could try running the Intrepid Alpha 5 live
> CD, to see if you still get the same problem (I suspect you probably
> will).

Chris, thanks for giving this bug some attention.

I'm not sure how to go about your suggestion.

The problem occurred twice, but in both cases took many hours to
manifest while I was using the laptop busily, and I don't see how I
can get useful work done while running from the live CD. This isn't a
spare computer, it's my work laptop.

Also, I've stopped running the Intrepid kernel with the Gutsy install,
because there are other regressions getting in the way of useful work
- 3G-over-bluetooth internet connectivity has stopped working, and so
has audio in.

I'm happy to follow instructions and send further bug reports, but I
don't have much "spare" time to try things. It's unfortunate that
this bug doesn't manifest quickly.

Any suggestions?

Thanks,
-- Jamie

Jamie Lokier (jamie-shareable) wrote :

Jamie Lokier wrote:
> Also, I've stopped running the Intrepid kernel with the Gutsy install,

Slight typo there - it's a Hardy install.

Leann Ogasawara (leannogasawara) wrote :

Hi Jamie,

I'm curious if booting with the 'pci=nomsi' workaround Martin mentioned works for you? You may want to also give the 2.6.27-3 Intrepid kernel a try as it was rebased with the upstream 2.6.27-rc6 kernel. I also think it might be better if you open a separate bug report since Martin, the original bug reporter, has commented this is resolved for him with 2.6.27. It'll be easier for the kernel team to debug if they don't have to sift through logs/comments which don't relate to the issue you are seeing and additionally Martin won't be sent messages about a bug that is no longer a bug for him. Thanks.

Juha Heinanen (jh-tutpro) wrote :

My Dell E1405 running the latest Ubuntu Hardy kernel (2.6.24-21-generic #1 SMP) suffered from this unresponsiveness problem until today, when I found this thread and added pci=nomsi to the kernel boot parameters. Since then my system has been running fine. Let me know if you want me to run any other tests to help fix the problem.

-- Juha

Leann Ogasawara (leannogasawara) wrote :

Hi Juha,

If you could test the latest daily Intrepid image and verify if you still need to use pci=nomsi or not that would be great. The final Intrepid Ibex is set to be released at the end of this month but daily images can be found at http://cdimage.ubuntu.com/daily-live/current/ . Any additional feedback we can get from you would be much appreciated. Thanks.

Juha Heinanen (jh-tutpro) wrote : [Bug 253089] Re: [intrepid] becomes unresponsive after some time; unexpected IRQ trap? (Dell Latitude D430)

Leann Ogasawara writes:

 > If you could test the latest daily Intrepid image and verify if you
 > still need to use pci=nomsi or not that would be great. The final
 > Intrepid Ibex is set to be released at the end of this month but daily
 > images can be found at http://cdimage.ubuntu.com/daily-live/current/
 > .

i'm running hardy, because it is a long term release. so i cannot
easily upgrade this host to intrepid, because i would also need to get
it somehow back to hardy.

let me know if there are any means to do tests without distribution
upgrade.

-- juha

Leann Ogasawara (leannogasawara) wrote :

You should be able to test using a LiveCD which will not affect your current installation. Images can be found at the link I referenced above. Thanks.

Juha Heinanen (jh-tutpro) wrote :

Leann Ogasawara writes:

 > You should be able to test using a LiveCD which will not affect your
 > current installation. Images can be found at the link I referenced
 > above. Thanks.

i can try live cd, but if i have understood correctly the live cd
business, it will not allow me to run my current application/data
environment on the pc that includes e17 windowing system, but a fresh
ubuntu environment.

the problem with this irq problem is that it does not appear
immediately, but usually after one/two hours work on the pc. i cannot
simulate such session in live cd environment without access to my
applications and data.

-- juha

Richard Khoury (richiek) wrote :

I have to agree with Leann on this; it just doesn't feel like the same thing.

I happened to try the LiveCD last night but it was far from a normal session. In any case, I did play around with it for a couple of hours, and then left it idling for a few hours more. In total, Intrepid was probably running for around 5 hours and the problem that I normally have didn't present itself.

Just to recap, my system is a laptop: Dell Precision M90. I normally run Debian unstable and I have recently built kernel 2.6.27 but the problem still continued without pci=nomsi in the kernel args.

Richard Khoury (richiek) wrote :

(Sorry for any confusion, I was actually referring to Juha's post, not Leann's one)

Steve Beattie (sbeattie) wrote :

This bug was reported in the Intrepid development cycle; removing regression-potential and marking as regression-release.

Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Leann Ogasawara (leannogasawara) wrote :

Just curious if anyone has been able to test and confirm if this issue remains with the latest Jaunty Jackalope 9.04 release which contains a newer 2.6.28 based kernel - http://www.ubuntu.com/getubuntu/download . Please let us know your results. Thanks.

Martin Pitt (pitti) wrote :

I haven't seen this problem in ages. Closing, thanks for the reminder.

Changed in linux (Ubuntu):
status: Incomplete → Fix Released