laptop-mode/IDE-APM hang on various laptops

Bug #12483 reported by Justin Mason
This bug affects 1 person
Affects                        Status         Importance   Assigned to   Milestone
linux-source-2.6.15 (Debian)   Fix Released   Unknown
linux-source-2.6.15 (Ubuntu)   Invalid        Medium       Unassigned

Bug Description

This bug report describes a system hang/freeze which occurs on battery power, apparently related to IDE power management settings and/or the Linux kernel's "laptop mode". It seems to occur at unpredictable intervals, and results in a completely hung system with the hard drive activity light illuminated.

Revision history for this message
Matthew Garrett (mjg59) wrote :

Could you please attach the output of dmesg, the /var/log/dmesg file and the
contents of /proc/interrupts ?
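
For anyone else gathering the same information, the three items requested above could be collected in one step. This is only a minimal sketch: the function name `collect_hang_info` and the output file names are made up for illustration, and `dmesg` may need root on some systems.

```shell
#!/bin/sh
# Collect the kernel log, the boot-time kernel log and the interrupt table
# into one directory. Function and file names are illustrative assumptions.
collect_hang_info() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    # Current kernel ring buffer; may be empty if dmesg is not permitted.
    dmesg > "$outdir/dmesg-current.txt" 2>/dev/null || true
    # Boot-time copy of the kernel log, if the system keeps one.
    if [ -r /var/log/dmesg ]; then
        cp /var/log/dmesg "$outdir/dmesg-boot.txt"
    fi
    # Interrupt counters and IRQ assignments.
    if [ -r /proc/interrupts ]; then
        cat /proc/interrupts > "$outdir/interrupts.txt"
    fi
    return 0
}
```

Calling `collect_hang_info ~/bug12483` would leave the files in that directory, ready to attach.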

Revision history for this message
Justin Mason (jm-ubuntu) wrote :

Created an attachment (id=1243)
dmesg output

Revision history for this message
Justin Mason (jm-ubuntu) wrote :

Created an attachment (id=1244)
contents of /var/log/dmesg

Revision history for this message
Justin Mason (jm-ubuntu) wrote :

Created an attachment (id=1245)
contents of /proc/interrupts

Revision history for this message
Justin Mason (jm-ubuntu) wrote :

regarding interrupts: it's worth noting that I've tried swapping around the
listed PCI->interrupt mappings in the Thinkpad BIOS, since that did seem to be a
frequent cause of reported TP hangs. It made no difference -- at least when I
was simply aiming for a wide coverage across all interrupt levels to minimize
conflicts, without aiming to resolve any particular conflict.

Revision history for this message
Justin Mason (jm-ubuntu) wrote :

oh -- another thing; this issue occurs with the following versions of madwifi
CVS, in addition to the universe package:

  - madwifi CVS of May 28 2004, with vanilla 2.6.6 kernel
  - madwifi CVS of Jan 15 2005, with vanilla 2.6.10 kernel

Revision history for this message
Matthew Garrett (mjg59) wrote :

Ok. Does this happen if you disable your iptables configuration?

Revision history for this message
Justin Mason (jm-ubuntu) wrote :

I haven't tried that -- I'll give it a go.

Revision history for this message
Justin Mason (jm-ubuntu) wrote :

after a couple of test crashes last night; yes, it does happen with ip_tables
unloaded.

It also happens if I run "sudo laptop-mode start" while on AC power, let the
machine become idle, then trigger some disk activity.

It definitely seems to be caused by laptop-mode.

Revision history for this message
Justin Mason (jm-ubuntu) wrote :

'It definitely seems to be caused by laptop-mode.'

sorry -- correction -- it definitely seems to be caused by a *combination of*
laptop-mode and madwifi. (I've just verified that with the madwifi modules
unloaded, and laptop-mode active, I get no crashes.)

Revision history for this message
Reinhard Tartler (siretart) wrote :

I can confirm this bug! Since I disabled laptop-mode in /etc/default/laptop-mode
I have had none of the hard hangs described above.

Hardware: Thinkpad R40 2722 B3G, with
Atheros AR5212 802.11abg NIC

Revision history for this message
Matt Zimmerman (mdz) wrote :

Since this doesn't seem to affect other wireless drivers, it's probably a bug in
madwifi.

Are you able to get a trace from the hang?

Please try http://www.ubuntulinux.org/wiki/DebuggingSystemCrash

Revision history for this message
Justin Mason (jm-ubuntu) wrote :

nope; I've tried, and it's a hard hang, magic-SysRq doesn't work. I've tried
using an NMI watchdog, but that doesn't seem to work on my hardware (Thinkpad T40).

I've tried nmi_watchdog=1 and nmi_watchdog=2, without luck (the interrupt
counters don't increment). I'll retry if you have any tips, though.
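
The check described here (whether the NMI counters move at all) can be scripted roughly as follows. This is a sketch, assuming the usual /proc/interrupts layout in which the NMI row starts with `NMI:` followed by one count per CPU; the helper name `nmi_count` is hypothetical.

```shell
#!/bin/sh
# Print the sum of the per-CPU NMI counts found in a file that uses the
# /proc/interrupts format. Exits non-zero if there is no NMI row at all.
nmi_count() {
    awk '$1 == "NMI:" {
             total = 0
             # Add up only the numeric columns; newer kernels append a
             # text description after the counts.
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^[0-9]+$/) total += $i
             print total
             found = 1
         }
         END { if (!found) exit 1 }' "$1"
}
```

After booting with nmi_watchdog=1 or nmi_watchdog=2, running `nmi_count /proc/interrupts` twice a few seconds apart shows whether the watchdog is ticking; a value that stays at 0 matches the situation reported in this comment.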

Revision history for this message
Justin Mason (jm-ubuntu) wrote :

new info; a user on the linux-thinkpad list reported crashes when running on
battery (with ACPI instead of APM, and no mention of wifi however). They
reported that upgrading the BIOS fixed it for them.

I've given that a try, and am now running the latest T40 BIOS and Embedded
Controller Program (BIOS v3.14, Controller v3.04) -- no luck, it still crashes
when laptop-mode and wifi are active simultaneously. so this bug is still open...

Revision history for this message
Matt Zimmerman (mdz) wrote :

I'm seeing this on my Thinkpad T42 using ipw2200 (not madwifi). I've disabled
laptop-mode for now under the assumption that this will fix it. Should I try
upgrading the BIOS, or is there some information that I could help gather?

The system hangs solid: not even magic sysrq works. The hard disk light is
solid on, and only a power cycle can recover.

Revision history for this message
Justin Mason (jm-ubuntu) wrote :

Matt, that's certainly the same symptoms I see; sounds like the same bug, alright!

As I noted in my last comment, I tried a BIOS upgrade without it making any
difference, so I wouldn't recommend that.

Revision history for this message
Jeff Waugh (jdub) wrote :

I'm seeing the same symptoms on my Dell X300, which is virtually the same
hardware as the Thinkpads (Celeron, ipw2200). I can't see any particular
behaviour associated with it though, because it generally happens when I've
walked away from the computer for a while. Every now and then it happens within
minutes of using it. I will turn off laptop-mode and see what happens.

Revision history for this message
Cliff Rowley (cliffrowley) wrote :

Hi. I'm actually a Gentoo user searching for information regarding swsusp2
hangs. I also have an IBM T40 with Gentoo installed, and I am experiencing a
complete system hang with the described symptoms when on DC. Just FYI. I will
do some testing tomorrow and post back with any results.

Revision history for this message
Cliff Rowley (cliffrowley) wrote :

OK, as promised I did a little testing - and it appears that I still hang
irrespective of whether laptop mode is enabled or not as soon as I switch from
AC to DC. I wonder what on earth could be causing this..

Revision history for this message
Mark Kohler (mkohler) wrote :

I'm seeing this bug on a Thinkpad R51 (MadWifi) with Hoary. I haven't timed it,
but it seems like about 5-20 minutes after switching from AC to battery, I will
get a solid hang with the hard drive light on. I'm not certain if it happens
every time I make that transition, but I've never had it happen if I've booted
up with the battery and stayed there. I haven't been able to find anything in
the logs that correlates with the hang.

Revision history for this message
Chuck Short (zulcss) wrote :

What does the log file say after the reboot?

Revision history for this message
Chris Kühl (blixtra) wrote :

I'd just like to confirm another instance of this bug on a new T43 (2668-75U).
My results are however a little different.

After finding this bug report, I got rid of laptop-mode but after a reboot the
crash still happened about 15 minutes in. I then removed the restricted modules
and haven't experienced the crash in the last 5 hours which is MUCH longer than
ever before so I'm assuming it's OK now except of course that I have no
wireless. :-(

I'll leave it like this today and tomorrow reinstall laptop-mode to see if
something happens there.

It seems to me it's the madwifi drivers on certain laptops. Up until a week ago
I was running my HP ze4125 with an Atheros PCMCIA card without problems (Same
install disk and software selection).

This is all on Hoary w/ updates running on AC power. Haven't tried it solely on
battery.

Will report back.

Revision history for this message
Mark Kohler (mkohler) wrote :

I've done some more experimenting, and I only see the hang if I'm using the
madwifi interface, and switch to battery power.

Revision history for this message
Matt Zimmerman (mdz) wrote :

Some of you seem to be experiencing a different problem. If you are seeing a
hang which is not related to laptop-mode, please file a separate bug with a
clear description of the problem. The problem described in this bug has been
confirmed to be solved by disabling laptop-mode, and has been observed without
the use of any particular wireless driver.

Revision history for this message
Matt Zimmerman (mdz) wrote :

*** Bug 17455 has been marked as a duplicate of this bug. ***

Revision history for this message
Matt Zimmerman (mdz) wrote :

Kernel team: how can we go about debugging this problem? The system seems to be
so thoroughly wedged that we can't get useful information out of it at the time.

Revision history for this message
Chris Kühl (blixtra) wrote :

(In reply to comment #22)
> I'd just like to confirm another instance of this bug on a new T43 (2668-75U).
> My results are however a little different.
>
> After finding this bug report, I got rid of laptop-mode but after a reboot the
> crash still happened about 15 minutes in. I then removed the restricted modules
> and haven't experienced the crash in the last 5 hours which is MUCH longer than
> ever before so I'm assuming it's OK now except of course that I have no
> wireless. :-(
>
> I'll leave it like this today and tomorrow reinstall laptop-mode to see if
> something happens there.
>
> It seems to me it's the madwifi drivers on certain laptops. Up until a week ago
> I was running my HP ze4125 with an Atheros PCMCIA card without problems (Same
> install disk and software selection).
>
> This is all on Hoary w/ updates running on AC power. Haven't tried it solely on
> battery.
>
> Will report back.

UPDATE: It seems that my problem is related to bug #16873 instead. Haven't had a
freeze since switching to the vesa driver.

Revision history for this message
Matt Zimmerman (mdz) wrote :

*** Bug 17911 has been marked as a duplicate of this bug. ***

Revision history for this message
Matthew Garrett (mjg59) wrote :

Can people try booting with the

nmi_watchdog=1

kernel parameter? If it's a kernel deadlock rather than a hardware issue, this
may trigger some error reporting on failure - an nmi should be fired some time
after the machine hangs, and if it isn't dealt with the kernel should report a
lockup. On the other hand, it could be an ATI issue - is everyone having trouble
running on ATI hardware?

Revision history for this message
Matt Zimmerman (mdz) wrote :

0000:01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility M7
LW [Radeon Mobility 7500]

Revision history for this message
Justin Mason (jm-ubuntu) wrote :

I already tried nmi_watchdog=1 and =2, without luck -- the NMI counter in
/proc/interrupts never changed from 0, indicating that I 'probably have a
processor that needs to be
added to the nmi code' according to Documentation/nmi_watchdog.txt. so NMIs
didn't help.

yep, I'm using ATI hardware btw:

0000:01:00.0 VGA compatible controller: ATI Technologies Inc Radeon R250 Lf [Radeon
 Mobility 9000 M9] (rev 02) (prog-if 00 [VGA])

Revision history for this message
Jeff Waugh (jdub) wrote :

I'm using an Intel i855.

Revision history for this message
Bart Samwel (bart-samwel) wrote :

Hi,

I'm the maintainer of laptop mode tools, and I've recently worked with someone
who had similar problems, on Gentoo in this case. It was also a Thinkpad. I
think this one had a freeze while the hard drive was spinning down, the drive
was apparently dropping I/O requests during spindown because the kernel
complained about DMA timeouts just before freezing:

hda: dma_timer_expiry: dma status == 0x21
hda: DMA timeout error

(or something similar to that -- order may have been different)

He's now trying it out with the laptop-mode-tools hdparm -B handling disabled,
and that seems to work. Could anyone here confirm if that fixes their problem?

Revision history for this message
Justin Mason (jm-ubuntu) wrote :

btw, I've never seen any interesting hard-disk-related messages like those DMA
ones. it happens silently in my experience, without useful logs.

Revision history for this message
Alex Hudson (bugs-alexhudson) wrote :

(In reply to comment #34)
> btw, I've never seen any interesting hard-disk-related messages like those DMA
> ones. it happens silently in my experience, without useful logs.

I've never had them either, *but* the hangs always happen when I expect disk
activity to start again - e.g., saving a file, opening a file not in the cache, etc.

Bart: could you make it clear what you're asking people to try? Thanks.

Revision history for this message
Justin Mason (jm-ubuntu) wrote :

'I've never had them either, *but* the hangs always happen when I expect disk
activity to start again - e.g., saving a file, opening a file not in the cache,
etc.'

yeah, 100% agreed on that point, FWIW.

Revision history for this message
Bart Samwel (bart-samwel) wrote :

OK, here's what I want you to try. If you're running laptop-mode-tools, edit
/etc/laptop-mode/laptop-mode.conf and set CONTROL_HD_POWERMGMT (version 1.07 and
higher) or DO_HDPARM_POWERMGMT (version 1.06 and lower) to 0. Wait and see if
the crashes still happen. If you're running some older laptop mode scripts
instead of laptop mode tools, edit /usr/sbin/laptop_mode and comment out the
lines that do "hdparm -B" (or, upgrade to laptop-mode-tools :) ). In this case,
again, wait and see if the crashes still happen.

Revision history for this message
Matt Zimmerman (mdz) wrote :

(In reply to comment #37)
> OK, here's what I want you to try. If you're running laptop-mode-tools, edit
> /etc/laptop-mode/laptop-mode.conf and set CONTROL_HD_POWERMGMT (version 1.07 and
> higher) or DO_HDPARM_POWERMGMT (version 1.06 and lower) to 0. Wait and see if
> the crashes still happen. If you're running some older laptop mode scripts
> instead of laptop mode tools, edit /usr/sbin/laptop_mode and comment out the
> lines that do "hdparm -B" (or, upgrade to laptop-mode-tools :) ). In this case,
> again, wait and see if the crashes still happen.

In Ubuntu terms, this means editing /etc/acpi/power.sh and commenting out these
two lines:

        $HDPARM -S 12 /dev/hda
        $HDPARM -B 1 /dev/hda

Revision history for this message
Bart Samwel (bart-samwel) wrote :

> In Ubuntu terms, this means editing /etc/acpi/power.sh and commenting out these
> two lines:

> $HDPARM -S 12 /dev/hda
> $HDPARM -B 1 /dev/hda

No, comment out only the -B line! The -S line sets the idle timeout (which
causes the drive to spin down), while the -B line sets an aggressive power
management mode (which, in our case, turned out to cause system freezes).

Revision history for this message
Matthew Garrett (mjg59) wrote :

Has this helped anyone?

Revision history for this message
Matthew Garrett (mjg59) wrote :

acpi-support-0.46 disables this by default - you can reenable it in
/etc/default/acpi-support

Revision history for this message
Matt Zimmerman (mdz) wrote :

Adjusting severity and milestone accordingly.

Revision history for this message
Ben Collins (ben-collins) wrote :

This bug has been flagged because it is old and possibly inactive. It may or may
not be fixed in the latest release (Breezy Badger 5.10). It is being marked as
"NEEDSINFO". In two weeks time, if the bug is not updated back to "NEW" and
validated against Breezy, it will be closed.

This is needed in order to help manage the current bug list for the kernel. We
would like to fix all bugs, but need users to test and help with debugging.

If this change was in error for this bug, please respond and make the
appropriate change (or email <email address hidden> if you cannot make the
change).

Thanks for your help.

Revision history for this message
Bart Samwel (bart-samwel) wrote :

A recent thread on LKML eventually traced the madwifi problem down to a bug in
the ipw2200 driver (version 1.0.8):

http://lkml.org/lkml/2005/11/16/223

The ipw2200 bug:

http://bughost.org/bugzilla/show_bug.cgi?id=821

Revision history for this message
Ben Collins (ben-collins) wrote :

Based on the bughost.org reports, assuming fixed in dapper.

Revision history for this message
Matthew Garrett (mjg59) wrote :

We've seen the problem on machines that weren't running 1.0.8 of the ipw2200
driver, so that's not the problem.

Revision history for this message
Ben Collins (ben-collins) wrote :

If possible, please upgrade to Dapper's 2.6.15-8 kernel. If you do not want to
upgrade to Dapper, then you can also wait for the Dapper Flight 2 CD's, which
are due out within the next few days.

Let me know if this bug still exists with this kernel.

Revision history for this message
Alistair Phipps (alistairphipps) wrote :

Still occurs with vanilla kernel 2.6.15-rc5.

Revision history for this message
Ben Collins (ben-collins) wrote :

(In reply to comment #107)
> Still occurs with vanilla kernel 2.6.15-rc5.

I'd much prefer a test with Flight 2, just to make sure.

Thanks

Revision history for this message
Christian Elkjaer (c.elkjaer) wrote :

I have been using Dapper from day 1 for daily work and I have not been able to
reproduce this bug for a long time.. until today. Notably, laptop-mode is
disabled according to /etc/default/acpi-support.

I run Dapper Drake Flight 1 + daily updates on my Thinkpad T41.

Revision history for this message
Ben Collins (ben-collins) wrote :

(In reply to comment #109)
> I have been using Dapper from day 1 for daily work and I have not been able to
> reproduce this bug for a long time.. until today. Notably, laptop-mode is
> disabled according to /etc/default/acpi-support.
>
> I run Dapper Drake Flight 1 + daily updates on my Thinkpad T41.

So you are running 2.6.15-9 kernel?

Revision history for this message
Christian Elkjaer (c.elkjaer) wrote :

Sorry for having forgotten to answer you back then in December. At present I cannot tell which kernel I used back then but for sure it was the current Dapper kernel at that time.

After re-installing and testing Dapper Flight 3 for stability for a week or more I have not experienced even a single hang. That is good news though it would be comforting to know what makes the difference.

Revision history for this message
Ben Collins (ben-collins) wrote :

Likely it was any one of various updates to the kernel during that time.

Glad to hear it is fixed.

Changed in linux-source-2.6.15:
status: Confirmed → Fix Released
Revision history for this message
Matt Zimmerman (mdz) wrote :

I don't think we can infer that this bug was fixed; we worked around it for breezy by disabling laptop-mode:

acpi-support (0.46) breezy; urgency=low

  * Add some extra machines to the whitelists
  * Depend on powermgmt-base
  * Disable restarting irda services by default
  * Disable laptop-mode by default

 -- Matthew Garrett <email address hidden> Wed, 12 Oct 2005 12:50:11 +0100

Revision history for this message
AlexHudson (alexhudson) wrote :

I'm still seeing this bug, though only testing intermittently (with laptop mode off, I get to do real work :o)

Running kernel is 2.6.15-14-686, I don't use the ATi binary drivers, I have ipw2200 but can remove that (wep140 is currently broken, it seems, so I'm using the e1000). IBM Tpad R51. I'm willing to do whatever tests are asked for.

Matt Zimmerman (mdz)
Changed in linux-source-2.6.15:
status: Fix Released → Confirmed
Matt Zimmerman (mdz)
description: updated
Revision history for this message
Christian Elkjaer (c.elkjaer) wrote :

After doing the daily update of my Dapper Flight 4 install today I immediately saw this bug. In fact twice within an hour.

Have not seen it since pre-Dapper Flight 3, so it really surprised me.

Revision history for this message
Matt Zimmerman (mdz) wrote :

That's because laptop-mode was re-enabled in Dapper:

acpi-support (0.57) dapper; urgency=low

  * Re-enable laptop-mode

 -- Matthew Garrett <email address hidden> Thu, 16 Feb 2006 00:37:52 +0000

I'm still seeing this bug as well with laptop-mode enabled. Matthew, is there useful debugging to do or are we forced to disable this again?

Revision history for this message
Empien (empien) wrote :

FYI, I am seeing this problem too and it's fairly easy to recreate - on IBM T40 with latest Dapper (up-to-date as of today), kernel 2.6.15-14-686 and acpi-support (0.59).

Revision history for this message
Christian Elkjaer (c.elkjaer) wrote :

Just to let you know..

I am testing the beta release of Dapper on my laptop (clean re-install on a Thinkpad T41) and within half an hour I saw this bug. Consequently I may have to disable laptop-mode manually as I did with Dapper Flight 4, 5 and 6 to be able to work.

Perhaps disabling laptop-mode should be considered for the final release of Dapper, as was done with Breezy, if no solution comes up.

Revision history for this message
Christian Elkjaer (c.elkjaer) wrote :

I forgot to add:

I am talking about the kernel 2.6.15-20-386 and acpi-support version 0.71.

Revision history for this message
Matt Zimmerman (mdz) wrote : Re: [Bug 12483] Re: laptop-mode/IDE-APM hang on various laptops

On Sat, Apr 22, 2006 at 12:11:22PM -0000, Christian Elkjaer wrote:
> Just to let you know..
>
> I am testing the beta release of Dapper on my laptop (clean re-install on
> a Thinkpad T41) and within half an hour I saw this bug. Consequently I may
> have to disable laptop-mode manually as I did with Dapper Flight 4, 5 and
> 6 to be able to work.
>
> Perhaps it should be considered disabling laptop-mode for the final
> release of Dapper as with Breezy if no solutions are coming up.

Eek, I didn't realize this was still enabled by default. I disabled it
locally ages ago, of course.

Disabled for now:

acpi-support (0.72) dapper; urgency=low

  * Re-disable laptop-mode by default, to work around LP#12483
    - Has the pleasant side effect of making the comment match the code

 -- Matt Zimmerman <email address hidden> Sat, 22 Apr 2006 05:34:12 -0700

If someone is able to isolate which hardware configurations are affected, we
can be more selective, but until then (or until the underlying bug is
fixed), I don't see that we have a choice.

--
 - mdz

Revision history for this message
Bart Samwel (bart-samwel) wrote :

Hi guys,

I went through the whole thing once more, and I tried to collect all of the information in one small overview, so we could see new patterns. For laptops that I didn't know, I've collected the wireless card info from the net; I've marked these entries with "###". There seem to have been measurement errors and other bugs interfering (such as the ATI XOrg driver issue and the ipw2200 1.0.8 bug), but the big picture is pretty clear: there is a good chance that there is ipw2100/2200 hardware in all of the affected boxes.

I'm not entirely sure, but does IPW2100/2200 imply a Centrino(-like) laptop or can it exist without Centrino hardware? If it can only exist as part of a full centrino laptop then the problem may lie anywhere in the Centrino hardware. An Intel HD controller chipset issue perhaps?

BASIC SYMPTOMS:

- Crash after going into battery mode
- HD light stays on

HARDWARE OBSERVATIONS:

*** Thinkpad R51 (bug #11168, #17911)
  - ### ipw2100
  - ### mobility radeon 9000

*** Thinkpad R51 (Mark Kohler in bug #12483)

  - madwifi
  - Not solved by disabling "hdparm -B". (Or is it? see message 24-09-2005 06:56:02 UTC)
  - ### mobility radeon 9000

*** Thinkpad R51 (AlexHudson in bug #12483)

  - ipw2200
  - ### mobility radeon 9000

*** Thinkpad T40 (model 2379, Justin Mason, bug #12483)

  - madwifi
  - Vanilla 2.6.6 kernel + madwifi May 28, 2004
  - Vanilla 2.6.10 kernel + madwifi Jan 15, 2005
  - ATI Technologies Inc Radeon R250 Lf [Radeon Mobility 9000 M9] (rev 02) (prog-if 00 [VGA])
  - Disabling "hdparm -B" solves this for Justin.

*** Thinkpad T40 (Manoj Naik in bug #12483)

  - Not solved by disabling "hdparm -B", apparently (see msg dated 12-03-2006)
  - ### T40 can be delivered with Atheros (madwifi or airo drivers) or ipw2100

*** Thinkpad R40 (model 2722 B3G with Atheros AR5212 802.11abg NIC, Reinhard Tartler, in bug #12483)

  - ### ati mobility radeon 7500
  - ### ipw2100

*** Thinkpad T41 (Christian Elkjaer in bug #12483)

  - ipw2100
  - radeon mobility 7500
  - Disabling "hdparm -B" does not resolve the crashes.
  - Disabling laptop mode does not _completely_ resolve the crashes (see msg in bug #12483 dated 25-12-2005), but makes them less frequent (see msg in bug #12483 dated 28-02-2006).

*** Thinkpad T42 (Matt Zimmerman, bug #12483)

  - ipw2200
  - Radeon Mobility 7500
  - can reproduce without laptop mode, with only hdparm -B1
  - can reproduce without hdparm -B, with only laptop mode

*** Dell X300 (= Celeron + ipw2200, Jeff Waugh in bug #12483)

  - Intel i855 graphics
  - ### comes with ipw2200

*** Dell 600M (Ben Maurer in bug #12483)
  - Still crashes with "hdparm -B" disabled
  - radeon 7000: switching to binary ati driver (see bug #10579) reduced the problems, but did not make them disappear.
  - Has crashes also on AC -- may not be this bug at all.
  - See bug #17128 (ipw2200 instability)
  - ipw2200

*** Dell 600M (Brandon Hale in bug #12483)
  - Crashes not reproducible with "hdparm -B" disabled.
  - ### ipw2200 hardware

*** Dell Latitude D600 (bug #13957)
  - ### ipw2100 hardware

*** Acer TravelMate 620 (bug #13957)
  ...


Revision history for this message
Empien (empien) wrote :

> Posted by Bart Samwel at 2006-04-22 14:29:48 UTC
> ... but the big picture is pretty clear: there is a good chance
> that there is ipw2100/2200 hardware in all of the affected
> boxes.

I am seeing this problem with T40 and Atheros (madwifi). lspci reports:
0000:02:02.0 Ethernet controller: Atheros Communications, Inc. AR5212 802.11abg NIC (rev 01)

Revision history for this message
Bart Samwel (bart-samwel) wrote :

Manoj Naik wrote:
>> Posted by Bart Samwel at 2006-04-22 14:29:48 UTC
>> ... but the big picture is pretty clear: there is a good chance
>> that there is ipw2100/2200 hardware in all of the affected
>> boxes.
>
> I am seeing this problem with T40 and Atheros (madwifi). lspci reports:
> 0000:02:02.0 Ethernet controller: Atheros Communications, Inc. AR5212 802.11abg NIC (rev 01)

Damn. :-/

What's handling IDE on that machine BTW? Intel chipset?

Revision history for this message
Empien (empien) wrote :

Bart Samwel wrote:
>> I am seeing this problem with T40 and Atheros (madwifi). lspci reports:
>> 0000:02:02.0 Ethernet controller: Atheros Communications, Inc. AR5212 802.11abg NIC (rev 01)
>
> Damn. :-/
>
> What's handling IDE on that machine BTW? Intel chipset?

0000:00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 01)

If I switch to console quickly enough (Ctrl-Alt-f1) and use one of the sysrq keys ('t' I think) before the freeze, I get a bunch of errors as follows:

ide: failed opcode was: unknown
hda: task_in_intr: status=??
hda: task_in_intr: error=??

I don't know if it's important but I'll try to note the status/error codes next time this happens.

Revision history for this message
Bart Samwel (bart-samwel) wrote :

Manoj Naik wrote:
> Bart Samwel wrote:
>>> I am seeing this problem with T40 and Atheros (madwifi). lspci reports:
>>> 0000:02:02.0 Ethernet controller: Atheros Communications, Inc. AR5212 802.11abg NIC (rev 01)
>> Damn. :-/
>>
>> What's handling IDE on that machine BTW? Intel chipset?
>
> 0000:00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 01)
>
> If I switch to console quickly enough (Ctrl-Alt-f1) and use one of the sysrq keys ('t' I think) before the freeze, I get a bunch of errors as follows:
>
> ide: failed opcode was: unknown
> hda: task_in_intr: status=??
> hda: task_in_intr: error=??
>
> I don't know if it's important but I'll try to note the status/error codes next time this happens.

That would be very interesting indeed.

BTW, AgenT reported earlier in this thread that this worked for him on a
Thinkpad with exactly that IDE controller. His lspci output was "Intel
Corp. 82801DBM (ICH4) Ultra ATA Storage Controller (rev 01)", yours said
"IDE Controller". I don't know if this could make a difference between
crashing and not crashing...

I think it would be interesting to find out what kind of IDE controllers
the other people are using. It does seem to be a HD problem, after all.
As IPW2[12]00 is Intel, it's possible that the laptops all use the same
Intel IDE controller. So, could anybody else who can reproduce this
problem please report their exact IDE controller lines/descriptions as
reported by lspci? (In fact, full lspci output might be preferable, so
that we can check out other common factors!)
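
For anyone answering the request above, a small filter along these lines would pull just the relevant lines out of lspci output. It is a sketch: the helper name `ide_controller_lines` is made up, and the patterns are simply the two wordings that have appeared in this thread ("IDE interface ... IDE Controller" and "Ultra ATA Storage Controller"), so other chipsets may be described differently.

```shell
#!/bin/sh
# Read lspci-style output on stdin and print only the lines that describe
# an IDE or ATA storage controller.
ide_controller_lines() {
    grep -i -E 'IDE interface|IDE controller|ATA Storage Controller'
}
```

Typical use would be `lspci | ide_controller_lines`; attaching the full `lspci` output as well lets others look for further common factors.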

Revision history for this message
Christian Elkjaer (c.elkjaer) wrote : lspci output for Thinkpad T41

lspci output for Thinkpad T41 in response to Bart Samwel's request (see post on 2006-04-22).

Revision history for this message
Mark Kohler (mkohler) wrote : lspci output for Thinkpad R51

Here is "lspci" output for the Thinkpad R51 1836-HAU, which also exhibits the IDE-APM hang.

Revision history for this message
Vincent Untz (vuntz) wrote : lspci output for Asus M6Ne

Also happens here, on my Asus M6Ne.

Revision history for this message
Matt Zimmerman (mdz) wrote : Re: [Bug 12483] Re: [Bug 12483] Re: laptop-mode/IDE-APM hang on various laptops

On Sat, Apr 22, 2006 at 09:44:44PM -0000, Bart Samwel wrote:
> I think it would be interesting to find out what kind of IDE controllers
> the other people are using. It does seem to be a HD problem, after all.
> As IPW2[12]00 is Intel, it's possible that the laptops all use the same
> Intel IDE controller. So, could anybody else who can reproduce this
> problem please report their exact IDE controller lines/descriptions as
> reported by lspci? (In fact, full lspci output might be preferable, so
> that we can check out other common factors!)

I see it here on a T42 with ipw2200 and:

0000:00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 01)

--
 - mdz

Revision history for this message
Bart Samwel (bart-samwel) wrote :

OK, the IDE controllers are virtually the same:

T41 and T42:
82801DBM (ICH4-M) IDE Controller (Rev 01)
82801DBM (ICH4-M) IDE Controller (rev 01)

Asus M6Ne:
82801DBM (ICH4-M) IDE Controller (rev 03)

R51:
82801DBM (ICH4) Ultra ATA Storage Controller (rev 01)

According to ThinkWiki this chip may be found in Thinkpads G40, G41, R40, R50, R50e, R50p, R51, T40, T40p, T41, T41p, T42, T42p, X31, X32, X40. Reference:

http://www.thinkwiki.org/wiki/Intel_82801DBM

The thing is, AgenT reported that the problem did _not_ occur on his Thinkpad, which contains the exact same chip. And I know a lot more laptops containing this chip that _don't_ have the problem. Combo with some other chip? Or with a specific HD series?

Anyway, I googled "system hang 82801DBM" and found the 82801DBM specs, which contained one erratum that may or may not be our problem; I'm not very well-versed in PCI protocol speak. If anybody else cares to interpret this and see if it may be the problem, please be my guest. :-) The doc is here:

http://download.intel.com/design/chipsets/specupdt/25266307.pdf

Quote:

"5. PCI Non-Linear Addressing

Problem: If a PCI Memory Read Multiple or Memory Read Line transaction falls on the last DWORD of a 32 byte cache line boundary and non-linear addressing (cache-line wrap mode) is used, the ICH4-M will pre-fetch data past the cache line boundary. All subsequent PCI bus master reads will get incorrect data. Subsequent processor cycles to PCI/LPC will get blocked behind the surplus data resulting in a system hang.

Implication: System hang only seen in synthetic test environment. No known commercial PCI devices support cache-line wrap mode using Memory Read Multiple or Memory Read Line.

Workaround: None

Status: There are no plans to fix this erratum."

Revision history for this message
AlexHudson (alexhudson) wrote :

My R51 Thinkpad also has the 82801DBM (ICH4-M) IDE Controller (rev 01) chip. According to lspci, it's used for a load of stuff, the only other significant chip really being the PCI bridge (82855PM rev 03).

I had a Hitachi hard drive model IC25N060ATMR04-0, and a DVD RAM drive MATSHITADVD-RAM UJ-811.

Revision history for this message
Justin Mason (jm-ubuntu) wrote : lspci -v output for Thinkpad T40

btw the T40 doesn't have a full Centrino chipset afaik.

Revision history for this message
Reinhard Tartler (siretart) wrote : Hardware information of R40 2722 B3G

Machine affected by this bug.

output of lspci, lspci -n, lshal and dmidecode

Revision history for this message
Bart Samwel (bart-samwel) wrote :

Updated data aggregation:

R40, T41 and T42:
82801DBM (ICH4-M) IDE Controller (rev 01)

Asus M6Ne:
82801DBM (ICH4-M) IDE Controller (rev 03)

T40 and R51:
82801DBM (ICH4) Ultra ATA Storage Controller (rev 01)

To Justin: Yeah, I'd expect it to report ICH4-M if it were Centrino. I guess that the ICH4 controller is the "desktop" edition of the same chip.
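
If more people want to contribute to the aggregation, something like the following rough Python sketch could pull the controller variant and revision out of saved lspci output. The sample lines, bus addresses and the function name are assumptions for illustration; only the controller descriptions are taken from this thread.

```python
import re

# Hypothetical sample in the usual one-line lspci format; bus addresses and
# the vendor prefix are assumed, not copied from anyone's attachment.
SAMPLE = """\
00:1f.0 ISA bridge: Intel Corporation 82801DBM (ICH4-M) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 01)
"""

PATTERN = re.compile(r"82801DBM \((ICH4(?:-M)?)\) (.*?) \(rev ([0-9a-fA-F]+)\)")


def find_ide_controllers(lspci_text):
    """Return (variant, description, revision) for each 82801DBM IDE/ATA line."""
    found = []
    for line in lspci_text.splitlines():
        m = PATTERN.search(line)
        if m and ("IDE Controller" in m.group(2) or "Ultra ATA" in m.group(2)):
            found.append((m.group(1), m.group(2), m.group(3)))
    return found


print(find_ide_controllers(SAMPLE))
```

Running it over the text of an `lspci` attachment would give one tuple per matching controller, which makes it easier to compare machines by variant and revision.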

Revision history for this message
Angelo Lisco (angystardust-gmail) wrote :

Matt Zimmerman has just uploaded a new version of 'acpi-support' package...

acpi-support (0.72) dapper; urgency=low
 .
   * Re-disable laptop-mode by default, to work around LP#12483
     - Has the pleasant side effect of making the comment match the code

Revision history for this message
cinchurge (cinchurge) wrote :

I seem to have a similar problem on my ThinkPad X31. I installed Dapper (kernel 2.6.15-26) about 2 weeks ago and everything worked well except after suspending to ram, I/O error would occur with one of the mounted HD partitions, and rebooting would cause the system to hang completely with the HD light illuminated - all I could do at that point was turn off the power. Disabling laptop-mode by setting ENABLE_LAPTOP_MODE="no" in /etc/default/laptop-mode hasn't solved the problem.

Revision history for this message
Bart Samwel (bart-samwel) wrote : Re: [Bug 12483] Re: laptop-mode/IDE-APM hang on various laptops

cinchurge wrote:
> I seem to have a similar problem on my ThinkPad X31. I installed Dapper
> (kernel 2.6.15-26) about 2 weeks ago and everything worked well except
> after suspending to ram, I/O error would occur with one of the mounted
> HD partitions, and rebooting would cause the system to hang completely
> with the HD light illuminated - all I could do at that point was turn
> off the power. Disabling laptop-mode by setting ENABLE_LAPTOP_MODE="no"
> in /etc/default/laptop-mode hasn't solved the problem.

Hmmm, this may be unrelated if disabling laptop mode doesn't make it go
away. Especially since there's suspending involved, which there isn't in
the other reports.
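
To rule laptop mode in or out, it may help to check both the config setting and what the kernel is actually doing. A minimal Python sketch, assuming the setting lives in a shell-style file as described above and that the kernel exposes its state in /proc/sys/vm/laptop_mode (0 meaning off); the function names are made up for illustration.

```python
import re
from pathlib import Path


def laptop_mode_setting(config_text):
    """Return the last uncommented ENABLE_LAPTOP_MODE value, or None."""
    value = None
    for line in config_text.splitlines():
        m = re.match(r'\s*ENABLE_LAPTOP_MODE="?([^"#\s]*)"?', line)
        if m:
            value = m.group(1)
    return value


def kernel_laptop_mode(path="/proc/sys/vm/laptop_mode"):
    """Return the kernel's current laptop_mode value, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


print(laptop_mode_setting('# a comment\nENABLE_LAPTOP_MODE="no"\n'))
print(kernel_laptop_mode())
```

If the config says "no" but the kernel value is still nonzero while on battery, something else is switching laptop mode on, and the hang could still be related to this bug.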

Revision history for this message
cinchurge (cinchurge) wrote :

>Hmmm, this may be unrelated if disabling laptop mode doesn't make it go
>away. Especially since there's suspending involved, which there isn't in
>the other reports.

thanks for the help, but does this look like any other known bug of Ubuntu? it's driving me crazy... a laptop is almost no good to me if it can't suspend right :(

Revision history for this message
Bart Samwel (bart-samwel) wrote :

cinchurge wrote:
>> Hmmm, this may be unrelated if disabling laptop mode doesn't make it go
>> away. Especially since there's suspending involved, which there isn't in
>> the other reports.
>
> thanks for the help, but does this look like any other known bug of
> Ubuntu? it's driving me crazy... a laptop is almost no good to me if it
> can't suspend right :(

I get that. Eh... perhaps bug 40929. Otherwise, I don't know. Anything
related should show up by searching for "freeze" or "hang", but I don't
see anything that looks like it. Anybody else remember any similar bugs?

Changed in linux-source-2.6.15:
status: Unknown → Unconfirmed
Changed in linux-source-2.6.15:
status: New → Confirmed
Changed in linux-source-2.6.15:
status: Confirmed → Fix Released
Changed in linux-source-2.6.15 (Ubuntu):
assignee: Ben Collins (ben-collins) → nobody
status: Confirmed → Invalid