2.6.12-9-686 and SMP kernel and Ubuntu Live CD Panic During Boot with DPT/Adaptec I2O Controller - Cause: Loads both dpt_i2o and the i2o drivers.

Bug #24050 reported by Chris Samuel
Affects: linux-source-2.6.15 (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Ben Collins

Bug Description

Having successfully upgraded my laptop to the Breezy RC via apt-get from Hoary, I
did the same with my desktop after the Breezy release. The desktop is a 2.4GHz
Intel P4 with 1GB RAM and an Adaptec 2100S RAID card, which is normally driven by
the dpt_i2o driver.

Rebooting after the upgrade, I found that the SMP version of 2.6.12-9-686 panicked
(see end of report) whilst initialising my SCSI controller. Googling around
suggests this is a known problem caused by mistakenly loading both the dpt_i2o
driver and the I2O subsystem, which is a Bad Thing(tm).

The same panic happens with the SMP kernel, the uniprocessor kernel and the Kubuntu Live CD!

See this posting to the Linux-SCSI list with a very similar panic to mine at:

http://www.spinics.net/lists/linux-scsi/msg02497.html

The response saying that you should never load them both is at:

http://www.spinics.net/lists/linux-scsi/msg02503.html

IMPORTANT: The I2O on linux FAQ says:

Note: One user have reported that moving from dpt_i2o to i2o_block has caused
lockups and kernel panics if he uses LVM and XFS on top of it. Other filesystems
on top of LVM and XFS directly on the partition worked fine.

I'm using XFS on LVM, so I'd much rather see dpt_i2o be used (as it was in
Hoary) than I2O!

After posting this I'm going to try and boot from a rescue CD and see if I can
rebuild the initrd to not use the I2O subsystem..

This is as much of the boot output as I could see and copy by hand using vga=773.

scsi0: Vendor: Adaptec Model: 2100S FW: 370F
   Vendor: ADAPTEC Model: RAID-5 Rev: 370f
   Type: Direct-Access ANSI SCSI revision: 02
I2O subsystem v$Rev$
i2o: max drivers = 8
i2o: Checking for PCI I2O controllers...
ACPI: PCI Interrupt 0000:02:01.1[A] -> GSI 21 (level, low) -> IRQ 21
i2o: I2O controller found on bus 2 at 9.
iop0: PCI I2O controller at DC000000 size=1048576
iop0: using write combined MTRR
iop0: MTRR workaround for Intel i960 processor
iop0: Installed at IRQ 21
iop0: Activating I2O controller...
iop0: This may take a few minutes if there are many devices
Unable to handle kernel paging request at virtual address 8000002c
 printing eip:
f8a39ad3
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: i2o_core dpt_i2o scsi_mod ehci_hcd usbcore ide_disk ide_cd
cdrom ide_generic piix ide_core unix fbcon tileblit font bitblit vesafb
cfbcopyarea cfbimgblt cfbfillrect softcursor capability commoncap
CPU: 0
EIP: 0060:[<f8a39ad3>] Not tainted VLI
EFLAGS: 00010086 (2.6.12-9-686)
EIP is at adpt_isr+0xde/0x201 [dpt_i2o]
eax: 80000000 ebx: f7910000 ecx: 00000000 edx: f7879000
esi: f7879000 edi: c034ffa4 ebp: 37910000 esp: c034ff28
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c034e000 task=c02cdb80)
Stack: 00000292 00000100 c0120d10 c034ff48 00000000 00000286 00000000 f7871ec0
       00000000 c034ffa4 00000015 c0136594 00000015 f7879000 c034ffa4 00000000
       f7871ec0 00000015 c034ace0 c034ffa4 c013666b 00000015 000f41ff f7d13de4
Call Trace:
 [<c0120d10>] process_timeout+0x0/0x9
 [<c0136594>] handle_IRQ_event+0x39/0x6d
 [<c013666b>] __do_IRQ+0xa3/0xfc
 [<c02940b9>] schedule+0x303/0x5a5
 [<c01051fd>] do_IRQ+0x19/0x24
 [<c010380e>] common_interrupt+0x1a/0x20
 [<c010101e>] default_idle+0x0/0x29
 [<c0101041>] default_idle+0x23/0x29
 [<c01010b2>] cpu_idle+0x3c/0x51
 [<c03507ab>] start_kernel+0x171/0x1ad
 [<c0350346>] unknown_bootoption+0x0/0x1da
Code: 00 40 89 44 24 18 74 10 8b 7b 0c 85 ff 74 09 b9 11 00 00 00 89 de f3 a5 8b
4c 24 18 85 c9 0f 88 a6 00 00 00 8b 43 0c 85 c0 74 0b <8b> 50 2c 85 d2 0f 85 f4
00 00 00 8b 54 24 34 8b 82 84 00 00 00
<0>Kernel panic - not syncing: Fatal exception in interrupt
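
The conflict described above shows up in the oops itself: the "Modules linked in" list names both dpt_i2o and the I2O core module. A minimal sketch of checking a captured oops for that combination follows. The function names are hypothetical, and it assumes the I2O core module is called i2o_core and that the module list may wrap over several lines, as it does in a hand-copied trace.

```python
PREFIX = "Modules linked in:"


def parse_modules_linked_in(oops_text):
    """Return the set of module names listed after 'Modules linked in:'.

    Continuation lines are collected until the next 'Field: value' line
    (for example 'CPU: 0') is reached.
    """
    modules = set()
    collecting = False
    for line in oops_text.splitlines():
        if line.startswith(PREFIX):
            collecting = True
            line = line[len(PREFIX):]
        elif collecting and ":" in line:
            break  # reached the next field of the oops
        if collecting:
            modules.update(line.split())
    return modules


def has_driver_conflict(modules):
    """True when both the dpt_i2o driver and the I2O core are loaded."""
    # Assumption: 'i2o_core' is the name of the I2O subsystem core module.
    return "dpt_i2o" in modules and "i2o_core" in modules


if __name__ == "__main__":
    sample = (
        "Modules linked in: i2o_core dpt_i2o scsi_mod ehci_hcd\n"
        "cdrom ide_generic piix\n"
        "CPU: 0\n"
    )
    print(has_driver_conflict(parse_modules_linked_in(sample)))
```

The same check could be run against the module names in /proc/modules on a system that stays up long enough to read it.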

Chris Samuel (chris-csamuel) wrote :

To reproduce - boot a machine with an Adaptec 2100S (and presumably any other DPT I2O SCSI card) with current Kubuntu Live CD (Ubuntu Live CD should do
the same) and it will panic in the same way.

I didn't get enough time last night to rebuild the initrd, will try again this evening.

Chris Samuel (chris-csamuel) wrote :

Stripping out the I2O subsystem from the cpio archive created by initramfs
doesn't work - it then whinges about not being able to mount the root device.

Creating an initrd with mkinitrd works though as it doesn't try to include I2O
in the first place. It now boots to the point where the init scripts start (far
further than before).

Unfortunately the machine *then* crashes after mounting filesystems with the
same panic as before, presumably because something else (like hotplug) tries to
load those devices. Unfortunately I couldn't see far back enough to confirm
that it was hotplug doing this and now the machine hangs at hotplug. :-(

I'm going to go grab a coffee to see if hotplug eventually completes and if not
boot back into the 2.6.10 kernel from Hoary and see if it passes hotplug there
or whether something has broken terminally now.

Chris

Chris Samuel (chris-csamuel) wrote :

OK - hotplug didn't exit under 2.6.12-9-686-smp, but booting into the
2.6.10-5-686-smp kernel from Hoary was fine.

I then completely removed the I2O subsystem by moving
/lib/modules/2.6.12-9-686-smp/kernel/drivers/message/i2o out of the modules tree
and reran depmod for that version.

Rebooting into 2.6.12-9-686-smp then finally worked!

Any ideas on how to avoid this happening in future ?
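
The step above removes the whole I2O directory from the module tree. Before doing that (or before writing blacklist entries instead), it can help to see which modules that directory actually provides. A minimal sketch, assuming the directory layout quoted above and uncompressed .ko files; the function name is hypothetical, and how the resulting names are fed to a blacklist depends on the release.

```python
from pathlib import Path


def i2o_module_names(modules_root, kernel_version):
    """List module names found under kernel/drivers/message/i2o for one kernel.

    Returns an empty list when the directory does not exist, for example
    after it has already been moved out of the module tree.
    """
    i2o_dir = Path(modules_root) / kernel_version / "kernel/drivers/message/i2o"
    if not i2o_dir.is_dir():
        return []
    # 'i2o_core.ko' becomes 'i2o_core'; non-module files are ignored.
    return sorted(p.stem for p in i2o_dir.rglob("*.ko"))


if __name__ == "__main__":
    for name in i2o_module_names("/lib/modules", "2.6.12-9-686-smp"):
        print(name)
```

Whichever approach is used, depmod still has to be rerun for that kernel version after modules are moved, as described above.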

Chris Samuel (chris-csamuel) wrote :

Changed severity to "critical" as the bugzilla docs say that's what crashes should be marked as.

Apologies for not noticing that sooner!

Matt Zimmerman (mdz) wrote :

Please see https://wiki.ubuntu.com/HelpingWithBugs

Chris Samuel (chris-csamuel) wrote :

(In reply to comment #5)

> Please see https://wiki.ubuntu.com/HelpingWithBugs

Thanks Matt - sorry about that.

Given that I've figured out (hopefully) how to blacklist the I2O subsystem in udev, does
anyone have any hints on how to either stop mkinitramfs from loading it or on how to
force the kernel packages to use mkinitrd rather than mkinitramfs ?

Chris Samuel (chris-csamuel) wrote :

(In reply to comment #6)

> Given that I've figured out (hopefully) how to blacklist the I2O subsystem in udev,

Argh - of course that should be hotplug, not udev..

Chris Samuel (chris-csamuel) wrote :

(In reply to comment #1)
> To reproduce - boot a machine with an Adaptec 2100S
> (and presumably any other DPT I2O SCSI card) with
> current Kubuntu Live CD (Ubuntu Live CD should do
> the same) and it will panic in the same way.

Flight CD 1 (Dapper Drake) Live CD panics as well on this machine.

Same problem, erroneously tries to load both dpt_i2o and the I2O subsystem.

Chris

Ben Collins (ben-collins) wrote :

This bug has been fixed in the latest kernel in our Dapper release. There are no
plans to fix this in breezy.

Chris Samuel (chris-csamuel) wrote :

(In reply to comment #9)

> This bug has been fixed in the latest kernel in our Dapper release. There are no
> plans to fix this in breezy.

Hi Ben,

Thanks so much for this good news, when the next test live CD arrives I'll try that out and let
you know how it goes here.

Can I ask how this was fixed please ?

thanks again!
Chris

Ben Collins (ben-collins) wrote :

(In reply to comment #10)
> Can I ask how this was fixed please ?

Upstream fixed the i2o drivers so that they didn't ignore the return value of
request_irq(), thus allowing them to both be loaded at once (even if one doesn't
end up working).

Chris Samuel (chris-csamuel) wrote :

(In reply to comment #9)

> This bug has been fixed in the latest kernel in our Dapper release. There are no
> plans to fix this in breezy.

Just tried Dapper Flight CD 2.

I'm afraid that I still get a crash during boot on my system, although now
with what looks like an Oops rather than a panic. This time it's saying
"scheduling while atomic".

The system appears to boot normally and problems only start when it reaches the
"Detecting and activating hardware..." script.

The Adaptec DPT I2O driver appears to load with no problems, finds the RAID-5,
then the I2O subsystem tries to load and fails because it's already claimed, and
then the system crashes.

Here's the text of the lead-up and the failure from the Dapper Flight CD 2 boot
with the "quiet" option removed. Thankfully I could still scroll back, unlike with
previous panics, so I could get the messages prior to the crash. I've had to
transcribe manually, so I've omitted the leading timestamps and any typos
are likely to be mine.

TID 008: Vendor: ADAPTEC Device: AIC-7899 Rev: 00000001
TID 519: Vendor: ADAPTEC Device: RAID-5 Rev: 370F

scsi0 : Vendor: Adaptec Model: 2100S FW: 370F
  Vendor: ADAPTEC Model: RAID-5 Rev: 370F
  Type: Direct-Access ANSI SCSI revision: 02
SCSI device sda: 143372288 512-byte hdwr sectors (73407MB)
SCSI device sda: drive cache: write back
SCSI device sda: 143372288 512-byte hdwr sectors (73407MB)
SCSI device sda: drive cache: write back
 sda: sda1
sd 0:0:0:0: Attached scsi disk sda
I2O subsystem v1.288
i2o: max drivers = 8
i2o: checking for PCI I2O controllers...
ACPI: PCI Interrupt 0000:02:01.1[A] -> GSI 21 (level, low) -> IRQ 193
iop0: controller found (0000:02:01.1)
PCI: Unable to reserve mem region #1:2000000@dc000000 for device 0000:02:01.1
iop0: device already claimed
iop0: DMA / IO allocation for I2O controller failed
ACPI: PCI interrupt for device 0000:02:01.1 disabled
dpti0: Trying to Abort: cmd=42
scheduling while atomic: scsi_eh_0/0xffffffff/3654
 [<c02e7722>] schedule+0x5c2/0x690
 [<c011bab3>] vprintk+0x1d3/0x340
 [<f8a3164e>] adpt_i2o_post_wait+0x1ae/0x250 [dpt_i2o]
 [<c0117b20>] default_wake_function+0x0/0x20
 [<f8a309c2>] adpt_abort+0x92/0x120 [dpt_i2o]
 [<f8a1134e>] scsi_eh_abort_cmds+0x4e/0x100 [scsi_mod]
 [<f8a1217b>] scsi_unjam_host+0xab/0x200 [scsi_mod]
 [<f8a122d0>] scsi_error_handler+0x0/0x120 [scsi_mod]
 [<f8a12399>] scsi_error_handler+0xc9/0x120 [scsi_mod]
 [<c012fdf3>] kthread+0x93/0xa0
 [<c012fd60>] kthread+0x0/0xa0
 [<c0101385>] kernel_thread_helper+0x5/0x10

It then says the following:

                                                          [fail]
 * Loading modules... [ ok ]
 * Setting the system clock [ ok ]
 * Cleaning up ifupdown... [ ok ]
 * Setting the system clock [ ok ]
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: <email address hidden>
 * Setting up LVM Volume Groups... [ ok ]
 * Starting Enterprise Volume Management System...

and hangs with the cursor at the right hand side of the screen,
waiting for the EVMS script to...


Ben Collins (ben-collins) wrote :

Can you see if you can get me more of the dmesg, around where it loads the
dpt_i2o driver?

Chris Samuel (chris-csamuel) wrote :

(In reply to comment #13)

> Can you see if you can get me more of the dmesg, around where it loads the
> dpt_i2o driver?

I'll give it a go, I started the first one at the point I first noticed it
loading the DPT drivers, before that there's lots about loading drivers for
USB, parallel port, audio, etc, etc. Anything you want me to look for in
particular (e.g. IRQ messages or ACPI messages) ?

I ask because it's quite a few pages of error-prone typing which was why I
didn't attempt it previously.

cheers,
Chris

Chris Samuel (chris-csamuel) wrote :

(In reply to comment #13)

> Can you see if you can get me more of the dmesg, around where it loads the
> dpt_i2o driver?

My bad - I missed this section completely the first time around!

[DVB frontend loads just prior to this]
ACPI: PCI Interrupt 0000:00:1f.5[B] -> GSI 17 (level, low) -> IRQ 217
Loading Adaptec I2O RAID: Version 2.4 Build 5go
Detecting Adaptec I2O RAID controllers...
ACPI: PCI Interrupt: 0000:02:01.1[A] -> GSI 21 (level, low) -> IRQ 209
Adaptec I2O RAID controller 0 at f8f00000 size=100000 irq=209
intel8x0_measure_ac97_clock: measured 152630 usecs
intel8x0: clocking to 48000
Intel 810 + AC97 Audio, version 1.01, 03:24:11 Dec 13 2005
dpti: If you have a lot of devices this could take a few minutes.
dpti0: Reading the hardware resource table.

and then it carries on with the TID 008 etc.. that I reported previously.

So to my novice eye it looks like ACPI is disabling the interrupt
for the Adaptec card just before the abort and the Oops.

If you need more detail of the modules loading before that point
then let me know.

thanks!
Chris

Chris Samuel (chris-csamuel) wrote :

Just found that by doing ALT-SYSRQ-I to kill
current tasks I can force it to continue the boot,
including loading X and starting KDE, but obviously
some things aren't happy because DHCP doesn't happen.

However, it does mean that I can now mount a floppy and
grab useful bits out of /proc as well as as much dmesg
output that is left at that point, which I'll attach in
a few minutes.

Chris

Chris Samuel (chris-csamuel) wrote :

Created an attachment (id=5364)
Output of lspci -vv

Output of lspci -vv from DD Flight CD 2 when finally booted.

Chris Samuel (chris-csamuel) wrote :

Created an attachment (id=5365)
Output of dmesg

Output of dmesg from DD Flight CD 2 after finally getting it to boot.

Chris Samuel (chris-csamuel) wrote :

Created an attachment (id=5366)
/proc/interrupts

Contents of /proc/interrupts after DD FCD2 boot

Chris Samuel (chris-csamuel) wrote :

Created an attachment (id=5367)
/proc/iomem

Chris Samuel (chris-csamuel) wrote :

Created an attachment (id=5368)
/proc/ioports

Chris Samuel (chris-csamuel) wrote :

Created an attachment (id=5369)
/proc/dma

Chris Samuel (chris-csamuel) wrote :

Created an attachment (id=5370)
/proc/cpuinfo

Ben Collins (ben-collins) wrote :

Thanks. After being able to look at the dmesg in more detail, your observation
is correct. When i2o gets loaded, and notices the device is in use, it attempts
to back out. Upon doing so, it calls pci_disable_device(). I looked at some
examples elsewhere, which showed that calling it here is incorrect. So I changed
it to make this call for every failure except the case where the device is
already in use (this case).

Will be fixed in 2.6.15-9.11
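
The change described above can be pictured with a toy model. This is a sketch only, not the actual patch: FakePciDevice and probe are hypothetical names, and the enable/request/disable steps loosely mirror the kernel's pci_enable_device(), pci_request_regions() and pci_disable_device() calls.

```python
import errno


class FakePciDevice:
    """Stand-in for one PCI device that two drivers try to claim."""

    def __init__(self):
        self.enabled = False
        self.region_owner = None

    def enable(self):
        self.enabled = True

    def disable(self):
        self.enabled = False

    def request_regions(self, owner):
        # Only one driver may own the memory regions at a time.
        if self.region_owner is not None:
            return -errno.EBUSY
        self.region_owner = owner
        return 0


def probe(dev, driver_name, disable_when_busy):
    """Model of a driver probe with a configurable error path."""
    dev.enable()
    rc = dev.request_regions(driver_name)
    if rc == 0:
        return 0
    # Error path: back out. With disable_when_busy=False the device is left
    # enabled when another driver already owns it (the -EBUSY case).
    if rc != -errno.EBUSY or disable_when_busy:
        dev.disable()
    return rc


if __name__ == "__main__":
    dev = FakePciDevice()
    probe(dev, "dpt_i2o", disable_when_busy=False)
    probe(dev, "i2o_core", disable_when_busy=True)   # old behaviour
    print("old error path, device enabled:", dev.enabled)

    dev = FakePciDevice()
    probe(dev, "dpt_i2o", disable_when_busy=False)
    probe(dev, "i2o_core", disable_when_busy=False)  # changed behaviour
    print("new error path, device enabled:", dev.enabled)
```

In the model, the old error path leaves the first driver holding a disabled device, which matches the "PCI interrupt for device 0000:02:01.1 disabled" message seen just before the abort.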

Chris Samuel (chris-csamuel) wrote :

(In reply to comment #24)
> Thanks. After being able to look at the dmesg in more detail, your observation
> is correct. When i2o gets loaded, and notices the device is in use, it attempts
> to back out. Upon doing so, it calls pci_disable_device(). I looked at some
> examples elsewhere, which showed that calling it here is incorrect. So I changed
> it to make this call for every failure except the case where the device is
> already in use (this case).

That's great news, thanks so much for that. I presume the I2O subsystem changes
haven't led to it doing the same ?

> Will be fixed in 2.6.15-9.11

How easy would it be for me to modify the ISO image I've downloaded to use
the new kernel for burning and testing ? I did a quick bit of Googling
with no success.

Chris Samuel (chris-csamuel) wrote :

(In reply to comment #25)

> That's great news, thanks so much for that. I presume the I2O subsystem changes
> haven't led to it doing the same ?

s/I2O subsystem/DPT driver/

ENOCAFFEINE, sorry.

Ben Collins (ben-collins) wrote :

2.6.15-9 is uploaded, so this bug is resolved.

Chris Samuel (chris-csamuel) wrote :

(In reply to comment #27)

> 2.6.15-9 is uploaded, so this bug is resolved.

Wonderful - is there any way I can modify the current
Live CD to test this to avoid having to wait for the
next Flight CD release ?

Ben Collins (ben-collins) wrote :

(In reply to comment #28)
> (In reply to comment #27)
>
> > 2.6.15-9 is uploaded, so this bug is resolved.
>
> Wonderful - is there any way I can modify the current
> Live CD to test this to avoid having to wait for the
> next Flight CD release ?

Easiest way is to install breezy and dist-upgrade to dapper.

Chris Samuel (chris-csamuel) wrote :

(In reply to comment #29)
>
> Easiest way is to install breezy and dist-upgrade to dapper.

I'm already running Breezy, but I'd rather not upgrade to Dapper
as I'm using the box for useful things and don't want to risk
breaking it at present - would much rather just use a Live CD.

I'll refer the question to my local Linux group in case anyone's
got any bright ideas there.. :-)

thanks!
Chris

Chris Samuel (chris-csamuel) wrote :

I can confirm that this is now fine - finally upgraded to Dapper. Thanks!
