-server kernel variant fails to boot on PowerEdge 2650 with AACRAID timeouts

Bug #149071 reported by James Troup
Affects                        Status     Importance  Assigned to  Milestone
Linux                          Invalid    High
linux (Ubuntu)                 Invalid    Medium      Unassigned
  Nominated for Dapper by Freddie
linux-source-2.6.22 (Ubuntu)   Won't Fix  Medium      Unassigned
  Nominated for Dapper by Freddie

Bug Description

Binary package hint: linux-source-2.6.22

We tried installing gutsy i386 server (both beta and 2007-10-04) on a Dell PowerEdge 2650. The install went fine up until it failed to come up on first boot post-install, with the AACRAID driver complaining loudly about aborts and resets. I booted into recovery mode and installed the -generic flavour onto the installed system. When I choose that in grub, the installed system boots up fine.

At Kyle's request we also tried the following command line options with the -server kernel without success:

 o noapic
 o scsi_scan=sync
 o pci=nomsi,nommconf
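
When testing options like these, it can help to confirm that they actually reached the kernel by looking for them in /proc/cmdline after boot. The helper below is only a minimal sketch; the name `has_boot_option` and the sample command line are made up for illustration and are not part of any Ubuntu tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a boot option appears as a whole word
# in a kernel command line (defaults to the running kernel's /proc/cmdline).
has_boot_option() {
    opt=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $opt "*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Example against a literal (made-up) command line rather than the live one.
sample='root=/dev/sda1 ro noapic pci=nomsi,nommconf'
for o in noapic scsi_scan=sync pci=nomsi,nommconf; do
    if has_boot_option "$o" "$sample"; then
        echo "$o: present"
    else
        echo "$o: missing"
    fi
done
```

Calling `has_boot_option noapic` with no second argument checks the running kernel instead of the sample string.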

I'll attach a couple of screenshots of the kernel in its broken state.

Tags: cft-2.6.27
Loye Young (loyeyoung) wrote :

Similar problem here on two different models of Compaq Proliant. I don't have a clue what's going on, so this is as good a place to post the issue as any.

At first, the server and 386 kernels boot but no video. SSH works, but no hardware video. The keyboard worked, as I could log in and reboot by typing from memory without screen feedback.

Figuring the problem was something about the video and my LCD screen, I used vidmode=-1 to force simple vga. On reboot, kernel starts to boot, then hangs and the keyboard LEDs start flashing.

The upgrade to gutsy seems to have hosed up the kernel configurations. For one, the first stanza in the grub menu is to the 386 kernel instead of either the server or generic (686) kernel. Second, the video hasn't been working correctly.

I have the suspicion that the apparmor dependency on the 386 kernel and the linux-image-ubuntu-modules are the culprits, but I have no idea.

Kernel-level issues are rather foreign to me. I have only a dim inkling of what's going on in that magical moment between when grub fires off and the machine starts booting.

Loye Young

Henk van Lingen (henk+ubuntu) wrote :

I think I have the same prob on a Poweredge 1650 after doing a do-release-upgrade from Feisty to Gutsy yesterday. The last 7.04 kernel (2.6.20-16-server) still works:

tarantula:/boot-# uname -a
Linux tarantula 2.6.20-16-server #2 SMP Sun Sep 23 19:57:25 UTC 2007 i686 GNU/Linux

Just Pete (bruinsdj) wrote :

I had the same exact problem with my PE2650 after an upgrade installation. From your screen shots, I noticed that you were running the same firmware level I had - Adaptec PERC 3/Di v 2.8.0.6089.

Checking Dell's support site, I found 3 newer firmware versions. I installed the latest (2.8.1.6098), and now my system boots just fine to the 2.6.22 kernel.

hogman23 (rsturdivant) wrote :

This is happening on PE2950 and PE1850's as well!!

Changed in linux-source-2.6.22:
status: New → Confirmed
hogman23 (rsturdivant) wrote :

What is the status of this bug? We are having major issues with it. None of our servers that are on Gutsy will boot without backing up to the old Feisty kernel!!!

Vinícius de Figueiredo Silva (viniciusfs) wrote :

Same problem here. Upgraded firmware from 2.8-0 6089 to 2.8-1 6098, rebooted and booting fine.

Just Pete (bruinsdj) wrote :

@hogman23

I tested a clean installation of 7.10 server on a PE1850 with the following specs:

BIOS A05
PERC 4e/Si RAID controller firmware version 521S

Installation was successful, first boot came up with no problems. What version of firmware are you using on your 1850s?

Brian Murray (brian-murray) wrote :

I am assigning this bug to the 'ubuntu-kernel-team' per their bug policy. For future reference you can learn more about their bug policy at https://wiki.ubuntu.com/KernelTeamBugPolicies .

Changed in linux-source-2.6.22:
assignee: nobody → ubuntu-kernel-team
hogman23 (rsturdivant) wrote :

Turns out my problem was that I had vga=791 on the kernel line and the new kernel doesn't like that at all. I think this is probably the bug in my case at least.

TJ (tj) wrote :

Same problem affecting Hardy beta of 2008-04-02 i386 on Dell PowerEdge 6300/550 with an Adaptec PERC/2 controller. This also affects Gutsy and was reported during the Gutsy development cycle in a separate bug report.

Cannot boot from server CD because of various aacraid failures. Using irqpoll or noapic doesn't help.

I'm trying to capture photographs of the kernel boot messages but they are a bit zippy!

TJ (tj) wrote :

In reviewing installation logs from previous kernels I found one for Feisty (2.6.20-15-generic) that didn't suffer the aacraid problems. The pertinent part where aacraid initialises is this:

Jul 6 08:12:57 kernel: [ 5.385325] libata version 2.20 loaded.
Jul 6 08:12:57 kernel: [ 5.393187] Adaptec aacraid driver (1.1-5[2423]-mh3)
Jul 6 08:12:57 kernel: [ 5.408009] ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver
Jul 6 08:12:57 kernel: [ 5.775027] input: PS/2 Generic Mouse as /class/input/input2
Jul 6 08:12:57 kernel: [ 20.336635] scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
Jul 6 08:12:57 kernel: [ 20.336641] <Adaptec aic7890/91 Ultra2 SCSI adapter>
Jul 6 08:12:57 kernel: [ 20.336647] aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs
Jul 6 08:12:57 kernel: [ 20.336653]
Jul 6 08:12:57 kernel: [ 20.337314] ACPI: PCI Interrupt 0000:02:06.0[A] -> GSI 22 (level, low) -> IRQ 17
Jul 6 08:12:57 kernel: [ 35.547156] scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
Jul 6 08:12:57 kernel: [ 35.547163] <Adaptec aic7890/91 Ultra2 SCSI adapter>
Jul 6 08:12:57 kernel: [ 35.547169] aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs
Jul 6 08:12:57 kernel: [ 35.547175]
Jul 6 08:12:57 kernel: [ 35.547775] ACPI: PCI Interrupt 0000:02:08.0[A] -> GSI 20 (level, low) -> IRQ 18
Jul 6 08:12:57 kernel: [ 50.561757] scsi2 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
Jul 6 08:12:57 kernel: [ 50.561764] <Adaptec aic7860 Ultra SCSI adapter>
Jul 6 08:12:57 kernel: [ 50.561769] aic7860: Ultra Single Channel A, SCSI Id=7, 3/253 SCBs
Jul 6 08:12:57 kernel: [ 50.561775]
Jul 6 08:12:57 kernel: [ 50.564505] ACPI: PCI Interrupt Link [LNK8] enabled at IRQ 11
Jul 6 08:12:57 kernel: [ 50.564580] ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LNK8] -> GSI 11 (level, low) -> IRQ 11
Jul 6 08:12:57 kernel: [ 50.573857] ACPI: PCI Interrupt 0000:03:03.0[A] -> GSI 18 (level, low) -> IRQ 19
Jul 6 08:12:57 kernel: [ 50.606049] AAC0: kernel 2.8-0[6089]
Jul 6 08:12:57 kernel: [ 50.606113] AAC0: monitor 2.8-0[6089]
Jul 6 08:12:57 kernel: [ 50.606175] AAC0: bios 2.8-0[6089]
Jul 6 08:12:57 kernel: [ 50.606234] AAC0: serial 8a0376
Jul 6 08:12:57 kernel: [ 50.606717] scsi3 : percraid
Jul 6 08:12:57 kernel: [ 50.607327] scsi 3:0:0:0: Direct-Access DELL Array1 V1.0 PQ: 0 ANSI: 2
Jul 6 08:12:57 kernel: [ 50.607737] scsi 3:0:1:0: Direct-Access DELL Archive V1.0 PQ: 0 ANSI: 2
Jul 6 08:12:57 kernel: [ 51.881025] scsi 2:0:5:0: CD-ROM NEC CD-ROM DRIVE:466 1.06 PQ: 0 ANSI: 2
Jul 6 08:12:57 kernel: [ 51.881144] target2:0:5: Beginning Domain Validation
Jul 6 08:12:57 kernel: [ 51.884397] target2:0:5: FAST-20 SCSI 20.0 MB/s ST (50 ns, offset 15)
Jul 6 08:12:57 kernel: [ 51.887587] target2:0:5: Domain Validation skipping write tests
Jul 6 08:12:57 kernel: [ 51.887656] target2:0:5: Ending Domain Validation
Jul 6 08:12:57 kernel: [ 65.776294] scsi4 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
Jul 6 08:12:57 kernel: [ 65.776302] ...

TJ (tj) wrote :

There seems to be a problem with the cdimage server containing the Feisty Server CD image (long time outs, less than 1KB/s transfer rate) so I've installed Feisty Alternate x86 command-line on the Dell PowerEdge 6300/550 in order to get some information on the hardware. Note that after the installer reboots the system the PERC/2 controller reports unflushed data in the cache and refuses to boot the kernel. It requires a power-off followed by a full system memory count (which can take over 5 minutes) to clear the situation.

$ uname -a
Linux PowerEdge6300 2.6.20-15-generic #2 SMP Sun Apr 15 07:36:31 UTC 2007 i686 GNU/Linux

The PERC/2 controller is:
03:03.0 RAID bus controller [0104]: Digital Equipment Corporation DECchip 21554 [1011:0046] (rev 01)

It is managed by aacraid:

~$ modinfo aacraid
filename: /lib/modules/2.6.20-15-generic/kernel/drivers/scsi/aacraid/aacraid.ko
version: 1.1-5[2423]-mh3
license: GPL
description: Dell PERC2, 2/Si, 3/Si, 3/Di, Adaptec Advanced Raid Products, HP NetRAID-4M, IBM ServeRAID & ICP SCSI driver
author: Red Hat Inc and Adaptec
srcversion: 9F4AEF75C12F7128F830FA2
depends: scsi_mod
vermagic: 2.6.20-15-generic SMP mod_unload 586

$ lspci -nnn
00:02.0 ISA bridge [0601]: Intel Corporation 82371AB/EB/MB PIIX4 ISA [8086:7110] (rev 02)
00:02.1 IDE interface [0101]: Intel Corporation 82371AB/EB/MB PIIX4 IDE [8086:7111] (rev 01)
00:02.2 USB Controller [0c03]: Intel Corporation 82371AB/EB/MB PIIX4 USB [8086:7112] (rev 01)
00:02.3 Bridge [0680]: Intel Corporation 82371AB/EB/MB PIIX4 ACPI [8086:7113] (rev 02)
00:04.0 VGA compatible controller [0300]: ATI Technologies Inc 3D Rage Pro [1002:4749] (rev 5c)
00:08.0 SCSI storage controller [0100]: Adaptec AHA-2940U2/U2W [9005:0010]
00:0a.0 PCI bridge [0604]: Intel Corporation 21154 PCI-to-PCI Bridge [8086:b154]
00:10.0 Host bridge [0600]: Intel Corporation 450NX - 82451NX Memory & I/O Controller [8086:84ca] (rev 03)
00:12.0 Host bridge [0600]: Intel Corporation 450NX - 82454NX/84460GX PCI Expander Bridge [8086:84cb] (rev 04)
00:13.0 Host bridge [0600]: Intel Corporation 450NX - 82454NX/84460GX PCI Expander Bridge [8086:84cb] (rev 04)
00:14.0 Host bridge [0600]: Intel Corporation 450NX - 82454NX/84460GX PCI Expander Bridge [8086:84cb] (rev 04)
01:04.0 Ethernet controller [0200]: Intel Corporation 82557/8/9 [Ethernet Pro 100] [8086:1229] (rev 0d)
01:05.0 Ethernet controller [0200]: Intel Corporation 82557/8/9 [Ethernet Pro 100] [8086:1229] (rev 0d)
02:04.0 SCSI storage controller [0100]: Adaptec AHA-2940U2/U2W / 7890/7891 [9005:001f]
02:06.0 SCSI storage controller [0100]: Adaptec AHA-2940U2/U2W / 7890/7891 [9005:001f]
02:08.0 SCSI storage controller [0100]: Adaptec AIC-7860 [9004:6078] (rev 03)
03:03.0 RAID bus controller [0104]: Digital Equipment Corporation DECchip 21554 [1011:0046] (rev 01)

$ lsmod | grep aac
aacraid 59652 2
scsi_mod 142348 8 st,sr_mod,sg,sd_mod,aacraid,aic7xxx,scsi_transport_spi,libata

$ grep -i aac /var/log/kern.log
Apr 3 18:07:41 PowerEdge6300 kernel: [ 6.394845] Adaptec aacraid driver (1.1-5[2423]-mh3)
Apr 3 18:07:41 PowerEdge6300 kernel: [ 51.623757] AAC0: kernel 2...

TJ (tj) wrote :

This is the Hardy 2.6.24 kernel panic transcribed from the attached photograph of the screen. I'm not sure if this was the first failure reported as the kernel started but I suspect it might be based on the call stack:

[ 461.589252] ESI: 22ac132d EDI: f7cbda78 EBP: 22ac09e7 ESP: df839d54
[ 461.589321] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 461.589388] CR0:8005003b CR2: b7eb5000 CR3: 1f8b6000 CR4: 00000690
[ 461.589461] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 461.589530] DR6: ffff0ff0 DR7: 00000400
[ 461.589592] [<c0216446>] __delay+0x6/0x10
[ 461.589702] [<f883265b>] aac_fib_send+0x21b/0x2d0 [aacraid]
[ 461.589833] [<f882e9f4>] aac_get_adapter_info+0x74/0x600 [aacraid]
[ 461.589963] [<f882bf43>] aac_probe_one+0x1f3/0x450 [aacraid]
[ 461.590086] [<f8832b20>] aac_command_thread+0x0/0x6a0 [aacraid]
[ 461.590214] [<c0223526>] pci_device_probe+0x56/0x80
[ 461.590327] [<c027eab8>] driver_probe_device+0x88/0x190
[ 461.590441] [<c0212290>] kobject_uevent_env+0xf0/0x3d0
[ 461.590554] [<c027ed2e>] __driver_attach+0x9e/0xa0
[ 461.590665] [<c027deeb>] bus_for_each_dev+0x3b/0x60
[ 461.590777] [<c027e936>] driver_attach+0x16/0x20
[ 461.590889] [<c027ec90>] __driver_attach+0x0/0xa0
[ 461.590999] [<c027e26a>] bus_add_driver+0x8a/0x1e0
[ 461.591112] [<c02236d6>] __pci_register_driver+0x56/0x90
[ 461.591224] [<f883d033>] aac_init+0x33/0x74 [aacraid]
[ 461.591344] [<c01516c6>] sys_init_module+0x126/0x19c0
[ 461.591463] [<c0105442>] syscall_call+0x7/0xb
[ 461.591573] [<c0310000>] sigd_send+0x170/0x2f0
[ 461.591683] =======================

TJ (tj) wrote : aacraid fails on 2.6.24-15-generic with Dell PowerEdge PERC/2 RAID controller

I attached a serial console and captured 2.6.24-15-generic attempting to start. I've attached the log-file so we don't need to rely on screen photographs.

Key features are, I think:

[ 439.306852] Adaptec aacraid driver 1.1-5[2449]-ms
[ 439.758050] irq 10: nobody cared (try booting with the "irqpoll" option)
[ 439.764849] Pid: 1263, comm: udevd Not tainted 2.6.24-15-generic #1
[ 439.771202] [<c0165764>] __report_bad_irq+0x24/0x80
...
[ 439.839092] handlers:
[ 439.841428] [<f88859a0>] (ahc_linux_isr+0x0/0x250 [aic7xxx])
[ 439.847367] Disabling IRQ #10

[ 495.422177] BUG: soft lockup - CPU#3 stuck for 11s! [modprobe:1447]
[ 495.428524]
[ 495.430086] Pid: 1447, comm: modprobe Not tainted (2.6.24-15-generic #1)
[ 495.436855] EIP: 0060:[<c021662b>] EFLAGS: 00000293 CPU: 3
[ 495.442429] EIP is at delay_tsc+0x2b/0x50
[ 495.446510] EAX: 78583a4b EBX: 0000003f ECX: 00000000 EDX: 0000003f
[ 495.452847] ESI: 78583a27 EDI: f7d01a78 EBP: 78583171 ESP: df93bd4c
[ 495.459178] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 495.464644] CR0: 8005003b CR2: 0812574c CR3: 1fa03000 CR4: 00000690
[ 495.470980] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 495.477309] DR6: ffff0ff0 DR7: 00000400
[ 495.481224] [<c02165c6>] __delay+0x6/0x10
[ 495.485463] [<f89446aa>] aac_fib_send+0x21a/0x2d0 [aacraid]
[ 495.491306] [<c012363a>] enqueue_task_fair+0x1a/0x30
[ 495.496515] [<f8940a94>] aac_get_adapter_info+0x74/0x620 [aacraid]
[ 495.502942] [<f893df54>] aac_probe_one+0x224/0x450 [aacraid]
[ 495.508830] [<f8944b80>] aac_command_thread+0x0/0x6d0 [aacraid]
...

*** This is caused by the motherboard having a USB controller chipset with no USB hardware

[ 499.829084] uhci_hcd 0000:00:02.2: host controller process error, something bad happened!
[ 499.837347] uhci_hcd 0000:00:02.2: host controller halted, very bad!
[ 499.843790] uhci_hcd 0000:00:02.2: HC died; cleaning up
...
[ 708.005536] aacraid: aac_fib_send: first asynchronous command timed out.
[ 708.005542] Usually a result of a PCI interrupt routing problem;
[ 708.005548] update mother board BIOS or consider utilizing one of
[ 708.005553] the SAFE mode kernel options (acpi, apic etc)
...
[ 708.030099] scsi 4:0:0:0: Attempting to queue an ABORT message
[ 708.030110] CDB: 0x0 0x0 0x0 0x0 0x0 0x0
[ 708.030191] scsi 4:0:0:0: Command already completed
[ 708.030201] aic7xxx_abort returns 0x2002
...
[ 718.100047] scsi 3:0:0:0: Device offlined - not ready after error recovery
...
[ 935.879635] scsi4: At time of recovery, card was paused
[ 935.884941] >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
[ 935.884947] scsi4: Dumping Card State in Message-in phase, at SEQADDR 0x100
[ 935.898648] Card was paused
[ 935.901509] ACCUM = 0xc0, SINDEX = 0x71, DINDEX = 0x8c, ARG_2 = 0x0
[ 935.907839] HCNT = 0x0 SCBPTR = 0x0
[ 935.911393] SCSISIGI[0xe6]:(REQI|BSYI|MSGI|IOI|CDI)
[ 935.916886] ERROR[0x0] SCSIBUSL[0x0] LASTPHASE[0xe0]:(MSGI|IOI|CDI)
[ 935.923912] SCSISEQ[0x12]:(ENAUTOATNP|ENRSELI)
[ 935.928781] SBLKCTL[0x0] SCSIRATE[0x0] SEQCTL[0x10]:(FASTMODE)
[ 935.935235] SEQ_FLAGS[0x0] SSTAT0[0x7]:(DMADONE|SPIORDY|SDONE)
[ 935.941690] SSTAT1[0x3]:(REQINIT|PHA...

TJ (tj) wrote : aacraid works on 2.6.20-15-generic with Dell PowerEdge PERC/2 RAID controller

For comparison here is a successful boot log from the serial console with Feisty 2.6.20-15-generic.

Just Pete (bruinsdj) wrote :

TJ,

Update your RAID firmware. Looks like you're on the same level that others were when experiencing this problem.

From Dell's support site, it looks like this problem is fixed in version 2.8.0 Build 6092
----------------
-Background controller cache flush routine modified to flush smaller number of buffers during I/O without impacting RAID throughput. This change fixes driver timeout (manifesting as loss of drive access and filesystem errors like ext3_get_inode_loc errors) seen under certain Linux configurations.
---------------

You're on the version just prior to this. Note that there are several versions even newer at this point.
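
For anyone comparing their own controller against that release note, the firmware build can be pulled out of the aacraid boot messages. This is a rough sketch only: the names `aac_kernel_build` and `FIXED_BUILD` are invented here, the sample line is copied from the Feisty log earlier in this report, and a build number at or above 6092 is no guarantee that a given machine will boot.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print the build
# number from the first aacraid line like "AAC0: kernel 2.8-0[6089]".
aac_kernel_build() {
    sed -n 's/.*AAC[0-9]*: kernel [0-9.-]*\[\([0-9]*\)].*/\1/p' | head -n 1
}

# 6092 is the build the Dell release note quoted above names as carrying the fix.
FIXED_BUILD=6092

# Sample line taken from the Feisty boot log posted earlier in this report.
line='[ 50.606049] AAC0: kernel 2.8-0[6089]'
build=$(printf '%s\n' "$line" | aac_kernel_build)

if [ "$build" -lt "$FIXED_BUILD" ]; then
    echo "build $build predates $FIXED_BUILD"
else
    echo "build $build is at or past $FIXED_BUILD"
fi
```

On a running system the same function could be fed from `dmesg | aac_kernel_build`, assuming the driver messages are still in the ring buffer.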

TJ (tj) wrote :

Thanks for spotting that.

Firmware was updated to the latest available from Dell going by the service tag last year. I checked for more recent releases this week before chasing the bug. The update dates are the same as the ones here *but* checking the build number I see 2.8.0 6099 is claimed for the floppy-disk installer.

I'll grab all the updates again and apply them and see how things go.

TJ (tj) wrote : aacraid fails on 2.6.24-15-generic with Dell PowerEdge PERC/2 RAID controller

With the latest (2008-04-04) PERC 2 firmware 2.8.0 6099 the issue remains.

[ 0.000000] Linux version 2.6.24-15-generic (root@PowerEdge6300) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #1 SMP Fri Apr 4 09:18:39 BST 2008 (Ubuntu 2.6.24-15.26-generic)

[ 436.079664] Adaptec aacraid driver 1.1-5[2449]-ms

[ 492.476969] BUG: soft lockup - CPU#2 stuck for 11s! [modprobe:1376]
[ 492.483317]
[ 492.484874] Pid: 1376, comm: modprobe Not tainted (2.6.24-15-generic #1)
[ 492.491642] EIP: 0060:[<c0216641>] EFLAGS: 00000287 CPU: 2
[ 492.497226] EIP is at delay_tsc+0x41/0x50
[ 492.501302] EAX: 0000059e EBX: 0000003f ECX: 00000000 EDX: 0000003f
[ 492.507640] ESI: 17c02b3e EDI: df84f278 EBP: 17c025a0 ESP: df9dfd4c
[ 492.513972] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 492.519443] CR0: 8005003b CR2: 0812574c CR3: 1f97b000 CR4: 00000690
[ 492.525781] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 492.532114] DR6: ffff0ff0 DR7: 00000400
[ 492.536029] [<c02165c6>] __delay+0x6/0x10
[ 492.540264] [<f89496aa>] aac_fib_send+0x21a/0x2d0 [aacraid]
[ 492.546108] [<c012363a>] enqueue_task_fair+0x1a/0x30
[ 492.551318] [<f8945a94>] aac_get_adapter_info+0x74/0x620 [aacraid]
[ 492.557753] [<f8942f54>] aac_probe_one+0x224/0x450 [aacraid]
[ 492.563642] [<f8949b80>] aac_command_thread+0x0/0x6d0 [aacraid]
[ 492.569801] [<c0223136>] pci_device_probe+0x56/0x80
[ 492.574903] [<c027e85e>] driver_probe_device+0x8e/0x190
[ 492.580373] [<c027eace>] __driver_attach+0x9e/0xa0
[ 492.585385] [<c027dc7b>] bus_for_each_dev+0x3b/0x60
[ 492.590491] [<c027e6d6>] driver_attach+0x16/0x20
[ 492.595330] [<c027ea30>] __driver_attach+0x0/0xa0
[ 492.600259] [<c027e00a>] bus_add_driver+0x8a/0x1e0
[ 492.605281] [<c02232e3>] __pci_register_driver+0x53/0xa0
[ 492.610815] [<f8850033>] aac_init+0x33/0x74 [aacraid]
[ 492.616098] [<c0151511>] sys_init_module+0x151/0x1990
[ 492.621377] [<c01778fa>] __do_fault+0x21a/0x410
[ 492.626170] [<c0166421>] handle_fasteoi_irq+0x91/0xf0
[ 492.631465] [<c01053b2>] syscall_call+0x7/0xb
[ 492.636066] =======================

TJ (tj) wrote :

It looks as if the root cause of the Dell PERC 2 issue is ACPI config-interrupts. I've been able to boot v2.6.25-rc8 with the kernel option "pci=noacpi".

I've got a bug open upstream:

http://bugzilla.kernel.org/show_bug.cgi?id=10396

A git-rev-list shows 277 commits to ACPI between v2.6.20 and v2.6.22, so it's going to be a big job to narrow it down.

TJ (tj) wrote :

I've discovered the cause of the issue. The system contains an i450NX chipset which has peer PCI buses. To work around buggy BIOS situations where these weren't discovered a fix-up was added that looks for and scans them as secondaries of the first root bus.

This causes the later ACPI code to ignore them because they are already in the pci_root_bus list.

I've developed a patch for testing which is attached to the bug at kernel.org:

http://bugzilla.kernel.org/show_bug.cgi?id=10396#c18

The patch applies some DMI matching that looks for the Dell PowerEdge 6300 and, if found, doesn't apply the i450NX fix-up. The system is now undergoing testing to discover any side effects of preventing the buses being discovered as secondaries.

$ uname -a
Linux PowerEdge6300 2.6.25-rc8-acpi-pci-bus #3 SMP Wed Apr 9 08:29:54 BST 2008 i686 GNU/Linux

$ grep -i 'aac' /var/log/dmesg
[ 76.282508] Adaptec aacraid driver 1.1-5[2455]-ms
[ 76.282520] bus: 'pci': add driver aacraid
[ 76.407282] bus: 'pci': driver_probe_device: matched device 0000:03:03.0 with driver aacraid
[ 76.407819] bus: 'pci': really_probe: probing driver aacraid with device 0000:03:03.0
[ 76.612034] AAC0: kernel 2.8-1[6099]
[ 76.613521] AAC0: monitor 2.8-1[6099]
[ 76.617518] AAC0: bios 2.8-1[6099]
[ 76.621520] AAC0: serial 8A0376
[ 76.642288] driver: '0000:03:03.0': driver_bound: bound to device 'aacraid'
[ 76.645545] bus: 'pci': really_probe: bound device 0000:03:03.0 to driver aacraid

TJ (tj) wrote :

Patch posted to kernel-team mailing list. Hopefully will be incorporated before Hardy release.

Changed in linux-source-2.6.22:
status: Confirmed → In Progress
importance: Undecided → Medium
Changed in linux:
assignee: nobody → kernel-team
importance: Undecided → High
status: New → In Progress
TJ (tj) wrote :

Re-allocated i450NX cause to bug #214814

Changed in linux:
importance: High → Undecided
status: In Progress → New
Changed in linux-source-2.6.22:
status: In Progress → Confirmed
Changed in linux:
assignee: kernel-team → ubuntu-kernel-team
Matthias Urlichs (smurf) wrote :

Bah. A fix for this really should be in the Hardy kernel.

Gerry (gsker) wrote :

TJ, whatever your bug is, it's not the one referred to in this bug.

The original problem is that *-server* kernels experience "Host adapter abort request" messages during bootup and fail to boot on a Dell PE2650 after the install (the boot from CD and the install itself work fine), but the *-generic* kernels never do.

You reported a kernel panic on a Dell 6300 with a -generic kernel during an install.

Is anyone actually dealing with the original bug, which still exists in Hardy, or does the PERC firmware update always fix it? If it does, we should get this onto the front page.

I'm going to go do a firmware upgrade now that I've seen this with Hardy also.

Mark Silence (madasi) wrote :

On my Dell 2650 with a Perc3\DI running Gutsy, I upgraded the firmware for the PERC yesterday, and that is now allowing me to boot the -server kernels, where before I would get the aacraid errors reported by others.

Gerry (gsker) wrote :

Confirmed. I just did the same thing.

Got the firmware from
http://ftp.us.dell.com/scsi-raid/RAID_FRMW_LX_R168380
edited it with vi to change the shell from /bin/sh to /bin/bash
made it executable
extracted it with
  ./RAID_FRMW_LX_R168380.BIN --extract RAID
(ignoring the rpm errors)
cd RAID
edited adalnx.sh to change the shell to bash and ran it.
Then I ran ./adalnx and it flashed the controller BIOS.
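
The steps above can be sketched as a short script. This is an illustration under assumptions, not a tested flashing procedure: the helper name `use_bash_shebang` is invented, it assumes the first line of each file is literally `#!/bin/sh`, and it relies on GNU sed's `-i`. The commands that touch the controller are left as comments on purpose.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a "#!/bin/sh" first line to "#!/bin/bash"
# in place, as the manual vi edit above does (GNU sed -i assumed).
use_bash_shebang() {
    sed -i '1s|^#!/bin/sh|#!/bin/bash|' "$1"
}

# Outline of the sequence, using the file names from the comment above.
# Download the package from the Dell URL quoted there first.
#
#   use_bash_shebang RAID_FRMW_LX_R168380.BIN
#   chmod +x RAID_FRMW_LX_R168380.BIN
#   ./RAID_FRMW_LX_R168380.BIN --extract RAID    # rpm errors were ignored
#   cd RAID
#   use_bash_shebang adalnx.sh
#   ./adalnx.sh
#   ./adalnx                                     # this step flashes the controller
```

Keeping the destructive steps commented out means the file can be sourced safely to try the shebang helper on a copy before going near real hardware.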

Now if I could just do something similar with the Remote Access card.

Leann Ogasawara (leannogasawara) wrote :

Hi James,

Since you're the original bug reporter, care to comment if the firmware updates mentioned here resolved this for you?

TJ - I'll follow up with your bug at the new report you opened.

Thanks.

Changed in linux:
status: New → Incomplete
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Changed in linux:
status: Unknown → Invalid
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

James Troup (elmo) wrote :

Upgrading the SCSI card firmware of the boxes in question did "fix"
this, in as much as the hardy kernel now boots cleanly. Ideally (and
given dapper works just fine with the old firmware) the kernel would
be fixed to support the old firmware too, but pragmatically I realise
that may well never happen.

Changed in linux:
importance: Undecided → Medium
status: Incomplete → Triaged
ifone (hellocyf) wrote :

I installed Ubuntu 8.10 server on a PE 2650 and it could not boot up; it stops at the initramfs prompt. If I type exit(), the system continues to boot, but the network interface does not start.

Is this bug caused by the PE 2650 firmware or by Ubuntu?
If it is the firmware, how do I upgrade it?
If it is Ubuntu, how do I deal with it?

Worried.

Thanks, all.

Sergio Zanchetta (primes2h) wrote :

The 18 month support period for Gutsy Gibbon 7.10 has reached its end of life -
http://www.ubuntu.com/news/ubuntu-7.10-eol . As a result, we are closing the
linux-source-2.6.22 kernel task. It would be helpful if you could test the
new Jaunty Jackalope 9.04 release and confirm if this issue remains -
http://www.ubuntu.com/getubuntu/releasenotes/904overview. If the issue still exists with the Jaunty
release, please update this report by changing the Status of the "linux (Ubuntu)"
task from "Incomplete" to "New". Also please be sure to run the command below
which will automatically gather and attach updated debug information to this
report. Thanks in advance.

apport-collect -p linux-image-2.6.28-11-generic 149071

Changed in linux-source-2.6.22 (Ubuntu):
status: Confirmed → Won't Fix
Changed in linux (Ubuntu):
status: Triaged → Incomplete
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Esa Häkkinen (esa+hakkinen) wrote :

Reported as fixed with Dell firmware update v2.8.1.7692 (build 7692).

Tested OK with Hardy 9.04 and Karmic 9.10 on Dell PowerEdge 2550 with PERC 3/Di (hw pci id 1028:0002).

Problem is in firmware: "PERC has no 64 bit scsi passthrough function."
http://bugzilla.kernel.org/show_bug.cgi?id=9133

Workaround for pre-v.2.8.1.7692: downgrade performance with kernel (2.6.25.4) options:
"aacraid.dacmode=0 aacraid.nondasd=0 aacraid.expose_physicals=0"
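
To check which values the loaded driver actually picked up, one option is to read them back from sysfs. The sketch below assumes the running aacraid module exposes these three parameters under /sys/module/aacraid/parameters, which may not hold for every kernel version; the function name `show_module_params` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print name=value for each listed module parameter
# found under the given sysfs parameters directory.
show_module_params() {
    dir=$1
    shift
    for p in "$@"; do
        if [ -r "$dir/$p" ]; then
            printf '%s=%s\n' "$p" "$(cat "$dir/$p")"
        else
            printf '%s=<not exposed>\n' "$p"
        fi
    done
}

# Assumed sysfs location for the aacraid driver's parameters.
show_module_params /sys/module/aacraid/parameters dacmode nondasd expose_physicals
```

On a machine without the driver loaded, each parameter is simply reported as not exposed.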

PowerEdge 2550 PERC firmware v2.8.1.7692 is available only as a Win32 floppy writer program. I extracted the files from the floppies (http://esa.hakkinen.com/konehuone/dell2550/) and used FreeDOS to flash the firmware.

Patch the motherboard BIOS first and reboot, then flash the PERC firmware; otherwise bricking will happen. It's a good idea to have v2.8.x firmware already applied before this update. readme.txt says v2.6 is the minimum; other sources recommend updating from v2.6/v2.7 to 2.8.0 first. See ftp://ftp.us.dell.com/sysman/ and the Dell support site for firmware updates.

Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Triaged a while ago but has not had any updated comments for quite some time. Please let us know if this issue remains in the current Ubuntu release, http://www.ubuntu.com/getubuntu/download . If the issue remains, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

Changed in linux:
status: Invalid → Unknown
Changed in linux:
importance: Unknown → High
Changed in linux:
status: Unknown → Invalid
Changed in linux (Ubuntu):
status: Triaged → Incomplete
Changed in linux (Ubuntu):
status: Incomplete → Invalid