AMD 280 Kernel OOPS on Bootup (dapper Flight4 Live CD)

Bug #32070 reported by Joe Kislo
Affects: linux (Ubuntu)
Status: Won't Fix
Importance: Medium
Assigned to: Ben Collins

Bug Description

The machine is an HP DL385 with two AMD 280s in it (dual CPU, dual core, 2.4GHz). Booting the x86_64 Flight 4 dapper live CD, I get this oops in dmesg:

[ 148.143131] tg3: eth0: Link is up at 100 Mbps, full duplex.
[ 148.143135] tg3: eth0: Flow control is off for TX and off for RX.
[ 148.459912] device-mapper: 4.4.0-ioctl (2005-01-12) initialised: <email address hidden>
[ 151.754302] Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
[ 151.754309] <0000000000000000>{_stext+2146431208}
[ 151.754316] PGD f3a5f067 PUD 8209d067 PMD 0
[ 151.754320] Oops: 0010 [1] PREEMPT SMP
[ 151.754323] CPU 1
[ 151.754324] Modules linked in: dm_mod md_mod tsdev hw_random shpchp psmouse pcspkr af_packet i2c_amd756 i2c_core serio_raw pci_hotplug floppy evdev tg3 squashfs unionfs loop nls_cp437 isofs ohci_hcd usbcore cciss scsi_mod ide_generic ide_cd cdrom generic amd74xx thermal processor fan capability commoncap vga16fb cfbcopyarea vgastate cfbimgblt cfbfillrect fbcon tileblit font bitblit softcursor
[ 151.754342] Pid: 4457, comm: readahead-list Not tainted 2.6.15-15-amd64-generic #1
[ 151.754344] RIP: 0010:[<0000000000000000>] <0000000000000000>{_stext+2146431208}
[ 151.754348] RSP: 0018:ffff81007d0cfc80 EFLAGS: 00010246
[ 151.754350] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81007d0cfcc8
[ 151.754353] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8100f43e86a8
[ 151.754356] RBP: ffff8100f44ab280 R08: 0000000000000000 R09: ffff81007d0cfdd4
[ 151.754358] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[ 151.754361] R13: ffff810081989c08 R14: 0000000000000000 R15: ffff8100f43e87f0
[ 151.754364] FS: 00002aaaaadfb6d0(0000) GS:ffffffff80428880(0000) knlGS:0000000000000000
[ 151.754367] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 151.754369] CR2: 0000000000000000 CR3: 00000000f3ae1000 CR4: 00000000000006e0
[ 151.754372] Process readahead-list (pid: 4457, threadinfo ffff81007d0ce000, task ffff81007b570e40)
[ 151.754374] Stack: ffffffff880fe169 ffff8100f43e86a8 ffff8100f44ab280 ffff8100f28e3438
[ 151.754381] 000000008020a650 ffff8100f29ff000 0000000000000000 000ffffff50ad2c0
[ 151.754385] ffff8100f43e8680 ffff810080000300
[ 151.754389] Call Trace:<ffffffff880fe169>{:squashfs:squashfs_readpage+217}
[ 151.754398] <ffffffff8020c603>{radix_tree_node_alloc+19} <ffffffff8020cacd>{radix_tree_insert+317}
[ 151.754414] <ffffffff8016c824>{__alloc_pages+116} <ffffffff8016e340>{__do_page_cache_readahead+480}
[ 151.754425] <ffffffff8016e489>{force_page_cache_readahead+105} <ffffffff80166508>{sys_readahead+136}
[ 151.754440] <ffffffff8010fede>{system_call+126}
[ 151.754448]
[ 151.754449] Code: Bad RIP value.
[ 151.754451] RIP <0000000000000000>{_stext+2146431208} RSP <ffff81007d0cfc80>
[ 151.754455] CR2: 0000000000000000
[ 151.754456] <6>ACPI: Power Button (FF) [PWRF]
[ 152.651068] ibm_acpi: ec object not found
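
The "Oops: 0010" value in the trace above is the x86 page-fault error code, printed in hex. A minimal Python sketch of how its bits can be read follows; the function name decode_page_fault_error is made up for illustration and is not part of any kernel tool. The same decoding applies to the "error 4" printed in the powernowd segfault lines in this report.

```python
def decode_page_fault_error(code: int) -> dict:
    """Decode the low bits of an x86 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = fault raised in kernel mode, 1 = in user mode
        "mode": "user" if code & 0x4 else "kernel",
        # bit 3: a reserved bit was set in a page-table entry
        "reserved_bit_set": bool(code & 0x8),
        # bit 4: the fault was caused by an instruction fetch
        "instruction_fetch": bool(code & 0x10),
    }


oops = decode_page_fault_error(int("0010", 16))  # from "Oops: 0010"
segv = decode_page_fault_error(int("4", 16))     # from "error 4"
print(oops)
print(segv)
```

Read this way, 0010 is a kernel-mode instruction fetch from a page that is not present. That fits RIP and CR2 both being 0000000000000000, which is what a call through a NULL function pointer would look like. The trace alone does not show which pointer was NULL.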

-------

I have seen a similar OOPS on my work desktop when booting the dapper live CD flight 2 (#21223).

I also see this segfault from powernowd in my dmesg, but that may be because I tried to kill it with /etc/init.d/powernowd stop... and it only partially died (1 cpu went back to full throttle, the other one didn't)... so I ended up -9'ing it.

[ 180.038433] powernow-k8: Found 4 AMD Athlon 64 / Opteron processors (version 1.50.4)
[ 180.040518] powernow-k8: 0 : fid 0x10 (2400 MHz), vid 0x8 (1350 mV)
[ 180.040524] powernow-k8: 1 : fid 0xe (2200 MHz), vid 0xa (1300 mV)
[ 180.040528] powernow-k8: 2 : fid 0xc (2000 MHz), vid 0xc (1250 mV)
[ 180.040532] powernow-k8: 3 : fid 0xa (1800 MHz), vid 0xe (1200 mV)
[ 180.040538] cpu_init done, current fid 0x10, vid 0x8
[ 180.043377] powernow-k8: 0 : fid 0x10 (2400 MHz), vid 0x8 (1350 mV)
[ 180.043385] powernow-k8: 1 : fid 0xe (2200 MHz), vid 0xa (1300 mV)
[ 180.043387] powernow-k8: 2 : fid 0xc (2000 MHz), vid 0xc (1250 mV)
[ 180.043390] powernow-k8: 3 : fid 0xa (1800 MHz), vid 0xe (1200 mV)
[ 180.043396] cpu_init done, current fid 0x10, vid 0x8
[ 180.337876] Bluetooth: Core ver 2.8
[ 180.337882] NET: Registered protocol family 31
[ 180.337884] Bluetooth: HCI device and connection manager initialized
[ 180.337897] Bluetooth: HCI socket layer initialized
[ 180.583305] Bluetooth: L2CAP ver 2.8
[ 180.583312] Bluetooth: L2CAP socket layer initialized
[ 180.585993] Bluetooth: RFCOMM socket layer initialized
[ 180.586012] Bluetooth: RFCOMM TTY layer initialized
[ 180.586014] Bluetooth: RFCOMM ver 1.6
[ 188.716986] eth0: no IPv6 routers present
[ 248.584946] powernowd[5576]: segfault at 0000000000500000 rip 00002aaaaac2bc23 rsp 00007fffff80dae0 error 4
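
The fid/vid pairs that powernow-k8 logs above are consistent with a simple linear mapping: 800 MHz plus 100 MHz per fid step, and 1550 mV minus 25 mV per vid step. The Python sketch below only checks that assumption against the four logged states. The function names are hypothetical, and the formulas are inferred from this log, not quoted from the driver source.

```python
def k8_fid_to_mhz(fid: int) -> int:
    # Assumed mapping, inferred from the logged table.
    return 800 + 100 * fid


def k8_vid_to_mv(vid: int) -> int:
    # Assumed mapping, inferred from the logged table.
    return 1550 - 25 * vid


# (fid, vid, MHz, mV) exactly as printed in the dmesg output above
LOGGED_STATES = [
    (0x10, 0x8, 2400, 1350),
    (0x0E, 0xA, 2200, 1300),
    (0x0C, 0xC, 2000, 1250),
    (0x0A, 0xE, 1800, 1200),
]

for fid, vid, mhz, mv in LOGGED_STATES:
    ok = k8_fid_to_mhz(fid) == mhz and k8_vid_to_mv(vid) == mv
    print(f"fid {fid:#x} vid {vid:#x}: {mhz} MHz, {mv} mV, matches={ok}")
```

If the mapping holds, the table is internally consistent, so the driver's view of the P-states looks sane and the powernowd segfault is probably a separate problem.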

Joe Kislo (joe-k12s) wrote :

I also found it weird that powernowd was loaded at all on the Opteron chip. I'm almost positive that previous Ubuntu releases (Hoary, Breezy) didn't load powernowd for Opterons. Clearly it freaked out, but I don't know if that's because it's dual core or what.

Lemme know if you need anything... I will be visiting the datacenter where these machines are housed this weekend and can gather further info if you need it.

Matt Zimmerman (mdz)
Changed in linux-source-2.6.15:
assignee: nobody → ben-collins
Yuriy Kozlov (yuriy-kozlov) wrote :

Does this still happen with the latest dapper (beta2)?

Also, is this restricted to the live CD or does it happen with an install as well?

Changed in linux-source-2.6.15:
status: Unconfirmed → Needs Info
Joe Kislo (joe-k12s) wrote :

Unfortunately, this server is now in production and I do not have access to reboot it freely anymore.

I do have some similar hardware (single-CPU dual core machines), but no other dual CPU, dual core machines.

I will try the similar hardware, but that won't retest the original case.

Joe Kislo (joe-k12s) wrote :

So, I can't get access to the Dual Core Dual CPU machine. But if I boot beta2 live dapper on a Dual CPU machine... Powernowd still loads, but doesn't seem to be very happy.

If I try to stop it with
/etc/init.d/powernowd stop

it won't die

If I try to kill it using
kill 6057

I get this error on the console:
[ 316.415412] powernowd[6057]: segfault at 0000000000500000 rip 00002aaaaac2bc23 rsp 00007fffffb83700 error 4

I tried it again and got similar results:
[ 352.372750] powernowd[7042]: segfault at 0000000000500000 rip 00002aaaaac2bc23 rsp 00007fffff83efa0 error 4

Not sure if that's terribly helpful. It *did* appear to power down the CPU a little according to /proc/cpuinfo. I had been under the assumption that Opteron chips didn't have this ability, but perhaps it was just that the older live CDs never loaded powernowd on an Opteron.
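
The three powernowd segfault lines in this report can be compared mechanically. This is a minimal Python sketch; the regular expression and the helper name parse_segfault are illustrative and assume the 2.6.15-era x86_64 message format seen here.

```python
import re

# Assumed format: "comm[pid]: segfault at ADDR rip RIP rsp RSP error CODE"
SEGFAULT_RE = re.compile(
    r"(?P<comm>\w+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
)

# Lines copied from this bug report
LINES = [
    "[ 248.584946] powernowd[5576]: segfault at 0000000000500000 rip 00002aaaaac2bc23 rsp 00007fffff80dae0 error 4",
    "[ 316.415412] powernowd[6057]: segfault at 0000000000500000 rip 00002aaaaac2bc23 rsp 00007fffffb83700 error 4",
    "[ 352.372750] powernowd[7042]: segfault at 0000000000500000 rip 00002aaaaac2bc23 rsp 00007fffff83efa0 error 4",
]


def parse_segfault(line):
    """Return the named fields of a segfault line, or None if it does not match."""
    match = SEGFAULT_RE.search(line)
    return match.groupdict() if match else None


faults = [parse_segfault(line) for line in LINES]
# Collect the distinct (fault address, instruction pointer, error code) triples.
fault_sites = {(f["addr"], f["rip"], f["error"]) for f in faults}
print(fault_sites)
```

All three lines share the same fault address, instruction pointer, and error code. Only the pid and stack pointer differ. That suggests the same instruction is faulting each time, on both machines, though the log alone does not identify which function it is.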

Yuriy Kozlov (yuriy-kozlov) wrote :

If you get access to this hardware again, could you try Edgy and/or Feisty on it?

Launchpad Janitor (janitor) wrote : This bug is now reported against the 'linux' package

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this linux-source-2.6.15 kernel bug to the new "linux" package. We appreciate your patience and understanding as we make this transition. Also, if you would be interested in testing the upcoming Intrepid Ibex 8.10 release, it is available at http://www.ubuntu.com/testing . Please let us know your results. Thanks!

Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Ralph Janke (txwikinger) wrote :

Unfortunately this bug report is being closed because we received no response to the last inquiry for information. However, the Intrepid Ibex 8.10 Beta release was most recently announced - http://www.ubuntu.com/testing/intrepid/beta . If you are able to confirm this is still an issue with this most recent release please feel free to reopen this report. To reopen the bug report you can click on the current status, under the Status column, and change the Status back to "New". Thanks.

Changed in linux:
status: Incomplete → Won't Fix