Random oopsen with Xen on Ubuntu Gutsy

Bug #190010 reported by Matt Mackall
Affects         Status        Importance  Assigned to  Milestone
linux (Ubuntu)  Fix Released  High        Unassigned

Bug Description

Binary package hint: linux-image-2.6.22-14-xen

I've got an AMD64 machine that's exhibiting random oopses and database
corruption using 2.6.22-14 kernel from Ubuntu Gutsy and Xen
3.1.0-0ubuntu18, also from Gutsy.

Running the same database without Xen under heavy load appears to work
fine. Machine has 3G of ECC and memtest86 works fine as well. Also,
machine ran the database just fine for a year before (unfortunately
simultaneously) switching to Gutsy and Xen.

Any suggestions?

MySQL said:

The relevant bit of the logs is this:

  Jan 10 23:46:26 vegguide mysqld[22376]: InnoDB: Database page corruption on disk or a failed
  Jan 10 23:46:26 vegguide mysqld[22376]: InnoDB: file read of page 1559.
  Jan 10 23:46:26 vegguide mysqld[22376]: InnoDB: You may have to recover from a backup.

A couple of the oopsen pasted below:

[297993.758358] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000001
[297993.758381] printing eip:
[297993.758392] 006c6000 -> *pde = 00000000:6d67f001
[297993.758398] 27f46000 -> *pme = 00000000:00000000
[297993.758405] Oops: 0000 [#1]
[297993.758408] SMP
[297993.758415] Modules linked in: af_packet xt_multiport iptable_filter ip_tables x_tables ipv6 evdev ext3 jbd mbcache dm_mirror dm_snapshot dm_mod fuse apparmor commoncap
[297993.758448] CPU: 0
[297993.758449] EIP: 0061:[<c0176eec>] Not tainted VLI
[297993.758452] EFLAGS: 00010002 (2.6.22-14-xen #1)
[297993.758468] EIP is at kmem_cache_alloc+0x5c/0xe0
[297993.758474] eax: 00000000 ebx: 00000001 ecx: 00000000 edx: c1bfe8a0
[297993.758480] esi: 00000000 edi: 00000000 ebp: 00000020 esp: e0003c00
[297993.758487] ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0069
[297993.758493] Process pdflush (pid: 4538, ti=e0002000 task=c210ea60 task.ti=e0002000)
[297993.758498] Stack: 00000000 c1c34f60 c028b35d 00000000 c20b4b40 00000003 c20b4b40 c028b35d
[297993.758515] c12a7480 c20b4b40 00000003 c029fd40 c029f25c c013bdba c03d26a0 c2048000
[297993.758532] c03ec4a8 c20b4b40 00000000 00000400 c0290b67 c2048000 00000000 003d0900
[297993.758549] Call Trace:
[297993.758555] [<c028b35d>] skb_clone+0x2d/0x250
[297993.758567] [<c028b35d>] skb_clone+0x2d/0x250
[297993.758575] [<c029fd40>] snap_rcv+0x0/0xa0
[297993.758582] [<c029f25c>] llc_rcv+0xdc/0x2b0
[297993.758589] [<c013bdba>] clocksource_get_next+0x3a/0x40
[297993.758602] [<c0290b67>] netif_receive_skb+0x237/0x420
[297993.758613] [<c027119d>] netif_poll+0x4fd/0xbf0
[297993.758621] [<c010891b>] sched_clock+0x3b/0x80
[297993.758630] [<c011e5f3>] scheduler_tick+0xf3/0x100
[297993.758643] [<c02930de>] net_rx_action+0xde/0x260
[297993.758652] [<c0127302>] __do_softirq+0x92/0x130
[297993.758662] [<c012742c>] do_softirq+0x8c/0x90
[297993.758670] [<c0106e20>] do_IRQ+0x40/0x70
[297993.758677] [<ee08b799>] journal_add_journal_head+0x69/0x160 [jbd]
[297993.758699] [<c0259946>] evtchn_do_upcall+0xb6/0xf0
[297993.758709] [<c01057a6>] hypervisor_callback+0x46/0x4e
[297993.758718] [<ee0c18cc>] walk_page_buffers+0x1c/0x70 [ext3]
[297993.758740] [<ee0c4ad1>] ext3_ordered_writepage+0xe1/0x1a0 [ext3]
[297993.758759] [<ee0c1920>] bget_one+0x0/0x10 [ext3]
[297993.758774] [<c0157a38>] __writepage+0x8/0x30
[297993.758782] [<c0157ef4>] write_cache_pages+0x214/0x310
[297993.758791] [<c0157a30>] __writepage+0x0/0x30
[297993.758801] [<c0158010>] generic_writepages+0x20/0x30
[297993.758810] [<c0158069>] do_writepages+0x49/0x50
[297993.758817] [<c01980c3>] __writeback_single_inode+0x93/0x3c0
[297993.758828] [<c012bb9e>] del_timer_sync+0xe/0x20
[297993.758838] [<c02ff246>] schedule+0x356/0x900
[297993.758847] [<c01f5d9d>] _atomic_dec_and_lock+0x3d/0x70
[297993.758857] [<c019877e>] sync_sb_inodes+0x17e/0x240
[297993.758866] [<c0198c49>] writeback_inodes+0x99/0xd0
[297993.758876] [<c0158715>] wb_kupdate+0x85/0xf0
[297993.758885] [<c0158ab0>] pdflush+0x0/0x260
[297993.758892] [<c0158bf8>] pdflush+0x148/0x260
[297993.758900] [<c0158690>] wb_kupdate+0x0/0xf0
[297993.758908] [<c0136312>] kthread+0x42/0x70
[297993.758915] [<c01362d0>] kthread+0x0/0x70
[297993.758922] [<c0105927>] kernel_thread_helper+0x7/0x10
[297993.758930] =======================
[297993.758934] Code: 02 01 c6 44 02 01 01 89 f9 0f b6 f1 64 a1 08 00 42 c0 8b 94 83 90 00 00 00 85 d2 74 72 8b 42 0c 85 c0 74 6b 8b 5a 0c 0f b7 42 0a <8b> 04 83 89 42 0c 89 fa 84 d2 74 2e 64 a1 08 00 42 c0 c1 e0 06
[297993.759012] EIP: [<c0176eec>] kmem_cache_alloc+0x5c/0xe0 SS:ESP 0069:e0003c00
[297993.759030] Kernel panic - not syncing: Fatal exception in interrupt

[16785.730498] BUG: unable to handle kernel paging request at virtual address 00100104
[16785.730517] printing eip:
[16785.730528] 2d463000 -> *pde = 00000000:45af1001
[16785.730533] 2d506000 -> *pme = 00000000:00000000
[16785.730540] Oops: 0000 [#1]
[16785.730543] SMP
[16785.730550] Modules linked in: xt_multiport iptable_filter ip_tables x_tables ipv6 evdev ext3 jbd mbcache dm_mirror dm_snapshot dm_mod fuse apparmor commoncap
[16785.730581] CPU: 0
[16785.730582] EIP: 0061:[<ee0c18c9>] Not tainted VLI
[16785.730584] EFLAGS: 00010286 (2.6.22-14-xen #1)
[16785.730607] EIP is at walk_page_buffers+0x19/0x70 [ext3]
[16785.730613] eax: 00000000 ebx: fffffffe ecx: ffffffff edx: 00100100
[16785.730619] esi: 00100100 edi: c18404b4 ebp: ffffffff esp: c2149e04
[16785.730625] ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0069
[16785.730631] Process pdflush (pid: 60, ti=c2148000 task=c20fcf90 task.ti=c2148000)
[16785.730635] Stack: 00000000 c1143210 c1830b20 c18404b4 c1143210 c1143210 ee0c4ad1 00001000
[16785.730652] 00000000 ee0c1920 00000000 c1830b20 c2149f70 c28b46f4 00000000 c2149f70
[16785.730666] 00000000 c0157a38 c1830b20 c0157ef4 00000000 0000000e c0157a30 c28b46f4
[16785.730681] Call Trace:
[16785.730687] [<ee0c4ad1>] ext3_ordered_writepage+0xe1/0x1a0 [ext3]
[16785.730703] [<ee0c1920>] bget_one+0x0/0x10 [ext3]
[16785.730717] [<c0157a38>] __writepage+0x8/0x30
[16785.730727] [<c0157ef4>] write_cache_pages+0x214/0x310
[16785.730736] [<c0157a30>] __writepage+0x0/0x30
[16785.730745] [<c0158010>] generic_writepages+0x20/0x30
[16785.730753] [<c0158069>] do_writepages+0x49/0x50
[16785.730760] [<c01980c3>] __writeback_single_inode+0x93/0x3c0
[16785.730770] [<c012bb9e>] del_timer_sync+0xe/0x20
[16785.730780] [<c02ff246>] schedule+0x356/0x900
[16785.730789] [<c019877e>] sync_sb_inodes+0x17e/0x240
[16785.730799] [<c0198c49>] writeback_inodes+0x99/0xd0
[16785.730807] [<c0158715>] wb_kupdate+0x85/0xf0
[16785.730816] [<c0158ab0>] pdflush+0x0/0x260
[16785.730823] [<c0158bf8>] pdflush+0x148/0x260
[16785.730830] [<c0158690>] wb_kupdate+0x0/0xf0
[16785.730838] [<c0136312>] kthread+0x42/0x70
[16785.730846] [<c01362d0>] kthread+0x0/0x70
[16785.730852] [<c0105927>] kernel_thread_helper+0x7/0x10
[16785.730861] =======================
[16785.730864] Code: 74 f0 31 c0 39 cb 5b 0f 92 c0 c3 8d b4 26 00 00 00 00 55 57 89 d7 56 53 83 ec 08 89 0c 24 31 c9 89 44 24 04 8b 6a 14 8d 5c 0d 00 <8b> 72 04 3b 1c 24 76 2f 89 d8 29 e8 3b 44 24 1c 73 25 8b 44 24
[16785.730938] EIP: [<ee0c18c9>] walk_page_buffers+0x19/0x70 [ext3] SS:ESP 0069:c2149e04
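
When comparing oopses like the two above, it can help to reduce each one to its faulting function and the ordered list of call-trace symbols. The following is a minimal sketch in Python, not a tool used by anyone in this report; the function name `summarize_oops` and the regular expressions are illustrative assumptions that only cover the 2.6.22-era i386 format shown here.

```python
import re

# Matches a frame such as "[<c028b35d>] skb_clone+0x2d/0x250", with an
# optional trailing module name like "[ext3]".
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<sym>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?"
)
# Matches the "EIP is at ..." line that names the faulting function.
EIP_RE = re.compile(
    r"EIP is at (?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def summarize_oops(text):
    """Return ((symbol, module), [(symbol, module), ...]) for one oops."""
    fault = None
    frames = []
    in_trace = False
    for line in text.splitlines():
        eip = EIP_RE.search(line)
        if eip:
            fault = (eip.group("sym"), eip.group("mod"))
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            frame = FRAME_RE.search(line)
            if frame:
                frames.append((frame.group("sym"), frame.group("mod")))
            elif "=====" in line:
                # The row of "=" characters closes the call trace.
                in_trace = False
    return fault, frames


# A few lines copied from the second oops in this report.
SAMPLE = """\
[16785.730607] EIP is at walk_page_buffers+0x19/0x70 [ext3]
[16785.730681] Call Trace:
[16785.730687] [<ee0c4ad1>] ext3_ordered_writepage+0xe1/0x1a0 [ext3]
[16785.730717] [<c0157a38>] __writepage+0x8/0x30
[16785.730861] =======================
[16785.730938] EIP: [<ee0c18c9>] walk_page_buffers+0x19/0x70 [ext3] SS:ESP 0069:c2149e04
"""

fault, frames = summarize_oops(SAMPLE)
```

Running this over both oopses would show that each occurs in a pdflush writeback path through ext3_ordered_writepage, which is already visible by eye in the traces above.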

Leann Ogasawara (leannogasawara) wrote :

Hi Matt,

Sorry for the delayed response. I don't suppose you'd be willing to test the latest xen kernel in the Hardy Alpha release, linux-image-2.6.24-11-xen, and verify if this is still an issue? Thanks.

Changed in linux-source-2.6.22:
status: New → Incomplete
Matt Mackall (mpm-selenic) wrote : Re: [Bug 190010] Re: Random oopsen with Xen on Ubuntu Gutsy

Original user reports:

Yeah, I upgraded to Hardy and then had even bigger problems, where the
system crashed from kswapd0 using up all the CPU, just from starting a
VM. I think the Xen kernel may be rather horked in Hardy.

--
Mathematics is the supreme nostalgia of our time.

Changed in linux-source-2.6.22:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → High
status: Incomplete → Triaged
Matt Mackall (mpm-selenic) wrote :

Looks like this problem might have been related to running 32-bit
kernel/userspace on a 64-bit machine. Switching everything to pure
64-bit seems to have eliminated the problem.
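
For anyone wanting to check whether a machine is in the same mixed configuration, here is a rough sketch in Python. It assumes a Linux system where `/proc/cpuinfo` lists the `lm` (long mode) flag for 64-bit-capable x86 CPUs and where `uname -m` (or `platform.machine()`) reports the running kernel's architecture; the function names are hypothetical and the sample values are made up for illustration.

```python
def cpu_is_64bit_capable(cpuinfo_text):
    """True if a 'flags' line in /proc/cpuinfo text lists 'lm' (long mode)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "lm" in value.split():
            return True
    return False


def kernel_is_64bit(machine):
    """True if the kernel architecture string names 64-bit x86."""
    return machine in ("x86_64", "amd64")


def runs_32bit_kernel_on_64bit_cpu(cpuinfo_text, machine):
    """True for the mixed setup: 64-bit-capable CPU, 32-bit kernel."""
    return cpu_is_64bit_capable(cpuinfo_text) and not kernel_is_64bit(machine)


# Illustrative input only; on a real system, read /proc/cpuinfo and pass
# platform.machine() from the standard library instead.
SAMPLE_CPUINFO = "processor\t: 0\nflags\t\t: fpu vme de pse lm 3dnowext 3dnow\n"

mixed = runs_32bit_kernel_on_64bit_cpu(SAMPLE_CPUINFO, "i686")
pure64 = runs_32bit_kernel_on_64bit_cpu(SAMPLE_CPUINFO, "x86_64")
```

This only inspects the kernel that is running the check. Under Xen, the hypervisor and each guest can differ in bitness, so each would need to be checked separately.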

On Thu, 2008-03-27 at 22:08 +0000, Leann Ogasawara wrote:
> ** Changed in: linux (Ubuntu)
> Sourcepackagename: linux-source-2.6.22 => linux
> Importance: Undecided => High
> Assignee: (unassigned) => Ubuntu Kernel Team (ubuntu-kernel-team)
> Status: Incomplete => Triaged

Tim Gardner (timg-tpi) wrote :

I am optimistically marking this as fix released since your issue appears to have been solved.

Changed in linux:
status: Triaged → Fix Released
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.
