EC2 oneiric "BUG: unable to handle kernel paging request at f57ba9a1"

Bug #884320 reported by Eric Hammond
This bug affects 5 people
Affects          Status      Importance  Assigned to  Milestone
linux (Ubuntu)   Incomplete  Medium      Unassigned

Bug Description

I started a new EC2 instance of Ubuntu 11.10 Oneiric using ami-a7f539ce in us-east-1.

About two days later, it became non-responsive to ssh, http, etc.

I found the following kernel oops stack trace in /var/log/syslog.

I stopped/started the instance to get it running again, so the attached reports may not be the exact hardware on which it failed.

---------------------------------------------------------------------------------
Oct 31 13:44:25 a111 kernel: [145151.542222] BUG: unable to handle kernel paging request at f57ba9a1
Oct 31 13:44:25 a111 kernel: [145151.542248] IP: [<c0216200>] swap_count_continued.isra.17+0x190/0x1b0
Oct 31 13:44:25 a111 kernel: [145151.542268] *pdpt = 0000000021498027 *pde = 00000000009ba067 *pte = 0000000000000000
Oct 31 13:44:25 a111 kernel: [145151.542287] Oops: 0002 [#1] SMP
Oct 31 13:44:25 a111 kernel: [145151.542299] Modules linked in: xt_multiport iptable_filter ip_tables x_tables xfs acpiphp
Oct 31 13:44:25 a111 kernel: [145151.542333]
Oct 31 13:44:25 a111 kernel: [145151.542340] Pid: 5942, comm: apache2 Not tainted 3.0.0-12-virtual #20-Ubuntu
Oct 31 13:44:25 a111 kernel: [145151.542356] EIP: 0061:[<c0216200>] EFLAGS: 00010246 CPU: 0
Oct 31 13:44:25 a111 kernel: [145151.542365] EIP is at swap_count_continued.isra.17+0x190/0x1b0
Oct 31 13:44:25 a111 kernel: [145151.542374] EAX: f57ba9a1 EBX: 000009a1 ECX: ec6330c0 EDX: 0000003e
Oct 31 13:44:25 a111 kernel: [145151.542384] ESI: ecc500a0 EDI: 0000003e EBP: e1497d60 ESP: e1497d50
Oct 31 13:44:25 a111 kernel: [145151.542394] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
Oct 31 13:44:25 a111 kernel: [145151.542402] Process apache2 (pid: 5942, ti=e1496000 task=e1fed940 task.ti=e1496000)
Oct 31 13:44:25 a111 kernel: [145151.542413] Stack:
Oct 31 13:44:25 a111 kernel: [145151.542419] ed3e55e0 eae82300 000239a1 fffffff4 e1497d90 c02162ed 00000001 9375b045
Oct 31 13:44:25 a111 kernel: [145151.542449] 80000000 d30f6068 0000003e edc31000 3e400194 000239a1 00000000 e149c070
Oct 31 13:44:25 a111 kernel: [145151.542479] e1497d9c c0218bec 00100073 e1497e28 c0207483 9375b045 80000000 e1fed940
Oct 31 13:44:25 a111 kernel: [145151.542509] Call Trace:
Oct 31 13:44:25 a111 kernel: [145151.542519] [<c02162ed>] __swap_duplicate+0xcd/0x130
Oct 31 13:44:25 a111 kernel: [145151.542530] [<c0218bec>] swap_duplicate+0x1c/0x50
Oct 31 13:44:25 a111 kernel: [145151.542541] [<c0207483>] copy_pte_range+0x373/0x480
Oct 31 13:44:25 a111 kernel: [145151.542552] [<c0208e2f>] copy_page_range+0x15f/0x2d0
Oct 31 13:44:25 a111 kernel: [145151.542565] [<c014df48>] dup_mmap+0x1d8/0x310
Oct 31 13:44:25 a111 kernel: [145151.542575] [<c014e835>] dup_mm+0xd5/0x220
Oct 31 13:44:25 a111 kernel: [145151.542586] [<c0144f71>] ? sched_autogroup_fork+0x51/0x80
Oct 31 13:44:25 a111 kernel: [145151.542598] [<c0646923>] copy_mm+0x7b/0xc8
Oct 31 13:44:25 a111 kernel: [145151.542609] [<c014ef1d>] copy_process.part.27+0x56d/0xba0
Oct 31 13:44:25 a111 kernel: [145151.542620] [<c014f5ce>] copy_process+0x7e/0x90
Oct 31 13:44:25 a111 kernel: [145151.542631] [<c014f6d2>] do_fork+0xb2/0x2d0
Oct 31 13:44:25 a111 kernel: [145151.542642] [<c065a67d>] ? _raw_spin_lock+0xd/0x10
Oct 31 13:44:25 a111 kernel: [145151.542655] [<c02420a4>] ? do_fcntl+0x224/0x2c0
Oct 31 13:44:25 a111 kernel: [145151.542667] [<c01117b4>] sys_clone+0x34/0x40
---------------------------------------------------------------------------------
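
For readers triaging a similar trace, here is a minimal sketch of how the key fields of the oops above could be pulled out and the `Oops:` error code decoded. It assumes the log line layout shown in this report and the standard x86 page-fault error code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The names `summarize_oops` and `decode_error_code` are hypothetical helpers, not part of any Ubuntu or kernel tool.

```python
import re

# x86 page-fault error code bits, as printed after "Oops:".
PF_PROT = 0x1   # 0 = page not present, 1 = protection violation
PF_WRITE = 0x2  # 0 = read access, 1 = write access
PF_USER = 0x4   # 0 = fault in kernel mode, 1 = fault in user mode

# Patterns assume the message layout seen in this report's syslog excerpt.
ADDR_RE = re.compile(r"BUG: unable to handle kernel paging request at ([0-9a-f]+)")
IP_RE = re.compile(r"IP: \[<([0-9a-f]+)>\] (\S+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")
OOPS_RE = re.compile(r"Oops: ([0-9a-f]{4})")

SAMPLE = [
    "Oct 31 13:44:25 a111 kernel: [145151.542222] BUG: unable to handle kernel paging request at f57ba9a1",
    "Oct 31 13:44:25 a111 kernel: [145151.542248] IP: [<c0216200>] swap_count_continued.isra.17+0x190/0x1b0",
    "Oct 31 13:44:25 a111 kernel: [145151.542287] Oops: 0002 [#1] SMP",
]


def decode_error_code(code):
    """Translate the numeric page-fault error code into readable fields."""
    return {
        "cause": "protection violation" if code & PF_PROT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
    }


def summarize_oops(lines):
    """Collect the faulting address, faulting function, and decoded error code."""
    summary = {}
    for line in lines:
        m = ADDR_RE.search(line)
        if m:
            summary["fault_address"] = m.group(1)
        m = IP_RE.search(line)
        if m:
            summary["function"] = m.group(2)
            summary["offset"] = int(m.group(3), 16)
        m = OOPS_RE.search(line)
        if m:
            summary.update(decode_error_code(int(m.group(1), 16)))
    return summary


if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

Read this way, `Oops: 0002` describes a kernel-mode write to a page that was not present, which agrees with the `*pte = 0000000000000000` line in the trace.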

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: linux-image-3.0.0-12-virtual 3.0.0-12.20
ProcVersionSignature: Ubuntu 3.0.0-12.20-virtual 3.0.4
Uname: Linux 3.0.0-12-virtual i686
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 2011-10-31 16:16 seq
 crw-rw---- 1 root audio 116, 33 2011-10-31 16:16 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 1.23-0ubuntu3
Architecture: i386
ArecordDevices: Error: [Errno 2] No such file or directory
CurrentDmesg: [ 21.269721] ip_tables: (C) 2000-2006 Netfilter Core Team
Date: Mon Oct 31 16:43:14 2011
Ec2AMI: ami-a7f539ce
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-east-1d
Ec2InstanceType: c1.medium
Ec2Kernel: aki-805ea7e9
Ec2Ramdisk: unavailable
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 2011-10-31 16:16 seq
 crw-rw---- 1 root audio 116, 33 2011-10-31 16:16 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 1.23-0ubuntu3
Architecture: i386
ArecordDevices: Error: [Errno 2] No such file or directory
CurrentDmesg: [ 21.269721] ip_tables: (C) 2000-2006 Netfilter Core Team
DistroRelease: Ubuntu 11.10
Ec2AMI: ami-a7f539ce
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-east-1d
Ec2InstanceType: c1.medium
Ec2Kernel: aki-805ea7e9
Ec2Ramdisk: unavailable
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
Package: linux (not installed)
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0
ProcVersionSignature: Ubuntu 3.0.0-12.20-virtual 3.0.4
Tags: oneiric ec2-images
Uname: Linux 3.0.0-12-virtual i686
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dialout dip floppy plugdev sudo video

Eric Hammond (esh) wrote :
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 884320

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Eric Hammond (esh)
description: updated
Eric Hammond (esh)
description: updated
Eric Hammond (esh) wrote : BootDmesg.txt

apport information

tags: added: apport-collected
description: updated
Eric Hammond (esh) wrote : ProcCpuinfo.txt

apport information

Eric Hammond (esh) wrote : ProcInterrupts.txt

apport information

Eric Hammond (esh) wrote : ProcModules.txt

apport information

Eric Hammond (esh) wrote : UdevDb.txt

apport information

Eric Hammond (esh) wrote : UdevLog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Stefan Bader (smb) wrote :

One bug has been haunting kernels since 2.6.37 (according to reports) and can cause bad dereferences under high(er) load. It is tracked as bug #854050. A patch was found but got somewhat stuck on its way upstream.
As a result, the fix went into our tree quite late and has not been included in any kernel update so far. The patch is:

commit cc1f777ce7526f1fdc25cdf7b99ea75adc9764ce
Author: Konrad Rzeszutek Wilk <email address hidden>
Date: Fri Sep 23 17:02:29 2011 -0400

    UBUNTU: SAUCE: x86/paravirt: Partially revert "remove lazy mode in interrupts"

[The upstream commit will have a different commit message, though]

Stefan Bader (smb) wrote :

Could you try a 3.0.0-14 (or later) kernel to see whether it performs better? The patch mentioned above has not made it into updates yet, but a preview kernel package can be found at:

https://launchpad.net/~kernel-ppa/+archive/pre-proposed?field.series_filter=oneiric

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
Eric Hammond (esh) wrote :

Is this bug still on track to be fixed? What is the next step required? Should it have somebody assigned?

Stefan Bader (smb) wrote :

I had been asking specifically to try an upcoming kernel version which I suspect may contain a patch that fixes the problem. Without feedback on that, it makes no sense to go on here. Thanks.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Eric Hammond (esh) wrote :

I am unable to reproduce the issue, but I haven't been running many Oneiric AMIs, so it may simply be something that happens only rarely.

If you have found a bug in the code and have created a fix for it, it seems a shame to not release it.

If you want me to run the new kernel to make sure it does not cause immediate new problems, I'd be happy to test it with my production Oneiric server. Please provide specific instructions on how this would be done in an EC2 instance.

Stefan Bader (smb) wrote :

I did not say that the patch (which was done upstream) will not get released. In fact, it is in the kernel package 3.0.0-14.23 for Oneiric. If you run an updated kernel (the usual apt-get upgrade path), it contains the change. This was tracked in bug #854050, and I suspect that your problem may be the same. However, the symptoms are not exactly the same. For that reason I did not want to mark the bug as a duplicate right away.
So if you are running 3.0.0-14.23 or later and cannot reproduce the issue any more, then this was a duplicate of the other bug. If it still happens, then please provide a current log (if possible, all messages since the last boot).
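
As an illustration of the "3.0.0-14.23 or later" check, the following is a rough sketch that compares a version signature string, such as the `ProcVersionSignature` value in the report above, against the fixed package version. The helper names `parse_kernel_version` and `has_fix` are hypothetical, and the comparison assumes the Ubuntu `upstream-ABI.upload` numbering seen in this report; it is only meaningful for the Oneiric 3.0.0 kernel series discussed here.

```python
import re

# First Oneiric kernel package said in this thread to contain the fix.
FIXED_VERSION = (3, 0, 0, 14, 23)

# Matches strings such as "3.0.0-12.20" inside a longer signature line.
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)")


def parse_kernel_version(text):
    """Return (major, minor, patch, abi, upload) from a version signature."""
    m = VERSION_RE.search(text)
    if m is None:
        raise ValueError("no Ubuntu kernel version found in %r" % text)
    return tuple(int(part) for part in m.groups())


def has_fix(text):
    """True if the signature is at or beyond the fixed package version."""
    return parse_kernel_version(text) >= FIXED_VERSION


if __name__ == "__main__":
    # Signature copied from the ProcVersionSignature field of this report.
    print(has_fix("Ubuntu 3.0.0-12.20-virtual 3.0.4"))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "3.0.0-9" as newer than "3.0.0-14".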

Eric Hammond (esh) wrote :

Stefan: Thanks for the clarification. It looks like I've been running 3.0.0-14.23 for 16 days and I haven't seen any further issues.

Based on your comment, I'll mark this bug as a duplicate of bug #854050.
