Thrashing and OOM during upgrade from 10.04 to Maverick

Bug #602261 reported by Matt Zimmerman
Affects: linux (Ubuntu)
Status: Won't Fix
Importance: High
Assigned to: Andy Whitcroft
Milestone: (none)

Bug Description

My symptoms are very similar to bug 458299, which is presently considered fixed.

During the upgrade from 10.04 to Maverick on 5 July, my system slowed to a crawl, thrashing so severely that I couldn't even move the mouse or ssh in. I waited for it to pass, rather than risking interrupting the process, but after hours of thrashing, the kernel killed first Chromium, then the Maverick upgrader. This left me with an incomplete upgrade which had to be manually recovered.

It's hard to confirm exactly when this occurred, because I couldn't do anything on the system at the time, but the last thing in the upgrader's term.log is:

Setting up evince-common (2.30.3-0ubuntu2) ...
Installing new version of config file /etc/apparmor.d/abstractions/evince ...

and this lines up with the OOM messages in dmesg (attached).
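
To line up upgrader log entries with OOM kills, the victims can be pulled out of the dmesg text. The following is a minimal sketch, not part of the original report. It assumes the kernel logs each kill with the phrase "Killed process <pid> (<command>)"; the surrounding wording differs between kernel versions, and `oom_victims` is a hypothetical helper name.

```python
import re

# Assumption: each OOM kill is logged as "Killed process <pid> (<command>)".
# Text before and after this phrase varies by kernel version, so only the
# phrase itself is matched.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_victims(dmesg_text):
    """Return (pid, command) pairs for every OOM kill found, in log order."""
    return [(int(m.group(1)), m.group(2)) for m in KILLED_RE.finditer(dmesg_text)]
```

Passing the contents of the attached dmesg file to `oom_victims` should list the killed processes in the order the kernel killed them.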

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: linux-image-2.6.35-6-generic 2.6.35-6.9
Regression: No
Reproducible: No
ProcVersionSignature: Ubuntu 2.6.35-6.9-generic 2.6.35-rc3
Uname: Linux 2.6.35-6-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: mdz 2694 F.... pulseaudio
 /dev/snd/controlC1: mdz 2694 F.... pulseaudio
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xfe020000 irq 49'
   Mixer name : 'Analog Devices AD1984'
   Components : 'HDA:11d41984,17aa20bb,00100400'
   Controls : 32
   Simple ctrls : 20
Card1.Amixer.info:
 Card hw:1 'Q9000'/'Logitech, Inc. QuickCam Pro 9000 at usb-0000:00:1a.7-3, high speed'
   Mixer name : 'USB Mixer'
   Components : 'USB046d:0990'
   Controls : 2
   Simple ctrls : 1
Card1.Amixer.values:
 Simple mixer control 'Mic',0
   Capabilities: cvolume cvolume-joined cswitch cswitch-joined penum
   Capture channels: Mono
   Limits: Capture 0 - 3072
   Mono: Capture 0 [0%] [18.00dB] [off]
Card29.Amixer.info:
 Card hw:29 'ThinkPadEC'/'ThinkPad Console Audio Control at EC reg 0x30, fw 7KHT24WW-1.08'
   Mixer name : 'ThinkPad EC 7KHT24WW-1.08'
   Components : ''
   Controls : 1
   Simple ctrls : 1
Card29.Amixer.values:
 Simple mixer control 'Console',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [on]
Date: Tue Jul 6 14:12:37 2010
Frequency: Once every few months.
HibernationDevice: RESUME=UUID=bc555036-0252-42e8-804b-b34dc22bbcd4
MachineType: LENOVO 6465CTO
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.35-6-generic root=UUID=305dde78-d20a-4248-aaf4-09447b7c5791 ro quiet splash
ProcEnviron:
 LC_COLLATE=C
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/zsh
RelatedPackageVersions: linux-firmware 1.37
SourcePackage: linux
WpaSupplicantLog:

dmi.bios.date: 01/21/2008
dmi.bios.vendor: LENOVO
dmi.bios.version: 7LETB0WW (2.10 )
dmi.board.name: 6465CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr7LETB0WW(2.10):bd01/21/2008:svnLENOVO:pn6465CTO:pvrThinkPadT61:rvnLENOVO:rn6465CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 6465CTO
dmi.product.version: ThinkPad T61
dmi.sys.vendor: LENOVO

Matt Zimmerman (mdz)
summary: - apparmor causes thrashing and OOM during upgrade
+ Thrashing and OOM during upgrade from 10.04 to Maverick
Jeremy Foshee (jeremyfoshee) wrote :

Hi Matt,

If you could also please test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Pete Graner (pgraner)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
importance: Undecided → High
assignee: nobody → Andy Whitcroft (apw)
Pete Graner (pgraner)
tags: added: lucid
removed: kj-triage maverick needs-upstream-testing
Matt Zimmerman (mdz) wrote :

According to comments on IRC, this is suspected to be the result of tracing being left enabled, with large buffers allocated, after a ureadahead profiling run.
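
Whether tracing is still enabled, and how large its buffers are, can be checked through the ftrace control files. The following is a minimal sketch, assuming the files are exposed under /sys/kernel/debug/tracing (the mount point can differ, and reading them usually requires root). `tracing_state` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed location of the ftrace control files; adjust if debugfs or tracefs
# is mounted somewhere else.
TRACING_DIR = Path("/sys/kernel/debug/tracing")


def tracing_state(tracing_dir=TRACING_DIR):
    """Return (enabled, per_cpu_kb) read from the ftrace control files.

    per_cpu_kb is None when the kernel does not report a single number,
    for example when per-CPU buffer sizes differ.
    """
    base = Path(tracing_dir)
    enabled = (base / "tracing_on").read_text().strip() == "1"
    # buffer_size_kb may hold a bare number or a number followed by extra
    # text in parentheses; only the first field is parsed.
    fields = (base / "buffer_size_kb").read_text().split()
    per_cpu_kb = int(fields[0]) if fields and fields[0].isdigit() else None
    return enabled, per_cpu_kb
```

A result with tracing enabled and a large per-CPU buffer size after boot would be consistent with the suspicion above; the total memory held is roughly the per-CPU size multiplied by the number of CPUs.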

John Johansen (jjohansen) wrote :

I have been able to duplicate this with the ureadahead bug, in combination with a profile reload. Depending on your setup, apparmor_parser uses between 50 and 100 MB to recompile policy, putting additional stress on the VM. In addition, when the policy is loaded, the kernel tries to allocate a large number of contiguous pages before falling back to kvmalloc.

We have done a few things to reduce how much memory AppArmor uses and how it uses it. The kernel has picked up a patch to reduce the amount of memory kmalloc'ed in contiguous pages, which can reduce the pressure on the kernel memory subsystem.

User space has also had some patches applied that reduce peak memory usage. Both the kernel and user space changes will be part of alpha3, and should help reduce the chance that AppArmor can trigger this condition.

I have not been able to reproduce this bug without ureadahead consuming large amounts of memory.

Jamie Strandboge (jdstrand) wrote :

I am going to mark this bug as "Won't Fix". Maverick is EOL now, and it is conjectured that the problem was caused by tracing being left on and is addressed by the patches John referenced. If this is still a problem on upgrades from Lucid to Precise, please file a new bug. Thanks.

Changed in linux (Ubuntu):
status: Confirmed → Won't Fix