Kernel deadlock in scheduler on multiple EC2 instance types
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-ec2 (Ubuntu) | Fix Released | High | Stefan Bader |
Bug Description
SRU Justification:
Impact: The version of the Xen patches we currently use for the ec2 kernel has a serious flaw in its handling of nested spinlocks. Under certain workloads this can result in a complete deadlock.
Fix: The spinlock handling code was substantially restructured in later versions of the patchset. These changes backport that restructuring and also allow ticket spinlocks to be used (as they are now) when compiling with the compatibility level we use.
Testcase: Not easy to reproduce, but feedback from testing with the patchset applied (see comment #32) looks good.
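For readers unfamiliar with the term in the Fix line, the following is a minimal, generic sketch of a ticket spinlock using C11 atomics. It is not the code from the Xen/ec2 patchset; the type and function names (`ticket_lock`, `ticket_lock_acquire`, and so on) are hypothetical and chosen only for illustration. Each locker draws a ticket number and waits until that number is served, so waiters acquire the lock in strict FIFO order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ticket spinlock for illustration only; not taken from the
 * Xen/ec2 patchset. Waiters are served strictly in ticket (FIFO) order. */
typedef struct {
    atomic_uint next;   /* next ticket number to hand out */
    atomic_uint owner;  /* ticket number currently being served */
} ticket_lock;

static void ticket_lock_init(ticket_lock *l) {
    atomic_init(&l->next, 0u);
    atomic_init(&l->owner, 0u);
}

static void ticket_lock_acquire(ticket_lock *l) {
    /* Draw a ticket; unsigned overflow wraps, which is well-defined in C. */
    unsigned my = atomic_fetch_add_explicit(&l->next, 1u, memory_order_relaxed);
    /* Busy-wait until our ticket is served. A paravirtualized kernel may
     * instead block the VCPU here and rely on the releasing CPU to wake it. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != my) {
        /* spin */
    }
}

static void ticket_lock_release(ticket_lock *l) {
    /* Only the lock holder writes owner, so load-then-store is race-free. */
    unsigned cur = atomic_load_explicit(&l->owner, memory_order_relaxed);
    atomic_store_explicit(&l->owner, cur + 1u, memory_order_release);
}

static bool ticket_lock_is_locked(ticket_lock *l) {
    /* Held (or contended) whenever tickets have been issued but not served. */
    return atomic_load_explicit(&l->next, memory_order_relaxed) !=
           atomic_load_explicit(&l->owner, memory_order_relaxed);
}
```

The FIFO ordering is the main design point: a plain test-and-set lock lets any spinning CPU win, whereas a ticket lock guarantees fairness. The cost is that a waiter whose turn has come but whose VCPU is not running delays everyone queued behind it, which is one reason paravirtualized kernels handle spinning VCPUs specially.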
--
After running for an indeterminate period, the 2.6.32-341-ec2 and 2.6.32-342-ec2 kernels stop responding on m2.2xlarge EC2 instances. No console output is emitted. Stack dumps gathered from the CPU context information show that all VCPUs are stuck waiting on spinlocks, which suggests a deadlock in the scheduling code.
ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-
ProcVersionSign
Uname: Linux 2.6.32-341-ec2 x86_64
Architecture: amd64
Date: Fri Feb 10 01:56:17 2012
Ec2AMI: ami-55dc0b3c
Ec2AMIManifest: (unknown)
Ec2Availability
Ec2InstanceType: m1.xlarge
Ec2Kernel: aki-427d952b
Ec2Ramdisk: unavailable
ProcEnviron:
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: linux-ec2
visibility: private → public
Changed in linux-ec2 (Ubuntu):
status: Incomplete → In Progress
description: updated
Changed in linux-ec2 (Ubuntu):
status: In Progress → Fix Committed
Overnight, an instance running 2.6.32-316 locked up. The stack traces are attached.