Comment 13 for bug 929941

Stefan Bader (smb) wrote : Re: Kernel deadlock in scheduler on m2.{2,4}xlarge EC2 instance

Matt,

Which commit is a bit complicated to say. Basically yes, the code is a merge between the 2.6.32 kernel code we have for 10.04 and the Xen patches SUSE had at that point in time. The "new" tree I am talking about was an effort to pick the patches from a newer release and try to work out what is missing or has changed. That is not so simple, because they rebase their tree onto something (which Xen source, I was never able to find out) and then refresh their patchset.

If you want to see for yourself, you can find the current code at:
git://kernel.ubuntu.com/ubuntu/ubuntu-lucid.git
(check out the ec2 branch) and I have pushed the results of reworking the newer patchset to
git://kernel.ubuntu.com/smb/ubuntu-lucid.git
into the ec2-next branch there.

And IMO we do have ticket locks. See drivers/xen/core/spinlocks.c in the current ec2 branch. The fact that you actually see interrupt counts for the spinlock IRQ points to that as well. If you compile the ec2-next (maybe a bit optimistic name) branch and run it, you will notice that the spinlocks are now directly an event channel, but the counts also do not get incremented (because compiling with compat set to 3.0.2 disables the ticket lock code).
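
For anyone following along who has not looked at ticket locks before, here is a minimal userspace sketch of the general idea, using C11 atomics. This is NOT the code from drivers/xen/core/spinlocks.c; the struct and function names are made up for illustration, and the Xen-specific part (blocking on an event channel instead of spinning forever) is only hinted at in a comment.

```c
#include <stdatomic.h>

/* Hypothetical names, for illustration only; not taken from the ec2 tree. */
struct ticket_lock {
    atomic_uint next;   /* next ticket number to hand out */
    atomic_uint owner;  /* ticket number currently allowed to hold the lock */
};

static void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

/* Take a ticket and wait until it is being served. Returns the ticket. */
static unsigned ticket_lock_acquire(struct ticket_lock *l)
{
    unsigned me = atomic_fetch_add(&l->next, 1);

    while (atomic_load(&l->owner) != me) {
        /* Plain busy-wait. A paravirtualized guest would typically stop
         * spinning at some point and wait to be kicked by the releaser
         * (assumption: that is the role of the spinlock IRQ mentioned above). */
    }
    return me;
}

/* Hand the lock to the next ticket holder in FIFO order. */
static void ticket_lock_release(struct ticket_lock *l)
{
    atomic_fetch_add(&l->owner, 1);
}
```

The point of the ticket scheme is fairness: waiters get the lock in the order they asked for it. In a virtualized guest that also means a waiter whose vcpu is not currently scheduled can hold up everyone queued behind it.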

Ok, so at least that rules out the hypervisor poll call as the problem and we can go forward from there. And to repeat the answer to your last question: yes, based on SUSE. Be careful when reading code in the ec2 tree. It is a bit of a pain because it still contains all of the 2.6.32 upstream Xen components, plus the SUSE ones (whatever Xen version those are based on). So arch/x86/xen is not used for the ec2 kernel, but arch/x86/include/mach-xen/asm is, as are the copies of x86 files with -xen appended to their names and some parts in drivers/xen (those pulled in by CONFIG_XEN).