stray mappings left behind by xfs make xen kernels crash.

Bug #164904 reported by Andres Freund
Affects                       Status        Importance  Assigned to  Milestone
linux (Ubuntu)                Fix Released  Medium      Unassigned
  Gutsy                       Invalid       Undecided   Unassigned
linux-source-2.6.22 (Ubuntu)  Fix Released  High        Unassigned
  Gutsy                       Fix Released  High        Unassigned

Bug Description

Binary package hint: linux-source-2.6.22

When Xen is used together with XFS, the problem described in http://lkml.org/lkml/2007/10/12/298 quickly leads to crashes whose backtraces are very hard to associate with the cause (any code path that uses vmap somewhere can trigger it, if I have understood the problem correctly).
If you wish, I can attach some of them as text and more as images (I have a remote access card that can capture screenshots; quite often I could no longer reach any console directly).
After applying the patch from http://lkml.org/lkml/2007/10/12/310 the kernel has survived so far (20 hours of generated high I/O load), whereas earlier it survived for about 30 minutes under I/O (and sometimes didn't even boot).

As this bug causes random crashes, can lead to data corruption, and is hard to diagnose, and as the fix is obviously correct, I would suggest adding the patch at least to the Xen custom flavour, if not to the normal kernel, for Gutsy.

diff -urNd linux-source-2.6.22-2.6.22/debian/binary-custom.d/xen/patchset.old/004-xfs_vmfree.patch linux-source-2.6.22-2.6.22/debian/binary-custom.d/xen/patchset/004-xfs_vmfree.patch
--- linux-source-2.6.22-2.6.22/debian/binary-custom.d/xen/patchset.old/004-xfs_vmfree.patch 1970-01-01 01:00:00.000000000 +0100
+++ linux-source-2.6.22-2.6.22/debian/binary-custom.d/xen/patchset/004-xfs_vmfree.patch 2007-11-23 21:11:12.000000000 +0100
@@ -0,0 +1,14 @@
+diff -urNd custom-source-xen{,old}/fs/xfs/linux-2.6/xfs_buf.c
+--- custom-source-xen.old/fs/xfs/linux-2.6/xfs_buf.c 2007-11-23 21:06:08.000000000 +0100
++++ custom-source-xen/fs/xfs/linux-2.6/xfs_buf.c 2007-11-23 21:05:32.000000000 +0100
+@@ -184,6 +184,10 @@
+ {
+ a_list_t *aentry;
+
++#ifdef CONFIG_XEN
++ vunmap(addr);
++ return;
++#endif
+ aentry = kmalloc(sizeof(a_list_t), GFP_NOWAIT);
+ if (likely(aentry)) {
+ spin_lock(&as_lock);

Tags: cft-2.6.27 xen
Revision history for this message
Andres Freund (andres-anarazel) wrote :

Copy of my mail to kernel-team:

  Hi,

after reading through https://wiki.ubuntu.com/StableReleaseUpdates I think
that bug 164904 is appropriate for gutsy.
Problem:
- XFS delays vunmap and leaves stale mappings. This is considered wrong even
in a normal kernel, but as Xen uses those mappings, memory corruption can occur.

Why I think it's appropriate for Gutsy:
- It causes, sometimes silent, data-corruption and crashes
- Hard to diagnose as the backtraces are quite random
- Obviously correct fix, as the fix is the fallback currently used in tight
memory conditions.
- Patch is applied upstream

Any other Information needed?

Greetings,

Andres

Revision history for this message
Andres Freund (andres-anarazel) wrote :

As additional info, here is the git commit:
commit 7f015072348a14f16d548be557ee58c5c55df0aa
Author: Jeremy Fitzhardinge <email address hidden>
Date: Wed Oct 17 13:55:03 2007 +1000

    [XFS] eagerly remove vmap mappings to avoid upsetting Xen

    XFS leaves stray mappings around when it vmaps memory to make it virtually
    contigious. This upsets Xen if one of those pages is being recycled into a
    pagetable, since it finds an extra writable mapping of the page.

    This patch solves the problem in a brute force way, by making XFS always
    eagerly unmap its mappings.

    SGI-PV: 971902
    SGI-Modid: xfs-linux-melb:xfs-kern:29886a

    Signed-off-by: Jeremy Fitzhardinge <email address hidden>
    Signed-off-by: David Chinner <email address hidden>
    Signed-off-by: Tim Shimmin <email address hidden>
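
To illustrate the difference between the deferred path and the eager path described in the commit message, here is a minimal userspace sketch in C. It is not the kernel code: `struct mapping`, `struct deferred`, `model_vunmap`, `model_free_address`, `model_purge` and the `eager` flag are hypothetical names made up for this illustration, and `malloc` merely stands in for the `kmalloc(..., GFP_NOWAIT)` call visible in the patch context. The sketch assumes that queued mappings are torn down by some later purge step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vmap'ed region; not a kernel type. */
struct mapping {
    bool mapped;
};

/* Node of the deferred-free list, loosely modeled on a_list_t. */
struct deferred {
    struct mapping *m;
    struct deferred *next;
};

static struct deferred *free_head; /* mappings queued for a later unmap */
static int stale_count;            /* released by the caller, still mapped */

/* Stand-in for vunmap(): tears the mapping down immediately. */
static void model_vunmap(struct mapping *m)
{
    m->mapped = false;
}

/*
 * Release a mapping. eager == true mirrors the CONFIG_XEN branch added by
 * the patch; eager == false mirrors the original deferred behaviour.
 */
static void model_free_address(struct mapping *m, bool eager)
{
    struct deferred *d;

    if (eager) {
        model_vunmap(m);
        return;
    }

    d = malloc(sizeof(*d));
    if (d) {
        /* Queue the mapping; it stays live (stale) until the purge. */
        d->m = m;
        d->next = free_head;
        free_head = d;
        stale_count++;
    } else {
        /* Allocation failed: fall back to unmapping right away. */
        model_vunmap(m);
    }
}

/* Drain the deferred list, unmapping everything queued so far. */
static void model_purge(void)
{
    while (free_head) {
        struct deferred *d = free_head;

        free_head = d->next;
        model_vunmap(d->m);
        free(d);
        stale_count--;
    }
}
```

In this model, a mapping released through the deferred path remains mapped until `model_purge` runs, which corresponds to the stray mapping that upsets Xen. The eager path is the same call the deferred path already makes when its allocation fails, which is why the fix adds no new behaviour, only makes the fallback unconditional on Xen.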

Changed in linux-source-2.6.22:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → High
status: New → Triaged
assignee: nobody → ubuntu-kernel-team
importance: Undecided → High
milestone: none → gutsy-updates
status: New → Triaged
Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi Andres,

I went ahead and approved this for a Gutsy SRU. Also, I just wanted to let you know this patch is already merged in the Hardy Heron kernel that was released for testing, linux-image-2.6.24-1-generic. Let me know if you need instructions on how to install it. Thanks!

Changed in linux:
status: New → Fix Released
Revision history for this message
Andres Freund (andres-anarazel) wrote :

Hi Leann,

Thanks for approving.

But the 2.6.24 kernel doesn't help me, does it? As far as I read and understood the code, the problem is cosmetic for -generic kernels, i.e. it is only relevant for Xen kernels (and there is no -xen flavour of 2.6.24 yet). Secondly, I don't want to use 2.6.24 on a production machine...

Thanks.

Revision history for this message
Tobias Junghans (tobydox) wrote :

Same problem here. PLEASE upload a new linux-image package with the above patch applied!

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Reassigning to the kernel team to request that a Xen image for 2.6.24 be made. Also, I'm going to close the Gutsy task against the 'linux' package. The reason is that Gutsy is a valid target for linux-source-2.6.22. However, the 'linux' package is meant for Hardy kernels and later, so having a Gutsy task for it doesn't make sense. Thanks!

Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → Medium
status: Fix Released → Triaged
status: New → Invalid
Revision history for this message
Tobias Junghans (tobydox) wrote :

Why hasn't this been fixed in 2.6.22-14.47?

Revision history for this message
pinus (pinus) wrote :

I discovered 3 frozen systems in the last two days, all in combination with I/O errors. xfs_repair found bad inodes. Is it likely that the patch will result in an updated kernel?

Revision history for this message
Peter Krenn (peter-krenn) wrote :

I tried this patch, but it didn't work for me. I reinstalled the system with ext3 and also migrated all the domUs to ext3, but the system still keeps crashing, as you can see in the attached log file. Could it be that there is a similar problem with ext3, or do I have a totally different issue?

Revision history for this message
Tobias Junghans (tobydox) wrote :

I'm using the Debian-Etch-Xen-Kernel again (2.6.18) - it works rock solid (uptime of more than 100 days etc.) compared to the Ubuntu-Xen-Kernel which crashed for me almost every day, even without ext3.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi Andres,

Care to test the latest linux-image-2.6.24-11-xen that was released? Thanks.

Changed in linux:
status: Triaged → Incomplete
Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Revision history for this message
Andres Freund (andres-anarazel) wrote :

Fixed in 2.6.22.xy - and in all following releases. Sorry for not responding.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi Andres,

Thanks for the feedback. Since you are the original bug reporter I'm marking this "Fix Released". Thanks.

Changed in linux-source-2.6.22:
status: Triaged → Fix Released
status: Triaged → Fix Released
Changed in linux:
status: Incomplete → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.
