Kernel panic : mempolicy potential use-after-free on server running mongodb

Bug #1233175 reported by Louis Bouchard
This bug affects 3 people
Affects           Status       Importance  Assigned to   Milestone
linux (Ubuntu)    In Progress  High        Jay Vosburgh
Precise           Won't Fix    High        Jay Vosburgh

Bug Description

PID: 21767 TASK: ffff8800874bdc00 CPU: 12 COMMAND: "mongod"
 #0 [ffff880657cc3820] machine_kexec at ffffffff810393da
 #1 [ffff880657cc3890] crash_kexec at ffffffff810b53f8
 #2 [ffff880657cc3960] oops_end at ffffffff8165e528
 #3 [ffff880657cc3990] die at ffffffff810178d8
 #4 [ffff880657cc39c0] do_trap at ffffffff8165de94
 #5 [ffff880657cc3a20] do_invalid_op at ffffffff81014f65
 #6 [ffff880657cc3ac0] invalid_op at ffffffff8166796b
    [exception RIP: slab_node+46]
    RIP: ffffffff8115a66e RSP: ffff880657cc3b70 RFLAGS: 00010097
    RAX: 0000000000000000 RBX: ffff880657802c00 RCX: 00000000e62f6aef
    RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff880abf18a288
    RBP: ffff880657cc3b80 R8: 0000000000000001 R9: 0000000100100010
    R10: 0000000000000000 R11: 0000000000000022 R12: 0000000000000002
    R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000020
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #7 [ffff880657cc3b88] get_any_partial at ffffffff816496a0
 #8 [ffff880657cc3c18] __slab_alloc at ffffffff816498cf
 #9 [ffff880657cc3cc8] __kmalloc_node_track_caller at ffffffff81166f07
#10 [ffff880657cc3d38] __alloc_skb at ffffffff815364c8
#11 [ffff880657cc3d88] __netdev_alloc_skb at ffffffff81536b14
#12 [ffff880657cc3da8] enic_rq_alloc_buf at ffffffffa005484c [enic]
#13 [ffff880657cc3e08] enic_poll_msix at ffffffffa00559ff [enic]
#14 [ffff880657cc3e58] net_rx_action at ffffffff81545274
#15 [ffff880657cc3ec8] __do_softirq at ffffffff8106f5f8
#16 [ffff880657cc3f38] call_softirq at ffffffff81667bec
#17 [ffff880657cc3f50] do_softirq at ffffffff81016305
#18 [ffff880657cc3f70] irq_exit at ffffffff8106f9de
#19 [ffff880657cc3f80] do_IRQ at ffffffff816684a3
--- <IRQ stack> ---
#20 [ffff880544d8bd48] ret_from_intr at ffffffff8165d82e
    [exception RIP: __slab_free+737]
    RIP: ffffffff81649467 RSP: ffff880544d8bdf8 RFLAGS: 00000202
    RAX: 0000000000000001 RBX: ffffffffff0a0210 RCX: 0000000180aa00a9
    RDX: 0000000180aa00aa RSI: ffffea002afc6201 RDI: ffff880657806200
    RBP: ffff880544d8bea8 R8: 0000000000000001 R9: 0000000000000000
    R10: ffff8800874be020 R11: ffff8800874be030 R12: ffff880544d8be33
    R13: 000000000000000d R14: ffffffff81191895 R15: ffff880544d8bdb8
    ORIG_RAX: ffffffffffffff54 CS: 0010 SS: 0018
#21 [ffff880544d8be30] __change_pid at ffffffff81087dca
#22 [ffff880544d8beb0] kmem_cache_free at ffffffff81163634
#23 [ffff880544d8bef0] __mpol_put at ffffffff81159937
#24 [ffff880544d8bf00] do_exit at ffffffff8106c75c
#25 [ffff880544d8bf70] sys_exit at ffffffff8106caf7
#26 [ffff880544d8bf80] system_call_fastpath at ffffffff81665982
    RIP: 00007f6f476b8f37 RSP: 00007f68cbcfdbb0 RFLAGS: 00000202
    RAX: 000000000000003c RBX: ffffffff81665982 RCX: ffffffffffffffff
    RDX: 00007f68cbcfe700 RSI: 00007f6f478c9250 RDI: 0000000000000000
    RBP: 0000000000000000 R8: 00007f68cbcfe700 R9: 00007f68e82a0370
    R10: 000000007fffffff R11: 0000000000000246 R12: ffffffff8106caf7
    R13: ffff880544d8bf78 R14: 0000000000000003 R15: 00007f68f8744a10
    ORIG_RAX: 000000000000...

Louis Bouchard (louis)
Changed in linux (Ubuntu):
status: New → Triaged
assignee: nobody → Louis Bouchard (louis-bouchard)
importance: Undecided → High
Louis Bouchard (louis)
Changed in linux (Ubuntu Precise):
status: New → Triaged
assignee: nobody → Louis Bouchard (louis-bouchard)
importance: Undecided → High
tags: added: kernel-da-key precise
Revision history for this message
Louis Bouchard (louis) wrote :

Here is an analysis of the kernel core dump captured for this issue:

crash> sys
     KERNEL: vmlinux-3.2.0-38-generic
    DUMPFILE: VmCore
        CPUS: 24
        DATE: Wed Sep 18 22:34:35 2013
      UPTIME: 1 days, 11:33:14
LOAD AVERAGE: 2.04, 2.09, 2.16
       TASKS: 6656
    NODENAME: ddb-mongo41
     RELEASE: 3.2.0-38-generic
     VERSION: #61-Ubuntu SMP Tue Feb 19 12:18:21 UTC 2013
     MACHINE: x86_64 (2533 Mhz)
      MEMORY: 47.9 GB
       PANIC: "[127932.907100] kernel BUG at /build/buildd/linux-3.2.0/mm/mempolicy.c:1638!"
         PID: 21767
     COMMAND: "mongod"
        TASK: ffff8800874bdc00 [THREAD_INFO: ffff880544d8a000]
         CPU: 12
       STATE: EXIT_DEAD (PANIC)

Analysis
========
This is the backtrace of the panic task:

crash> bt
PID: 21767 TASK: ffff8800874bdc00 CPU: 12 COMMAND: "mongod"
 #0 [ffff880657cc3820] machine_kexec at ffffffff810393da
 #1 [ffff880657cc3890] crash_kexec at ffffffff810b53f8
 #2 [ffff880657cc3960] oops_end at ffffffff8165e528
 #3 [ffff880657cc3990] die at ffffffff810178d8
 #4 [ffff880657cc39c0] do_trap at ffffffff8165de94
 #5 [ffff880657cc3a20] do_invalid_op at ffffffff81014f65
 #6 [ffff880657cc3ac0] invalid_op at ffffffff8166796b
    [exception RIP: slab_node+0x2e]
    RIP: ffffffff8115a66e RSP: ffff880657cc3b70 RFLAGS: 00010097
    RAX: 0000000000000000 RBX: ffff880657802c00 RCX: 00000000e62f6aef
    RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff880abf18a288
    RBP: ffff880657cc3b80 R8: 0000000000000001 R9: 0000000100100010
    R10: 0000000000000000 R11: 0000000000000022 R12: 0000000000000002
    R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000020
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #7 [ffff880657cc3b88] get_any_partial at ffffffff816496a0
 #8 [ffff880657cc3c18] __slab_alloc at ffffffff816498cf
 #9 [ffff880657cc3cc8] __kmalloc_node_track_caller at ffffffff81166f07
#10 [ffff880657cc3d38] __alloc_skb at ffffffff815364c8
#11 [ffff880657cc3d88] __netdev_alloc_skb at ffffffff81536b14
#12 [ffff880657cc3da8] enic_rq_alloc_buf at ffffffffa005484c [enic]
#13 [ffff880657cc3e08] enic_poll_msix at ffffffffa00559ff [enic]
#14 [ffff880657cc3e58] net_rx_action at ffffffff81545274
#15 [ffff880657cc3ec8] __do_softirq at ffffffff8106f5f8
#16 [ffff880657cc3f38] call_softirq at ffffffff81667bec
#17 [ffff880657cc3f50] do_softirq at ffffffff81016305
#18 [ffff880657cc3f70] irq_exit at ffffffff8106f9de
#19 [ffff880657cc3f80] do_IRQ at ffffffff816684a3
--- <IRQ stack> ---
#20 [ffff880544d8bd48] ret_from_intr at ffffffff8165d82e
    [exception RIP: __slab_free+0x2e1]
    RIP: ffffffff81649467 RSP: ffff880544d8bdf8 RFLAGS: 00000202
    RAX: 0000000000000001 RBX: ffffffffff0a0210 RCX: 0000000180aa00a9
    RDX: 0000000180aa00aa RSI: ffffea002afc6201 RDI: ffff880657806200
    RBP: ffff880544d8bea8 R8: 0000000000000001 R9: 0000000000000000
    R10: ffff8800874be020 R11: ffff8800874be030 R12: ffff880544d8be33
    R13: 000000000000000d R14: ffffffff81191895 R15: ffff880544d8bdb8
    ORIG_RAX: ffffffffffffff54 CS: 0010 SS: 0018
#21 [ffff880544d8be30] __change_pid at ffffffff81087dca
#22 [ffff880544d8beb0] kmem_cache_free at ffffffff81163634
#23 [ffff...
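
The bug description prints the exception RIP offsets in decimal (`slab_node+46`, `__slab_free+737`), while the backtrace in this analysis prints them in hex (`slab_node+0x2e`, `__slab_free+0x2e1`). A small sketch, assuming only the `[exception RIP: symbol+offset]` line format visible above, can confirm that both refer to the same instruction. The helper name `parse_exception_rip` is hypothetical and not part of the crash utility.

```python
import re

# Matches the "[exception RIP: symbol+offset]" lines shown in the backtraces.
RIP_RE = re.compile(
    r"\[exception RIP: (?P<sym>\w+)\+(?P<off>0x[0-9a-fA-F]+|\d+)\]"
)

def parse_exception_rip(line):
    """Return (symbol, offset) from an exception RIP line, or None."""
    m = RIP_RE.search(line)
    if m is None:
        return None
    # int(..., 0) accepts both the decimal "46" and the hex "0x2e" spelling.
    return m.group("sym"), int(m.group("off"), 0)

# Lines copied from the bug description (decimal offsets).
description = [
    "[exception RIP: slab_node+46]",
    "[exception RIP: __slab_free+737]",
]
# Lines copied from the analysis backtrace (hex offsets).
analysis = [
    "[exception RIP: slab_node+0x2e]",
    "[exception RIP: __slab_free+0x2e1]",
]

same = [parse_exception_rip(a) == parse_exception_rip(b)
        for a, b in zip(description, analysis)]
print(same)
```

If every entry of `same` is true, the two listings describe the same two exception frames and differ only in number base.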

Louis Bouchard (louis)
Changed in linux (Ubuntu Precise):
status: Triaged → In Progress
penalvch (penalvch)
Changed in linux (Ubuntu):
status: Triaged → Incomplete
lois garcia (lois-garcia-f) wrote :

I have one server, previously affected by the bug, that has been stable for 8 days on 3.8.0-30-generic.

We also just provisioned 24 servers with 3.2.0-57-generic (not yet in production).

If I can provide any information to you that would help, please let me know through the ticket.

Louis Bouchard (louis) wrote :

@christopher,

Lois was provided with a custom kernel that includes a patch to be tested. The patch comes from apw, following a kernel dump analysis that I did. From what we could gather, a race condition could be responsible for the panics.

So far, I have yet to get confirmation from Lois on whether ONLY the kernel with the custom patch has fixed the problem, or whether a newer kernel has also stabilized the situation.

Since the 3.8.* kernel is available on precise, I'm not sure that identifying the specific commit that fixes the issue would be useful. Even if we identify the commit, a backport might not be possible.

I would advise moving to the newer kernel.

I'm sorry if this bug appears to be inactive, but the long period without any comment is because the issue does not occur at regular intervals.

Changed in linux (Ubuntu):
status: Incomplete → Triaged
status: Triaged → In Progress
lois garcia (lois-garcia-f) wrote :

Gentlemen, so far, both the custom patched kernel and 3.8.0-30-generic have been stable. We will keep one server on this kernel, and a number of servers on 3.2.0-55-generic and on 3.2.0-57-generic, so you'll have data to compare.

Munehisa Kamata (kamatam-amazon) wrote :

Hi,

We have also experienced this issue with 3.2.0-57-generic. Thanks to the core dump analysis by Louis Bouchard, I noticed that accessing current->mempolicy in interrupt context is a really bad idea, and then found the following commit.

 http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=e7b691b085fda913830e5280ae6f724b2a63c824

This fix was introduced in 3.6-rc1, which is why the 3.8 kernel hasn't experienced this issue. Can you backport the fix to 12.04's 3.2 kernel?
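
The race suggested by the backtrace (the exit path is inside `__mpol_put` when an interrupt arrives and `slab_node` hits the BUG) can be illustrated with a small user-space model. This is a Python sketch, not kernel code: the `Task` and `Policy` classes, the `"POISON"` marker and `LOCAL_NODE` are hypothetical stand-ins, and the guarded variant assumes the fix amounts to not consulting the current task's policy from interrupt context.

```python
LOCAL_NODE = 0  # hypothetical: the NUMA node of the CPU taking the interrupt

class Policy:
    def __init__(self, mode, node):
        self.mode = mode
        self.node = node

class Task:
    def __init__(self, policy):
        self.mempolicy = policy

def free_policy(policy):
    # Stand-in for freeing: the memory no longer holds a valid policy.
    policy.mode = "POISON"

def slab_node_unguarded(task):
    """Pick a node from the task's policy, whatever context we are in."""
    policy = task.mempolicy
    if policy is None:
        return LOCAL_NODE
    if policy.mode == "PREFERRED":
        return policy.node
    # Models the BUG hit when the policy mode is not a known value.
    raise RuntimeError("BUG: unknown policy mode")

def slab_node_guarded(task, in_interrupt):
    """Assumed shape of the fix: ignore the task policy in interrupt context."""
    if in_interrupt:
        return LOCAL_NODE
    return slab_node_unguarded(task)

def exit_with_interrupt(task, irq_handler):
    """Exit path: free the policy, take an interrupt, then clear the pointer."""
    free_policy(task.mempolicy)
    result = irq_handler(task)  # the interrupt lands inside the window
    task.mempolicy = None
    return result

def run(handler):
    task = Task(Policy("PREFERRED", 1))
    try:
        return exit_with_interrupt(task, handler)
    except RuntimeError as exc:
        return str(exc)

unguarded = run(slab_node_unguarded)
guarded = run(lambda t: slab_node_guarded(t, in_interrupt=True))
print(unguarded)
print(guarded)
```

The model is deterministic only because the interrupt is forced into the window between the free and the pointer reset. On real hardware that window is a few instructions wide, which would be consistent with the panic showing up only occasionally.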

Chris J Arges (arges)
Changed in linux (Ubuntu Precise):
assignee: Louis Bouchard (louis-bouchard) → Chris J Arges (arges)
Changed in linux (Ubuntu):
assignee: Louis Bouchard (louis-bouchard) → nobody
Chris J Arges (arges) wrote :

Can those affected by this issue test this build with the patch identified by @kamatam?
http://people.canonical.com/~arges/lp1233175/

Thanks!

Munehisa Kamata (kamatam-amazon) wrote :

Hi Chris,

Do any of you have a repro case? Although the patch itself is really straightforward, I unfortunately don't have a reliable repro case for this race.

Chris J Arges (arges) wrote :

A similar case is here:
http://<email address hidden>/msg351591.html

It seems that NUMA combined with high-load MongoDB workloads is a factor in causing this crash.

Munehisa Kamata (kamatam-amazon) wrote :

Unfortunately, we don't have a repro case yet. Do you really need a repro case to proceed with this?

Chris J Arges (arges) wrote :

A way to verify the patch is required for any SRU. A simple reproducer is always best, but if this problem occurs with high probability within a known amount of time, then running for that amount of time could also help validate and verify the fix.
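
As a rough illustration of the "known amount of time" approach, the sketch below estimates how long a crash-free run has to last before it says something about the fix. It assumes crashes on the unpatched kernel are independent and exponentially distributed with a per-host mean time between failures; that model and the numbers used (4 hosts, one crash per 30 days) are hypothetical and not measurements from this bug.

```python
import math

def p_crash_seen(hosts, days, mtbf_days):
    """Probability that at least one of `hosts` unpatched machines crashes
    within `days`, assuming independent, exponentially distributed failures."""
    return 1.0 - math.exp(-hosts * days / mtbf_days)

def days_needed(hosts, mtbf_days, confidence):
    """Run time after which an unpatched fleet would have crashed at least
    once with probability `confidence` (0 < confidence < 1)."""
    return -mtbf_days * math.log(1.0 - confidence) / hosts

# Hypothetical inputs: 4 test hosts, one crash every 30 days per host.
print(round(p_crash_seen(4, 8, 30.0), 3))
print(round(days_needed(4, 30.0, 0.95), 1))
```

Under these assumptions, adding hosts shortens the required run time proportionally, which is one reason to compare several servers on each kernel rather than a single one.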

Munehisa Kamata (kamatam-amazon) wrote :

I will not be able to provide a reproducer for this immediately. If you agree, please keep this open until I have one or someone else comes along with their own reproducer.

Chris J Arges (arges)
Changed in linux (Ubuntu Precise):
assignee: Chris J Arges (arges) → nobody
Jay Vosburgh (jvosburgh)
Changed in linux (Ubuntu):
assignee: nobody → Jay Vosburgh (jvosburgh)
Changed in linux (Ubuntu Precise):
assignee: nobody → Jay Vosburgh (jvosburgh)
Steve Langasek (vorlon) wrote :

The Precise Pangolin has reached end of life, so this bug will not be fixed for that release.

Changed in linux (Ubuntu Precise):
status: In Progress → Won't Fix