multipath crash generating core dump

Bug #1687004 reported by Rafael David Tinoco
28
This bug affects 2 people
Affects Status Importance Assigned to Milestone
multipath-tools (Ubuntu)
Fix Released
Medium
Unassigned
Trusty
Fix Released
Medium
Unassigned

Bug Description

[Impact]

 * mutexes are being allocated from memory with no checks and, if allocation fails, it causes these kind of segfaults later in the code execution.
 expect multipath daemon to crash and not run any checkers on path groups.
 * not checking path groups, in an event of failure, the mpath won't change path prios.
 * openstack relies on flushing device maps frequently when using iscsi.

[Test Case]

 * i'm fixing this based on a dump analysis and not on reproduction.
 * if you disallow memory overcommit - facilitating memory exhaustion - you would be able to reproduce that by stressing multipathd with paths being flushed, but that is theory only.

[Regression Potential]

* the patch is changing the locking mechanism for log thread, based on upstream commit.

* major change is to use the mutexes from stack instead of allocating from heap.

* multipath log thread could not work as designed.

* tested by reported and reported to be good.

* What releases are affected ?

 The following releases already got the fix
 - Xenial/Yakkety/Zesty/Artful

 Note that Debian also has the fix.
 Meaning that ONLY Trusty is affected by this bug.

* This SRU contained fixes for 2 LP bugs:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1687004
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1695789

[Other Info]

It was brought to my attention that:

multipath-tools: 0.4.9-3ubuntu7.15

Faced a crash and generated a dump.

## multipath (trusty) crashed and its dump shows:

(gdb) bt full
#0 __GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
        __PRETTY_FUNCTION__ = "__pthread_mutex_lock"
        type = 0
#1 0x00007f48700b606e in flush_logqueue () at log_pthread.c:39
        empty = 0
#2 0x00007f48700b611b in log_thread (et=0x0) at log_pthread.c:57
No locals.
#3 0x00007f4870964184 in start_thread (arg=0x7f4870d8b700) at pthread_create.c:312
        __res = <optimized out>
        pd = 0x7f4870d8b700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139949107623680, -3200163692152804016, 0, 0, 139949107624384, 139949107623680, 3244383534590274896,
                3244383107817352528}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#4 0x00007f486fdb537d in __ecvt_r (value=9.532824124368238e-130, ndigit=0, decpt=0x0, sign=0x0, buf=0x7f4870d8b9c0 "\220R\267pH\177", len=139949107623680)
    at efgcvt_r.c:218
        d = 0
        f = 3.2378592100206092e-319
        exponent = 1893250816
#5 0x0000000000000000 in ?? ()
No symbol table info available.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

(gdb) bt
#0 __GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
#1 0x00007f48700b606e in flush_logqueue () at log_pthread.c:39
#2 0x00007f48700b611b in log_thread (et=0x0) at log_pthread.c:57
#3 0x00007f4870964184 in start_thread (arg=0x7f4870d8b700) at pthread_create.c:312
#4 0x00007f486fdb537d in __ecvt_r (value=9.532824124368238e-130, ndigit=0, decpt=0x0, sign=0x0, buf=0x7f4870d8b9c0 "\220R\267pH\177", len=139949107623680)
    at efgcvt_r.c:218
#5 0x0000000000000000 in ?? ()
(gdb) frame 3
#3 0x00007f4870964184 in start_thread (arg=0x7f4870d8b700) at pthread_create.c:312
warning: Source file is more recent than executable.
312 THREAD_SETMEM (pd, result, CALL_THREAD_FCT (pd));
(gdb) ptype pd
type = struct pthread {
    union {
        tcbhead_t header;
        void *__padding[24];
    };
    list_t list;
    pid_t tid;
    pid_t pid;
    void *robust_prev;
    struct robust_list_head robust_head;
    struct _pthread_cleanup_buffer *cleanup;
    struct pthread_unwind_buf *cleanup_jmp_buf;
    int cancelhandling;
    int flags;
    struct pthread_key_data specific_1stblock[32];
    struct pthread_key_data *specific[32];
    _Bool specific_used;
    _Bool report_events;
    _Bool user_stack;
    _Bool stopped_start;
    int parent_cancelhandling;
    int lock;
    int setxid_futex;
    hp_timing_t cpuclock_offset;
    struct pthread *joinid;
    void *result;
    struct sched_param schedparam;
    int schedpolicy;
    void *(*start_routine)(void *);
    void *arg;
    td_eventbuf_t eventbuf;
    struct pthread *nextevent;
    struct _Unwind_Exception exc;
    void *stackblock;
    size_t stackblock_size;
    size_t guardsize;
    size_t reported_guardsize;
    struct priority_protection_data *tpp;
    struct __res_state res;
    char end_padding[];
} *

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

(gdb) print pd->start_routine
$1 = (void *(*)(void *)) 0x7f48700b60bd <log_thread>
(gdb) frame 2
#2 0x00007f48700b611b in log_thread (et=0x0) at log_pthread.c:57
57 flush_logqueue();
(gdb) list
52 while (1) {
53 pthread_mutex_lock(logev_lock);
54 pthread_cond_wait(logev_cond, logev_lock);
55 pthread_mutex_unlock(logev_lock);
56
57 flush_logqueue();
58 }
59 return NULL;
60 }
61
(gdb) frame 1
#1 0x00007f48700b606e in flush_logqueue () at log_pthread.c:39
39 pthread_mutex_lock(logq_lock);
(gdb) list
34 static void flush_logqueue (void)
35 {
36 int empty;
37
38 do {
39 pthread_mutex_lock(logq_lock);
40 empty = log_dequeue(la->buff);
41 pthread_mutex_unlock(logq_lock);
42 if (!empty)
43 log_syslog(la->buff);
(gdb) print logq_lock
$2 = (pthread_mutex_t *) 0x0
(gdb) frame 2
#2 0x00007f48700b611b in log_thread (et=0x0) at log_pthread.c:57
57 flush_logqueue();
(gdb) print logev_lock
$3 = (pthread_mutex_t *) 0x0
(gdb) print logev_cond
$4 = (pthread_cond_t *) 0x0

From pthread implementation:

/* Lock a mutex. */
extern int pthread_mutex_lock (pthread_mutex_t *__mutex)
__THROWNL __nonnull ((1));

# define __THROWNL __attribute__ ((__nothrow__))
# define __nonnull(params) __attribute__ ((__nonnull__ params))

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

The function attributes don't play a role here because this code was called by a callback and they would warn only at compile time.

(gdb) list log_thread_start
62 void log_thread_start (pthread_attr_t *attr)
63 {
64 logdbg(stderr,"enter log_thread_start\n");
65
66 logq_lock = (pthread_mutex_t *) malloc(sizeof(pthread_mutex_t));
67 logev_lock = (pthread_mutex_t *) malloc(sizeof(pthread_mutex_t));
68 logev_cond = (pthread_cond_t *) malloc(sizeof(pthread_cond_t));
69
70 pthread_mutex_init(logq_lock, NULL);
71 pthread_mutex_init(logev_lock, NULL);
72 pthread_cond_init(logev_cond, NULL);
73
74 if (log_init("multipathd", 0)) {
75 fprintf(stderr,"can't initialize log buffer\n");
76 exit(1);
77 }
78 pthread_create(&log_thr, attr, log_thread, NULL);
79
80 return;
81 }

Meaning that the code:

71 pthread_mutex_init(logev_lock, NULL);

likely failed to initialize the "logev_lock" pthread_mutex and THAT caused the error and segfault later.
(From this analysis, this is the real problem to solve)

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

There is a commit:

commit 123cb42439a3f31bef49adb4325a16cb9b8c0aeb
Author: Hannes Reinecke <email address hidden>
Date: Tue Jan 8 14:54:03 2013 +0100

Make log_pthread more robust

We don't need to allocate memory for mutexes, we can just
be using static variables. And valgrind complained about
logqueue flush from shutdown, so don't do this.
The normal shutdown process should be flushing the log
queue anyway.

Signed-off-by: Hannes Reinecke <email address hidden>

That takes care of what happened in this crash.

I'm providing this commit as a SRU attempt soon.

tags: added: sts
Changed in multipath-tools (Ubuntu):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Rafael David Tinoco (inaddy)
milestone: none → ubuntu-14.04.5
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

inaddy@(mpathcrash):multipath-tools$ git tag --contains 123cb42439a3f31bef49adb4325a16cb9b8c0aeb
0.5.0
0.6.0
0.6.1
0.6.2
0.6.3
0.6.4
0.7.0
0.7.1

Only affects Trusty. Depends on:

From 472791a91c763aa71e5f85ef38af01fe07d427c3 Mon Sep 17 00:00:00 2001
From: Christophe Varoqui <email address hidden>
Date: Sat, 12 Jan 2013 14:07:09 +0100
Subject: [PATCH] Add missing log functions from Hannes tree

log_thread_flush()
log_reset()

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

I'm providing the following PPA:

https://launchpad.net/~inaddy/+archive/ubuntu/lp1687004

With a multipath-tools hotfixed version. Attaching debdiffs soon. I won't subscribe sponsors yet just because I want to get feedback about this fix before doing it.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

I'll be providing a fix for this bug together with the fix for:

https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1695789

tags: added: sts-sponsor
tags: removed: sts-sponsor
description: updated
Eric Desrochers (slashd)
Changed in multipath-tools (Ubuntu Trusty):
assignee: nobody → Rafael David Tinoco (inaddy)
status: New → In Progress
importance: Undecided → Medium
Changed in multipath-tools (Ubuntu):
status: In Progress → Fix Released
assignee: Rafael David Tinoco (inaddy) → nobody
Revision history for this message
Eric Desrochers (slashd) wrote :

Uploaded in Trusty upload queue, now waiting for SRU approval in order to start building in trusty-proposed.

- Eric

tags: added: sts-sru-needed
Eric Desrochers (slashd)
description: updated
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Rafael, or anyone else affected,

Accepted multipath-tools into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/multipath-tools/0.4.9-3ubuntu7.16 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in multipath-tools (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Change of SRU verification policy

As part of a recent change in the Stable Release Update verification policy we would like to inform that for a bug to be considered verified for a given release a verification-done-$RELEASE tag needs to be added to the bug where $RELEASE is the name of the series the package that was tested (e.g. verification-done-xenial). Please note that the global 'verification-done' tag can no longer be used for this purpose.

Thank you!

Revision history for this message
Eric Desrochers (slashd) wrote :

## Verification ##

As Rafael has previous stated the fix is based on a dump analysis and not on reproduction.

More details is provided in LP: #1695789.

- Eric

tags: added: verification-done-trusty
removed: verification-needed
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for multipath-tools has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package multipath-tools - 0.4.9-3ubuntu7.16

---------------
multipath-tools (0.4.9-3ubuntu7.16) trusty; urgency=medium

  * Fixes multipathd crash on usa after free (mpp->alias) (LP: #1695789)
     + d/p/strdup_multipath_alias.patch
  * Fixes multipathd crash on log thread initialization (LP: #1687004)
     + d/p/add-missing-log-functions-from-hannes-tree.patch
     + d/p/make-log-pthread-more-robust.patch

 -- Rafael David Tinoco <email address hidden> Mon, 01 May 2017 17:09:08 +0000

Changed in multipath-tools (Ubuntu Trusty):
status: Fix Committed → Fix Released
Changed in multipath-tools (Ubuntu Trusty):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in multipath-tools (Ubuntu):
milestone: ubuntu-14.04.5 → none
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.