multipath segmentation Fault (libmultipath: update waiter handling)

Bug #1354114 reported by Rafael David Tinoco
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
multipath-tools (Debian)
Fix Released
Unknown
multipath-tools (Ubuntu)
Fix Released
Undecided
Rafael David Tinoco
Precise
Fix Released
Undecided
Rafael David Tinoco
Trusty
Fix Released
Undecided
Rafael David Tinoco

Bug Description

[Impact]

 * Multipath can cause segmentation fault due to wrong code and can
   possibly cause user to loose access to multipath devices.

[Test Case]

 * To use multipath and wait for the problem to occur sometime (inevitable).

[Regression Potential]

 * Patch 1/4 tries to fix the issue. Patch 2/4 fixes the 1/4.
 * Patch 3/4 discovers 1/4 was no good. Patch 4/4 fixes 3/4.

 * Fix based on upstream code (96f8146) + subsequent patches.
 * Followed this code development until the issue was addressed.

[Other Info]

 * Original bug description:

----------------

It was brought to me (~inaddy) the following situation with multipathd:

#####

Program terminated with signal 6, Aborted.
#0 0x00007fbc6ae09445 in raise () from /lib/x86_64linuxgnu/
libc.so.6
(gdb) bt
#0 0x00007fbc6ae09445 in raise () from /lib/x86_64linuxgnu/
libc.so.6
#1 0x00007fbc6ae0cbab in abort () from /lib/x86_64linuxgnu/
libc.so.6
#2 0x00007fbc6ae0210e in ?? () from /lib/x86_64linuxgnu/
libc.so.6
#3 0x00007fbc6ae021b2 in __assert_fail () from /lib/x86_64linuxgnu/
libc.so.6
#4 0x00007fbc6b849efb in pthread_mutex_lock () from /lib/x86_64linuxgnu/
libpthread.so.0
#5 0x00007fbc6b1cba5f in free_waiter (data=0x1691de0) at waiter.c:44
#6 0x00007fbc6b1cc25a in waitevent (et=0x1691de0) at waiter.c:204
#7 0x00007fbc6b847e9a in start_thread () from /lib/x86_64linuxgnu/
libpthread.so.0
#8 0x00007fbc6aec54bd in clone () from /lib/x86_64linuxgnu/
libc.so.6
#9 0x0000000000000000 in ?? ()

--------------------------------------------------------------------------------------------
#5 0x00007fbc6b1cba5f in free_waiter (data=0x1691de0) at waiter.c:44
44 lock(wp>
vecs>
lock);
(gdb) print wp>
vecs>
lock
$1 = {mutex = 0x168c280, depth = 1}
In pthread_mutex_lock.c:62 there's an assert that fails:
#4 0x00007fbc6b849efb in __pthread_mutex_lock (mutex=0xfefefefefefefeff) at pthread_mutex_lock.c:62
62 assert (mutex>_
data._owner == 0);
In this run:
(gdb) p *wp>
vecs>
lock>
mutex
$3 = {_data = {lock = 1, __count = 0, __owner = 49, __nusers = 0, __kind = 0, __spins = 0, __list = {_prev = 0x0, __next = 0xffffffff}},
__size = "\001\000\000\000\000\000\000\000\061", '\000' <repeats 23 times>"\377, \377\377\377\000\000\000", __align = 1}
so __owner is 49 and not 0.
Note that 49 is somewhat strange; it's expected to be a pid_t obtained via
pid_t id = THREAD_GETMEM (THREAD_SELF, tid);

According to https://bugzilla.redhat.com/show_bug.cgi?id=570278 , this assert failure could be an expected behaviour if, for some reason the multipath code was trying to release a mutex that has already been freed.

The multipath-tools package is up to date (0.4.9-3ubuntu5)

I do not find obvious thing related in http://git.opensvc.com/gitweb.cgi?p=multipath-tools%2F.git except may be
http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=commitdiff;h=5ee9f716549d913aeb9800041f78ee9c6a50d860

#####

In between Precise's version and Upstream there are the following patches touching waiter.c:

d887f4a = signal waiter thread to stop waiting on dm events
5ee9f71 = simplify multipath signal handlers
af4fd6d = Fix race condition in stop_waiter_thread()
e1fcc59 = multipath: clean up code for stopping the waiter threads
03ec4ef = multipath: fix shutdown crashes
4dfdaf2 = multipath: Update multipath device on show topology
c301a3f = Race condition when calling stop_waiter_thread()
96f8146 = libmultipath: update waiter handling

This specific one: 96f8146 (libmultipath: update waiter handling)

"""
The current 'waiter' structure accesses fields which belong
to the main 'mpp' structure, which has a totally different
lifetime.
"""

Shows that due to different lifetime between different structures, there can be use-after-free segfaults (what seems to be happening).

waiter.c:44 = lock(wp->vecs->lock);

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Attaching SRU proposal.

"""
Description: [PATCH] libmultipath: update waiter handling

The current 'waiter' structure accesses fields which belong
to the main 'mpp' structure, which has a totally different
lifetime. With this patch most of these dependencies are
removed and the 'waiter' structure can run independently
of the main 'mpp' structure, reducing the risk of
use-after-free faults.
"""

Judging by upstream fix:

# <email address hidden>:/bugs/00067428/sources/upstream$ git tag --contains 96f8146
0.5.0

All Ubuntu versions are affected:

# <email address hidden>:/bugs/00067428/sources/upstream$ rmadison multipath-tools
 multipath-tools | 0.4.8-14ubuntu4 | lucid | source, amd64, armel, i386, ia64, powerpc, sparc
 multipath-tools | 0.4.8-14ubuntu4.10.04.2 | lucid-updates | source, amd64, armel, i386, ia64, powerpc, sparc
 multipath-tools | 0.4.9-3ubuntu5 | precise | source, amd64, armel, armhf, i386, powerpc
 multipath-tools | 0.4.9-3ubuntu5.1 | precise-updates | source, amd64, armel, armhf, i386, powerpc
 multipath-tools | 0.4.9-3ubuntu7 | trusty | source, amd64, arm64, armhf, i386, powerpc, ppc64el
 multipath-tools | 0.4.9-3ubuntu8 | utopic | source, amd64, arm64, armhf, i386, powerpc, ppc64el

summary: - Precise multipath segmentation Fault
+ multipath segmentation Fault (libmultipath: update waiter handling)
Changed in multipath-tools (Ubuntu):
assignee: nobody → Rafael David Tinoco (inaddy)
status: New → Confirmed
description: updated
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

It looks like the fix above introduces regressions (actually other new bugs):

commit 96f81469ff993b6063bb8829d9b336590510466d
Author: Hannes Reinecke <email address hidden>
Date: Mon May 4 16:46:58 2009 +0200

    libmultipath: update waiter handling

    The current 'waiter' structure accesses fields which belong
    to the main 'mpp' structure, which has a totally different
    lifetime. With this patch most of these dependencies are
    removed and the 'waiter' structure can run independently
    of the main 'mpp' structure, reducing the risk of
    use-after-free faults.

    Signed-off-by: Hannes Reinecke <email address hidden>

Introduces this problem:

commit c301a3f09203edf91df5a9adf4e32ea2a7238cda
Author: Hannes Reinecke <email address hidden>
Date: Wed May 25 14:40:19 2011 +0200

    Race condition when calling stop_waiter_thread()

    We cannot access the waiter structure from other threads as
    the lifetime is totally different and it might be deleted
    at any time.
    So we better store the pthread id in the calling thread and
    just send a signal to the thread.

    References: bnc#642846

    Signed-off-by: Hannes Reinecke <email address hidden>

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

So probably to last commits touching libmultipath/waiter.c (and fixing this issue) would be:

1)

commit e1fcc5933ac44683cdee1a02304e1115abec3ff5
Author: Benjamin Marzinski <email address hidden>
Date: Sat May 19 01:37:03 2012 -0500

    multipath: clean up code for stopping the waiter threads

    The way multipathd currently stops the waiter threads needs some work.
    Right now they are stopped by being sent the SIGUSR1 signal. However their
    cleanup code assumes that they are being cancelled, just like all the other
    threads are. There's no reason for them to be so unnecessarily
    complicated and different from the other threads

    This patch does a couple of things. First, it removes the mutex from
    the event_thread. This wasn't doing anything. It was designed to protect
    the wp->mapname variable, which the waiter threads were checking to see
    if they should quit. However, the mutex was only ever being used by the
    thread itself, and it clearly didn't need to serialize with itself. Also,
    the function to clear the mapname, signal_waiter(), was set with
    pthread_cleanup_push(), which never got called early, since the threads
    weren't being cancelled. Thus, the mapname never got cleared
    until the pthreads were about to shut down.

    The patch also rips out all the signal stopping code, and just uses
    pthread_cancel. There already are cancellation points in the waiter
    thread code. Between the cancellation points, both explicit and implicit,
    and the fact that the waiter threads will never be killed except when the
    killer is holding the vecs lock, there shouldn't be any place where the
    waiter thread can access freed data.

    To make sure the waiter thread cleans itself up properly, the dmt
    has been moved into the event_thread structure, and is destroyed in
    free_waiter() if necessary.

    Signed-off-by: Benjamin Marzinski <email address hidden>

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

And 2)

commit af4fd6d4efc5dbd13daaf4117c4a95fc7a99eafb
Author: Hannes Reinecke <email address hidden>
Date: Tue Jan 8 14:54:08 2013 +0100

    Fix race condition in stop_waiter_thread()

    The signal handler might run before we had a chance to
    set the 'waiter' context to '0', so better do it previously.

    Signed-off-by: Hannes Reinecke <email address hidden>

Fixing the 1) race condition (introduced).

After all these commits we have other commits simplifying way multipath signal handlers work that might not be suitable for SRU. Since these commits are "fixing" this issue during multipath development now I'll separate wether all these changes are suitable for a SRU for Precise, Trusty and Utopic.

description: updated
Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "precise_multipath-tools_0.4.9-3ubuntu5.2.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
description: updated
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Submitted to debian bug tracking system:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=757508

Waiting for merges/fixes.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :
Revision history for this message
Brian Murray (brian-murray) wrote :

The patches you've added use the same numbers as existing patches in the debain/patches directory which may cause some confusion. I'll just go ahead and a 1 in the numbers so 0001 becomes 0011.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package multipath-tools - 0.4.9-3ubuntu9

---------------
multipath-tools (0.4.9-3ubuntu9) utopic; urgency=medium

  * Added 0011-libmultipath-update-waiter-handling.patch (LP: #1354114)
  * Added 0012-Race-condition-when-calling-stop_waiter_thread.patch (LP: #1354114)
  * Added 0013-multipath-clean-up-code-for-stopping-the-waiter-thre.patch (LP: #1354114)
  * Added 0014-Fix-race-condition-in-stop_waiter_thread.patch (LP: #1354114)
 -- Rafael David Tinoco <email address hidden> Tue, 02 Sep 2014 15:43:38 -0700

Changed in multipath-tools (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Thank you very much Brian.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Does this need review for Precise & Trusty for the SRUs to happen ?
Thanks in advance.
-Rafael

Changed in multipath-tools (Ubuntu Trusty):
status: New → Confirmed
Changed in multipath-tools (Ubuntu Precise):
assignee: nobody → Rafael David Tinoco (inaddy)
status: New → Confirmed
Changed in multipath-tools (Ubuntu Trusty):
assignee: nobody → Rafael David Tinoco (inaddy)
Revision history for this message
Marc Deslauriers (mdeslaur) wrote :

Thanks for the precise and trusty debdiffs. I have uploaded them for processing by the SRU team. I have slightly altered them:
- I have renamed the patches so they match the names in Utopic
- I have changed the release number to better suit Trusty.

Thanks!

Changed in multipath-tools (Ubuntu Precise):
status: Confirmed → In Progress
Changed in multipath-tools (Ubuntu Trusty):
status: Confirmed → In Progress
Revision history for this message
Chris J Arges (arges) wrote : Please test proposed package

Hello Rafael, or anyone else affected,

Accepted multipath-tools into trusty-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/multipath-tools/0.4.9-3ubuntu7.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in multipath-tools (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Chris J Arges (arges) wrote :

Hello Rafael, or anyone else affected,

Accepted multipath-tools into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/multipath-tools/0.4.9-3ubuntu5.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in multipath-tools (Ubuntu Precise):
status: In Progress → Fix Committed
Revision history for this message
Sebastien Bacher (seb128) wrote :

seems like that got uploaded, unsubscribing the sponsors

Mathew Hodson (mhodson)
Changed in multipath-tools (Debian):
importance: Undecided → Unknown
status: New → Unknown
Changed in multipath-tools (Debian):
status: Unknown → Fix Released
tags: added: cts
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Starting verification for Trusty and Precise...

### Precise before using package in -proposed

root@trytrusty:~# multipath -ll
lun2 (149455400000000009feaa66a15201fac7bf681e6f5437e7d) dm-3 IET,VIRTUAL-DISK
size=5.0G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 7:0:0:0 sdh 8:112 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 6:0:0:0 sdg 8:96 active ready running
lun1 (14945540000000000049be5a45b84b7f6b1f734ee26d3d7d9) dm-1 IET,VIRTUAL-DISK
size=5.0G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 5:0:0:0 sdf 8:80 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 4:0:0:0 sdc 8:32 active ready running
2luns1 (14945540000000000a442400296ddc9a76bdb59e62b95154a) dm-2 IET,VIRTUAL-DISK
size=5.0G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 2:0:0:1 sdd 8:48 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 3:0:0:1 sde 8:64 active ready running
2luns0 (149455400000000007dc14f47242331ed7394e2272e4b7346) dm-4 IET,VIRTUAL-DISK
size=5.0G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 3:0:0:0 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 2:0:0:0 sda 8:0 active ready running

### Trusty before using package in -proposed

lun2 (149455400000000009feaa66a15201fac7bf681e6f5437e7d) dm-2 IET,VIRTUAL-DISK
size=5.0G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 6:0:0:0 sdf 8:80 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 7:0:0:0 sde 8:64 active ready running
lun1 (14945540000000000049be5a45b84b7f6b1f734ee26d3d7d9) dm-1 IET,VIRTUAL-DISK
size=5.0G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 3:0:0:0 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 2:0:0:0 sda 8:0 active ready running
2luns1 (14945540000000000a442400296ddc9a76bdb59e62b95154a) dm-4 IET,VIRTUAL-DISK
size=5.0G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 5:0:0:1 sdh 8:112 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 4:0:0:1 sdd 8:48 active ready running
2luns0 (149455400000000007dc14f47242331ed7394e2272e4b7346) dm-3 IET,VIRTUAL-DISK
size=5.0G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 5:0:0:0 sdg 8:96 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 4:0:0:0 sdb 8:16 active ready running

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :
Download full text (3.3 KiB)

### Precise multipath daemon restart

root@tryprecise:~# dpkg -l | grep multipath-tools
ii multipath-tools 0.4.9-3ubuntu5.3 maintain multipath block device access
root@tryprecise:~# service multipath-tools restart
 * Stopping multipath daemon multipathd [ OK ]
 * Starting multipath daemon multipathd [ OK ]
root@tryprecise:~# multipath -ll
lun2 (149455400000000009feaa66a15201fac7bf681e6f5437e7d) dm-2 IET,VIRTUAL-DISK
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 7:0:0:0 sde 8:64 active ready running
  `- 6:0:0:0 sdf 8:80 active ready running
lun1 (14945540000000000049be5a45b84b7f6b1f734ee26d3d7d9) dm-1 IET,VIRTUAL-DISK
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 2:0:0:0 sda 8:0 active ready running
  `- 3:0:0:0 sdc 8:32 active ready running
2luns1 (14945540000000000a442400296ddc9a76bdb59e62b95154a) dm-4 IET,VIRTUAL-DISK
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 4:0:0:1 sdd 8:48 active ready running
  `- 5:0:0:1 sdh 8:112 active ready running
2luns0 (149455400000000007dc14f47242331ed7394e2272e4b7346) dm-3 IET,VIRTUAL-DISK
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 4:0:0:0 sdb 8:16 active ready running
  `- 5:0:0:0 sdg 8:96 active ready running

# Trusty multipath daemon restart

root@trytrusty:~# dpkg -l | grep multipath-tools
ii multipath-tools 0.4.9-3ubuntu7.2 amd64 maintain multipath block device access
root@trytrusty:~# service multipath-tools restart
 * Stopping multipath daemon multipathd [ OK ]
 * Starting multipath daemon multipathd [ OK ]
root@trytrusty:~# multipath -ll
lun2 (149455400000000009feaa66a15201fac7bf681e6f5437e7d) dm-3 IET,VIRTUAL-DISK
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:0:0 sdg 8:96 active ready running
  `- 7:0:0:0 sdh 8:112 active ready running
lun1 (14945540000000000049be5a45b84b7f6b1f734ee26d3d7d9) dm-1 IET,VIRTUAL-DISK
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 4:0:0:0 sdc 8:32 active ready running
  `- 5:0:0:0 sdf 8:80 active ready running
2luns1 (14945540000000000a442400296ddc9a76bdb59e62b95154a) dm-2 IET,VIRTUAL-DISK
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 2:0:0:1 sdd 8:48 active ready running
  `- 3:0:0:1 sde 8:64 active ready running
2luns0 (149455400000000007dc14f47242331ed7394e2272e4b7346) dm-4 IET,VIRT...

Read more...

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :
Download full text (5.9 KiB)

# Precise failover test - removed 1 path during writes. failover after 10 seconds (iscsi timeout)

root@tryprecise:~# dd if=/dev/zero of=/dev/mapper/2luns0 bs=1M count=2000
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 137.125 s, 15.3 MB/s

ct 14 16:46:01 tryprecise kernel: [ 2000.944119] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295390028, last ping 4295391280, now 4295392532
Oct 14 16:46:01 tryprecise kernel: [ 2000.945011] connection2:0: detected conn error (1011)
Oct 14 16:46:01 tryprecise kernel: [ 2000.945014] connection6:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295390028, last ping 4295391280, now 4295392532
Oct 14 16:46:01 tryprecise kernel: [ 2000.945884] connection6:0: detected conn error (1011)
Oct 14 16:46:01 tryprecise iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Oct 14 16:46:01 tryprecise iscsid: Kernel reported iSCSI connection 6:0 error (1011) state (3)
Oct 14 16:46:05 tryprecise kernel: [ 2005.232037] connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295391098, last ping 4295392352, now 4295393604
Oct 14 16:46:05 tryprecise kernel: [ 2005.232944] connection4:0: detected conn error (1011)
Oct 14 16:46:06 tryprecise iscsid: Kernel reported iSCSI connection 4:0 error (1011) state (3)
Oct 14 16:46:37 tryprecise iscsid: connect to 172.16.22.51:3260 failed (No route to host)
Oct 14 16:47:38 iscsid: last message repeated 30 times
Oct 14 16:48:01 iscsid: last message repeated 11 times
Oct 14 16:48:01 tryprecise kernel: [ 2121.440066] session2: session recovery timed out after 120 secs
Oct 14 16:48:01 tryprecise kernel: [ 2121.440077] session6: session recovery timed out after 120 secs
Oct 14 16:48:02 tryprecise iscsid: connect to 172.16.22.51:3260 failed (No route to host)
Oct 14 16:48:05 iscsid: last message repeated 2 times
Oct 14 16:48:05 tryprecise kernel: [ 2125.792083] session4: session recovery timed out after 120 secs
Oct 14 16:48:06 tryprecise kernel: [ 2125.800111] sd 5:0:0:0: rejecting I/O to offline device
Oct 14 16:48:06 tryprecise kernel: [ 2125.800530] sd 5:0:0:0: [sdg] killing request
Oct 14 16:48:06 tryprecise kernel: [ 2125.800540] sd 5:0:0:0: rejecting I/O to offline device
...
Oct 14 16:48:06 tryprecise kernel: [ 2126.154444] Write(10): 2a 00 00 2d 3c 00 00 04 00 00
Oct 14 16:48:06 tryprecise kernel: [ 2126.154593] device-mapper: multipath: Failing path 8:112.
Oct 14 16:48:06 tryprecise multipathd: 2luns0: sdg - readsector0 checker reports path is down
Oct 14 16:48:06 tryprecise multipathd: checker failed path 8:96 in map 2luns0
Oct 14 16:48:06 tryprecise multipathd: 2luns0: remaining active paths: 1
Oct 14 16:48:06 tryprecise multipathd: 2luns1: sdh - readsector0 checker reports path is up
Oct 14 16:48:06 tryprecise multipathd: checker failed path 8:112 in map 2luns1
Oct 14 16:48:06 tryprecise multipathd: 2luns1: remaining active paths: 1
Oct 14 16:48:06 tryprecise multipathd: dm-4: add map (uevent)
Oct 14 16:48:06 tryprecise multipathd: dm-4: devmap already registered
Oct 14 16:48:06 tryprecise multipathd: dm-3: add map (uevent)
Oct 14 16:48:06 tryprecise multipathd: dm-3: devmap alr...

Read more...

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :
Download full text (5.8 KiB)

# Trusty failover test - removed 1 path during writes. failover after 10 seconds (iscsi timeout)

root@trytrusty:~# dd if=/dev/zero of=/dev/mapper/2luns0 bs=1M count=2000
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 139.481 s, 15.0 MB/s

Oct 14 16:50:29 trytrusty multipathd: 2luns0: remaining active paths: 2
Oct 14 16:50:29 trytrusty multipathd: 2luns1: sde - readsector0 checker reports path is up
Oct 14 16:50:29 trytrusty multipathd: 8:64: reinstated
Oct 14 16:50:29 trytrusty multipathd: 2luns1: remaining active paths: 2
Oct 14 16:50:29 trytrusty multipathd: lun1: sdf - readsector0 checker reports path is up
Oct 14 16:50:29 trytrusty multipathd: 8:80: reinstated
Oct 14 16:50:29 trytrusty multipathd: lun1: remaining active paths: 2
Oct 14 16:50:29 trytrusty multipathd: lun2: sdh - readsector0 checker reports path is up
Oct 14 16:50:29 trytrusty multipathd: 8:112: reinstated
Oct 14 16:50:29 trytrusty multipathd: lun2: remaining active paths: 2
Oct 14 16:51:12 trytrusty kernel: [ 2299.056036] connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295464558, last ping 4295465809, now 4295467060
Oct 14 16:51:12 trytrusty kernel: [ 2299.056803] connection4:0: detected conn error (1011)
Oct 14 16:51:12 trytrusty kernel: [ 2299.056805] connection6:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295464558, last ping 4295465809, now 4295467060
Oct 14 16:51:12 trytrusty kernel: [ 2299.057604] connection6:0: detected conn error (1011)
Oct 14 16:51:13 trytrusty iscsid: Kernel reported iSCSI connection 4:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)
Oct 14 16:51:13 trytrusty iscsid: Kernel reported iSCSI connection 6:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)
Oct 14 16:51:17 trytrusty kernel: [ 2303.312139] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295465620, last ping 4295466871, now 4295468124
Oct 14 16:51:17 trytrusty kernel: [ 2303.313099] connection2:0: detected conn error (1011)
Oct 14 16:51:17 trytrusty iscsid: Kernel reported iSCSI connection 2:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)
Oct 14 16:51:49 trytrusty iscsid: connect to 172.16.22.51:3260 failed (No route to host)
Oct 14 16:53:13 trytrusty kernel: [ 2419.680167] session4: session recovery timed out after 120 secs
Oct 14 16:53:13 trytrusty kernel: [ 2419.680179] session6: session recovery timed out after 120 secs
Oct 14 16:53:17 trytrusty kernel: [ 2423.776056] session2: session recovery timed out after 120 secs
Oct 14 16:53:18 trytrusty kernel: [ 2423.784028] sd 3:0:0:0: rejecting I/O to offline device
Oct 14 16:53:18 trytrusty kernel: [ 2423.784374] sd 3:0:0:0: [sdb] killing request
Oct 14 16:53:18 trytrusty kernel: [ 2423.784393] sd 3:0:0:0: rejecting I/O to offline device
Oct 14 16:53:18 trytrusty kernel: [ 2423.784764] sd 3:0:0:0: [sdb] killing request
Oct 14 16:53:18 trytrusty kernel: [ 2423.784769] sd 3:0:0:0: rejecting I/O to offline device
Oct 14 16:53:18 trytrusty kernel: [ 2423.784779] sd 3:0:0:0: [sdb] Unhandled error code
Oct 14 16:53:18 trytrusty kernel: [ 2423.784780] sd 3:0:0:0: [sdb]
Oct 1...

Read more...

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

I'm confirming both: Precise and Trusty SRUs are operational and multipath seems to be fine.

Changing tags to verification-done.

Thank you.

Rafael Tinoco

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package multipath-tools - 0.4.9-3ubuntu5.3

---------------
multipath-tools (0.4.9-3ubuntu5.3) precise; urgency=medium

  * 1002-fix-linking-library.patch: Move $(LIBDEPS) to the end of the
    link line to fix underlinking with ld --as-needed (LP: #1004243)
 -- Adam Conrad <email address hidden> Mon, 13 Oct 2014 07:47:07 -0600

Changed in multipath-tools (Ubuntu Precise):
status: Fix Committed → Fix Released
Revision history for this message
Mathew Hodson (mhodson) wrote :

This bug was fixed in the package multipath-tools - https://launchpad.net/ubuntu/+source/multipath-tools/0.4.9-3ubuntu7.2

---------------
multipath-tools (0.4.9-3ubuntu7.2) trusty; urgency=medium

  * 1002-fix-linking-library.patch: Move $(LIBDEPS) to the end of the
    link line to fix underlinking with ld --as-needed (LP: #1004243)
 -- Adam Conrad <email address hidden> Mon, 13 Oct 2014 07:47:07 -0600

Changed in multipath-tools (Ubuntu Trusty):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.