udaddy is failing to get address handle in Ubuntu 14.10 (Mellanox)

Bug #1364442 reported by bugproxy
This bug affects 3 people
Affects                    Status         Importance   Assigned to    Milestone
Ubuntu on IBM z Systems    Fix Released   High         Unassigned
libibverbs (Ubuntu)        Fix Released   High         Adam Conrad
Trusty                     Won't Fix      High         Adam Conrad

Bug Description

---Problem Description---
udaddy crashes with a segmentation fault in an Ubuntu 14.10 guest VM.

---uname output---
Linux ubuntu 3.16.0-9-generic #14-Ubuntu SMP Fri Aug 15 15:03:36 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux

---Additional Hardware Info---
0001:00:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]

Machine Type = P8

---Steps to Reproduce---
Install two P8 machines with Power KVM build releases.
Then pass the Mellanox Technologies MT27500 Family [ConnectX-3] adapter through via PCI to an Ubuntu 14.10 guest VM on one P8 server.
Do the same PCI passthrough on the other P8 machine, to another guest VM.

Then install all of the OFED packages available in the Ubuntu repository in the Ubuntu 14.10 guest VM.

On the server side Ubuntu 14.10 guest VM:
================================
root@ubuntu:~# udaddy
udaddy: starting server
receiving data transfers
sending replies
Segmentation fault
root@ubuntu:~# echo $?
139

root@ubuntu:~# dmesg | tail
[ 67.760069] systemd-logind[1035]: Failed to start user service: Unknown unit: user@1000.service
[ 67.761358] systemd-logind[1035]: New session c1 of user ubuntu.
[ 84.157069] mlx4_en: eth1: frag:0 - size:1526 prefix:0 align:0 stride:1536
[ 113.906624] sda2: WRITE SAME failed. Manually zeroing.
[ 1207.663187] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v2.2-1 (Feb 2014)
[ 1372.419099] udaddy[10624]: unhandled signal 11 at 000000000000001c nip 00003fffb2833940 lr 00003fffb28332f8 code 30001
[ 1391.318442] udaddy[10625]: unhandled signal 11 at 000000000000001c nip 00003fff843c3940 lr 00003fff843c32f8 code 30001
[ 1605.929122] udaddy[10641]: unhandled signal 11 at 000000000000001c nip 00003fff97203940 lr 00003fff972032f8 code 30001
[ 1869.130536] udaddy[10648]: unhandled signal 11 at 000000000000001c nip 00003fff94523940 lr 00003fff945232f8 code 30001
[ 2124.361751] udaddy[10652]: unhandled signal 11 at 000000000000001c nip 00003fff89d33940 lr 00003fff89d332f8 code 30001
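
The exit status and the kernel log lines above can be cross-checked mechanically. The following is a minimal Python sketch, not part of the original report. A shell exit status above 128 conventionally encodes 128 plus the signal number, so 139 corresponds to signal 11 (SIGSEGV). The sketch also parses two of the `unhandled signal` lines and compares the faulting address and the distance between `nip` and `lr` across runs. If both are identical, the crash most likely happens at the same code location each time. A fault address as small as 0x1c is commonly seen when a field of a NULL structure pointer is read, but the report does not confirm that, so treat it as an assumption. The helper names are hypothetical.

```python
import re
import signal

# Two of the kernel log lines quoted in the report above.
DMESG = """\
[ 1372.419099] udaddy[10624]: unhandled signal 11 at 000000000000001c nip 00003fffb2833940 lr 00003fffb28332f8 code 30001
[ 1391.318442] udaddy[10625]: unhandled signal 11 at 000000000000001c nip 00003fff843c3940 lr 00003fff843c32f8 code 30001
"""

FAULT_RE = re.compile(
    r"unhandled signal (\d+) at ([0-9a-f]+) nip ([0-9a-f]+) lr ([0-9a-f]+)"
)


def decode_exit_status(status):
    """Map a shell exit status above 128 to the signal that ended the process."""
    if status > 128:
        return signal.Signals(status - 128)
    return None


def parse_faults(text):
    """Extract signal number, fault address, nip and lr from each log line."""
    faults = []
    for match in FAULT_RE.finditer(text):
        sig, addr, nip, lr = match.groups()
        faults.append({
            "signal": int(sig),
            "addr": int(addr, 16),
            "nip": int(nip, 16),
            "lr": int(lr, 16),
        })
    return faults


if __name__ == "__main__":
    print(decode_exit_status(139).name)
    for fault in parse_faults(DMESG):
        # The absolute addresses differ per run (address randomisation),
        # so compare the fault address and the nip-to-lr distance instead.
        print(hex(fault["addr"]), hex(fault["nip"] - fault["lr"]))
```

Comparing offsets rather than absolute addresses is a deliberate choice: the library is mapped at a different base on every run, so only relative values are stable.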

On the client node, a guest VM running another distro:
==================================
[root@localhost ~]# udaddy -s 10.10.10.15
udaddy: starting client
udaddy: connecting
initiating data transfers
receiving data transfers

root@ubuntu:~# uname -a
Linux ubuntu 3.16.0-9-generic #14-Ubuntu SMP Fri Aug 15 15:03:36 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux

root@ubuntu:~# which udaddy
/usr/bin/udaddy
root@ubuntu:~# dpkg -S /usr/bin/udaddy
rdmacm-utils: /usr/bin/udaddy
root@ubuntu:~# dpkg --list | grep rdmacm-utils
ii rdmacm-utils 1.0.16-1 ppc64el Examples for the librdmacm library
root@ubuntu:~#

root@ubuntu:~# lspci
0000:00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
0001:00:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]

root@ubuntu:~# lspci -v
0000:00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
        Subsystem: Red Hat, Inc Device 0001
        Flags: bus master, fast devsel, latency 0, IRQ 17
        I/O ports at 0020 [size=32]
        Memory at 100b0000000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at 100b0040000 [disabled] [size=256K]
        Capabilities: [40] MSI-X: Enable+ Count=3 Masked-
        Kernel driver in use: virtio-pci

0001:00:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
        Subsystem: IBM Device 04b5
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Memory at 130b0000000 (64-bit, non-prefetchable) [size=1M]
        Memory at 130a0000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at 130b0100000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [148] Device Serial Number f4-52-14-03-00-0c-df-50
        Capabilities: [108] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [154] Advanced Error Reporting
        Kernel driver in use: mlx4_core

Userspace tool common name: /usr/bin/udaddy

The userspace tool has the following bit modes: 64-bit

Userspace rpm: rdmacm-utils

I will look into this tomorrow. I have to read the libmlx4 and libibverbs code in Ubuntu 14.10. There are a lot of changes for RoCE UD, and I am not sure which of them Ubuntu 14.10 took.
I just tried to run it on Ubuntu 14.10, using the same Ubuntu machine as both server and client, and I get this:
udaddy -s 20.20.20.20
udaddy: starting client
udaddy: connecting
udaddy: failure creating address handle
test complete
return status -1

Ubuntu is missing some code needed to make this work. The issue is that libmlx4 and libibverbs are missing the RoCE UD neighbour lookup code.
Here is the series of patches needed:
For libmlx4:
[PATCH libmlx4 V4 0/2] Add RoCE IP based addressing support for UD QPs
[PATCH libmlx4 V4 1/2] Add ibv_query_port caching support
[PATCH libmlx4 V4 2/2] Add RoCE IP based addressing support for UD QPs

For libibverbs:
[PATCH libibverbs V5 0/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
[PATCH libibverbs V5 1/2] Add ibv_port_cap_flags
[PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
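
Going only by the patch titles, two ideas are involved: addressing a RoCE peer by its IP address, and looking up that peer in the neighbour table to obtain its Ethernet (L2) address. The Python sketch below illustrates both ideas; it is not the patch code, which is C inside libmlx4 and libibverbs and is not reproduced here. It assumes that an IPv4 address is carried in a 16-byte GID as an IPv4-mapped IPv6 address, and that the neighbour table is available as text in the `/proc/net/arp` format. The function names and the MAC address in the sample table are made up for illustration.

```python
import ipaddress


def ipv4_to_gid(ip):
    """Build a 16-byte GID as an IPv4-mapped IPv6 address (::ffff:a.b.c.d)."""
    return bytes(10) + b"\xff\xff" + ipaddress.IPv4Address(ip).packed


def lookup_mac(arp_text, ip):
    """Return the MAC for `ip` from text in /proc/net/arp format, or None."""
    for line in arp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: IP address, HW type, Flags, HW address, Mask, Device
        if len(fields) >= 4 and fields[0] == ip:
            mac = fields[3]
            # An all-zero MAC marks an entry that has not been resolved.
            if mac != "00:00:00:00:00:00":
                return mac
    return None


# Hypothetical neighbour table; the MAC address is invented.
SAMPLE_ARP = """\
IP address       HW type     Flags       HW address            Mask     Device
10.10.10.15      0x1         0x2         f4:52:14:00:00:01     *        eth1
10.10.10.99      0x1         0x0         00:00:00:00:00:00     *        eth1
"""

if __name__ == "__main__":
    print(ipv4_to_gid("10.10.10.15").hex())
    print(lookup_mac(SAMPLE_ARP, "10.10.10.15"))
    print(lookup_mac(SAMPLE_ARP, "10.10.10.99"))
```

Without a lookup step like this, a library cannot fill in the destination MAC for a peer addressed by IP. That is consistent with the "failure creating address handle" message reported above, although the report does not show the exact failing code path.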

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-115089 severity-high targetmilestone-inin1410
Luciano Chavez (lnx1138)
affects: ubuntu → libibverbs (Ubuntu)
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2014-09-19 16:05 EDT-------
> Carol, can you attach the missing libraries that you want Canonical to pick
> up.
libibverbs & libmlx4 -1 ?

Canonical already has the libraries; they just need to pick up the patches that I pointed out for libibverbs and libmlx4-1.

Changed in libibverbs (Ubuntu):
assignee: nobody → Taco Screen team (taco-screen-team)
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2014-11-03 18:29 EDT-------
Carol L. Soto 2014-10-20 17:28:38 EDT added following comments:

Created attachment 93583 [details]
proposed patch [PATCH libibverbs V5 1/2] Add ibv_port_cap_flags

Created attachment 93584 [details]
proposed patch [PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution

Created attachment 93585 [details]
proposed patch [PATCH libmlx4 V4 1/2] Add ibv_query_port caching support

Created attachment 93586 [details]
proposed patch [PATCH libmlx4 V4 2/2] Add RoCE IP based addressing support for UD QPs

Steve Langasek (vorlon) wrote :

The patches are not attached to the launchpad bug via the bug proxy. Could someone please attach them directly here?

Breno Leitão (breno-leitao) wrote : Re: [Bug 1364442] Re: udaddy is failing to get address handle in Ubuntu 14.10 (Mellanox)

Hi Steve,

Yes, the patches were attached only internally. We are reposting them.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libibverbs (Ubuntu):
status: New → Confirmed
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2014-11-10 18:08 EDT-------
Canonical,

Any status on it?

Breno Leitão (breno-leitao) wrote :
Breno Leitão (breno-leitao) wrote :
Breno Leitão (breno-leitao) wrote :
Breno Leitão (breno-leitao) wrote :
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "libibverbs-add_ibv_port_cap_flags.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
bugproxy (bugproxy)
tags: added: targetmilestone-inin1504
removed: targetmilestone-inin1410
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-02-11 14:02 EDT-------
(In reply to comment #31)
> ------- Comment From breno-leitao 2014-11-04 01:02:11 UTC-------
> ------- Comment From janitor 2014-11-05 17:55:47 UTC-------
> ------- Comment From breno-leitao 2014-12-02 00:45:46 UTC-------
> ------- Comment From breno-leitao 2014-12-02 00:42:55 UTC-------
> ------- Comment From breno-leitao 2014-12-02 00:45:13 UTC-------
> The attachment "libibverbs-add_ibv_port_cap_flags.patch" seems to be a patch. If it isn't,
> please remove the "patch" flag from the attachment, remove the "patch" tag,
> and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.
> [This is an automated message performed by a Launchpad user owned by
> ~brian-murray, for any issues please contact him.]

Yes it is a patch.

Steve Langasek (vorlon)
Changed in libibverbs (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Steve Langasek (vorlon)
Steve Langasek (vorlon)
Changed in libibverbs (Ubuntu):
assignee: Steve Langasek (vorlon) → Adam Conrad (adconrad)
Steve Langasek (vorlon)
Changed in libibverbs (Ubuntu):
importance: Undecided → High
status: Confirmed → Triaged
Steve Langasek (vorlon)
Changed in libibverbs (Ubuntu Trusty):
assignee: nobody → Adam Conrad (adconrad)
importance: Undecided → High
status: New → Triaged
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-09-24 17:42 EDT-------
*** Bug 128933 has been marked as a duplicate of this bug. ***

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-05-31 21:04 EDT-------
Any update? Issue has been seen with 15.10 and 14.04.4 as well. Thanks.

Dimitri John Ledkov (xnox) wrote :

15.10 is past end of life.

16.10, which is the current development target, has 1.2.0-2ubuntu1. Are the patches in question part of that upstream release? If so, this bug is Fix Released in yakkety.

tags: added: roce
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Triaged
importance: Undecided → High
Frank Heimes (fheimes) wrote :

In 17.04 (Zesty), several RoCE-related packages (rdma, ib, etc.) were updated to the latest upstream versions, including rdmacm-utils.

With the current Zesty package stack, this issue is resolved (at least on s390x):
$ udaddy
udaddy: starting server
receiving data transfers
sending replies
data transfers complete
test complete
return status 0
$

$ udaddy -s 192.168.1.141
udaddy: starting client
udaddy: connecting
initiating data transfers
receiving data transfers
data transfers complete
test complete
return status 0
$

$ ls -la /dev/infiniband/*
crw------- 1 root root 231, 64 Feb 16 2017 /dev/infiniband/issm0
crw------- 1 root root 231, 65 Feb 16 2017 /dev/infiniband/issm1
crw------- 1 root root 231, 66 Feb 16 2017 /dev/infiniband/issm2
crw------- 1 root root 231, 67 Feb 16 2017 /dev/infiniband/issm3
crw-rw-rw- 1 root root 10, 53 Feb 16 2017 /dev/infiniband/rdma_cm
crw-rw-rw- 1 root root 231, 224 Feb 16 2017 /dev/infiniband/ucm0
crw-rw-rw- 1 root root 231, 225 Feb 16 2017 /dev/infiniband/ucm1
crw------- 1 root root 231, 0 Feb 16 2017 /dev/infiniband/umad0
crw------- 1 root root 231, 1 Feb 16 2017 /dev/infiniband/umad1
crw------- 1 root root 231, 2 Feb 16 2017 /dev/infiniband/umad2
crw------- 1 root root 231, 3 Feb 16 2017 /dev/infiniband/umad3
crw-rw-rw- 1 root root 231, 192 Feb 16 2017 /dev/infiniband/uverbs0
crw-rw-rw- 1 root root 231, 193 Feb 16 2017 /dev/infiniband/uverbs1
$
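
The listing above shows the device nodes present on the working system. As a rough pre-flight check before running `udaddy`, a script could confirm that the `rdma_cm` node and at least one `uverbs` node exist. The Python sketch below assumes those two are the nodes that matter for this test; the report itself does not say which nodes `udaddy` requires, and the function name is hypothetical.

```python
import os


def rdma_nodes_present(dev_dir="/dev/infiniband"):
    """Return whether rdma_cm exists and which uverbs nodes are in dev_dir."""
    try:
        names = set(os.listdir(dev_dir))
    except FileNotFoundError:
        # Directory missing: no RDMA device nodes at all.
        names = set()
    return {
        "rdma_cm": "rdma_cm" in names,
        "uverbs": sorted(n for n in names if n.startswith("uverbs")),
    }


if __name__ == "__main__":
    print(rdma_nodes_present())
```

The check only looks at node names. It does not verify permissions or that the nodes are character devices, so a positive result is a necessary condition rather than proof that the stack works.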

tags: added: s390x
Dimitri John Ledkov (xnox) wrote :

@frank-heimes note this bug report is about ppc64el only and was not raised for the s390x engagement.

Changed in libibverbs (Ubuntu):
status: Triaged → Fix Released
Changed in libibverbs (Ubuntu Trusty):
status: Triaged → Won't Fix
Changed in ubuntu-z-systems:
status: Triaged → Fix Released
tags: removed: s390x
tags: added: zesty