Ubuntu 15.04 [genwqe_start] err: could not setup servicelayer

Bug #1392021 reported by bugproxy
This bug affects 1 person
Affects          Status        Importance  Assigned to    Milestone
linux (Ubuntu)   Fix Released  High        Unassigned
  Utopic         Fix Released  Undecided   Chris J Arges
  Vivid          Fix Released  High        Unassigned

Bug Description

[Impact]
The IBM GenWQE Accelerator Adapter does not work correctly in the latest 3.16 kernel.

[Test Case]
1) Create a guest system on PowerKVM using a disk image file and install from a virtual SCSI DVD. In particular, assign the GenWQE card to the guest via PCI passthrough.
2) Start up the guest.
3) lspci -knd :044b
4) dmesg | grep genwqe
5) modprobe genwqe_card
    No driver is bound to the card.
6) ls -l /dev/genwq* shows no file at all.
    One device file would be expected here.
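Steps 3 and 6 can be scripted. The sketch below is a pair of hypothetical helpers, not part of the original report: `genwqe_bound_driver` reads `lspci -k` output on stdin and prints the driver named on the `Kernel driver in use:` line (nothing when no driver is bound, which is the failure in step 5), and `genwqe_node_count` counts entries matching `genwq*` in a directory that defaults to `/dev`. Both function names are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for the test case above; names are illustrative only.

# Print the driver bound to the device(s) in `lspci -k` output read from stdin.
# Prints nothing when lspci reports no "Kernel driver in use:" line.
genwqe_bound_driver() {
    awk -F': *' '/Kernel driver in use:/ { print $2 }'
}

# Count entries matching genwq* in a directory (default: /dev).
# After a successful probe, one entry per card is expected.
genwqe_node_count() {
    dir=${1:-/dev}
    n=0
    for f in "$dir"/genwq*; do
        # When nothing matches, the glob stays literal and -e is false.
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}
```

Usage: `lspci -knd :044b | genwqe_bound_driver` should print `genwqe` once the card has been probed, and `genwqe_node_count` should then print 1 for a single card.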

[Fix]
The following patches cleanly cherry-pick into 3.16
1451f41 GenWQE: Support blocking when DDCB queue is busy
08e4906 GenWQE: Fix problem when reading HSI and Retc
d9c11d4 GenWQE: Fix checkpatch complaints
bc407dd GenWQE: Check return code of pci_sriov_enable
2d880cc GenWQE: Do not modify return code of genwqe_set_interrupt_capability
26d8f6f GenWQE: Update author information
64df2ec GenWQE: Remove sysfs entry for driver version
95a8825 GenWQE: Check pci_get_totalvfs return code
32182cd misc: remove DEFINE_PCI_DEVICE_TABLE usage
5b35b20 GenWQE: Remove unnecessary include
7276883 misc/GenWQE: fix pci_enable_msi usage
d584f69 GenWQE: Increase driver version number
93b772b GenWQE: Improve hardware error recovery
fb14545 GenWQE: Add support for EEH error recovery
c1f732a GenWQE: Add sysfs interface for bitstream reload
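Applying the list above to a 3.16 tree means replaying it oldest first. The sketch below assumes the list is in `git log` order (newest first), which the report does not state explicitly; `genwqe_pick_plan` is a hypothetical name. It only prints `git cherry-pick -x` commands (`-x` records the upstream SHA in each new commit message) so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical dry-run helper: read "<sha> <subject>" lines on stdin
# (assumed newest first, as git log prints them) and print the
# cherry-pick commands oldest first.
genwqe_pick_plan() {
    awk 'NF { sha[++n] = $1 }
         END { for (i = n; i >= 1; i--) print "git cherry-pick -x " sha[i] }'
}
```

Usage: save the fifteen lines above to a file, run `genwqe_pick_plan < picks.txt` inside the kernel tree to review the plan, then pipe the same output to `sh` to apply it.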

--

== Comment: #0 - Christian Rund <email address hidden> - 2014-11-11 10:45:12 ==
---Problem Description---
In our opinion, version 2.0.15 of the IBM GenWQE Accelerator Adapter driver shipped with Ubuntu 15.04 does not work; a newer version is needed. The driver is part of the linux-image-extra-3.16.0-24-generic package.

[ 3.330906] genwqe 0001:00:02.0: enabling device (0140 -> 0142)
[ 3.332443] genwqe 0001:00:02.0: ibm,query-pe-dma-windows(26200000) 1 8000000 20000001 returned 0
[ 3.333159] genwqe 0001:00:02.0: ibm,create-pe-dma-window(27200000) 1 8000000 20000001 10 1f returned -1 (liobn = 0x0 starting addr = 0 0)
[ 4.403223] genwqe 0001:00:02.0: [genwqe_start] err: could not setup servicelayer!
[ 4.403333] genwqe 0001:00:02.0: err: cannot start card services! (err=-5)
[ 4.404471] genwqe: probe of 0001:00:02.0 failed with error -5
[321140.194392] genwqe_card: module verification failed: signature and/or required key missing - tainting kernel

modinfo genwqe_card
filename: /lib/modules/3.16.0-24-generic/kernel/drivers/misc/genwqe/genwqe_card.ko
license: GPL
version: 2.0.15
description: GenWQE Card
author: Michal Jung <email address hidden>
author: Joerg-Stephan Vogt <email address hidden>
author: Michael Ruettger <email address hidden>
author: Frank Haverkamp <email address hidden>
srcversion: 69FBCA52AFAF3B71342E43B
alias: pci:v00001014d0000044Bsv00001014sd0000044Bbc12sc00i00*
alias: pci:v00001014d00000000sv00000000sd0000035Fbc12sc00i00*
alias: pci:v00001014d0000044Bsv00000000sd0000035Fbc12sc00i00*
alias: pci:v00001014d00000000sv00000000sd00000000bc12sc00i00*
alias: pci:v00001014d0000044Bsv00000000sd00000000bc12sc00i00*
alias: pci:v00001014d0000044Bsv00001014sd0000035Fbc12sc00i00*
depends: crc-itu-t
intree: Y
vermagic: 3.16.0-24-generic SMP mod_unload modversions
signer: Magrathea: Glacier signing key
sig_key: 32:F4:D0:34:89:C6:7C:D7:71:67:94:F6:0C:00:D7:F7:E8:D2:78:0E
sig_hashalgo: sha512
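A quick way to tell the broken driver from a fixed one is to compare the `version` field reported by `modinfo` against 2.0.25, the version confirmed working later in this report. The helper name below is hypothetical; it relies on GNU `sort -V` (version ordering) and `-C` (quiet check that input is already sorted), both available in Ubuntu's coreutils.

```shell
#!/bin/sh
# Hypothetical check: succeed when the given genwqe_card version is at
# least 2.0.25 (the version reported working in this bug), fail otherwise.
genwqe_version_ok() {
    [ -n "$1" ] || return 1
    # sort -V -C exits 0 only if the two lines are already in version
    # order, i.e. 2.0.25 <= $1.
    printf '%s\n%s\n' 2.0.25 "$1" | sort -V -C
}
```

Usage: `genwqe_version_ok "$(modinfo -F version genwqe_card)" || echo "genwqe_card too old"`. For the module shown above (2.0.15) the check fails.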

Contact Information = <email address hidden>, Frank Haverkamp <email address hidden>

---uname output---
Linux tulg3 3.16.0-24-generic #32-Ubuntu SMP Tue Oct 28 13:06:19 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux

---Additional Hardware Info---
0001:00:02.0 Processing accelerators: IBM GenWQE Accelerator Adapter
Class: 1200 VendorID: 1014 DeviceId: 044b assigned to the PowerKVM guest via PCI passthrough (vfio-pci)

Machine Type = 8284-22A PowerKVM

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1) Create a guest system on PowerKVM using a disk image file and install from a virtual SCSI DVD. In particular, assign the GenWQE card to the guest via PCI passthrough.
2) Start up the guest.
3) lspci -knd :044b
4) dmesg | grep genwqe
5) modprobe genwqe_card
    No driver is bound to the card.
6) ls -l /dev/genwq* shows no file at all.
    One device file would be expected here.

Stack trace output:
 no

Oops output:
 no

System Dump Info:
  The system is not configured to capture a system dump.

*Additional Instructions for <email address hidden>, Frank Haverkamp <email address hidden>:
-Attach the sysctl -a output to the bug.

== Comment: #2 - Frank Haverkamp <email address hidden> - 2014-11-12 05:09:24 ==
Christian and I tried out the CVS version of the driver, and that works. Ubuntu picked an unfortunate intermediate version of the driver, which had broken IRQ registration. This was later fixed by Sebastian Ott, and on top of that Kleber and I added some more patches that did cleanups and, more importantly, added System p-specific recovery features, e.g. EEH handlers and a method to reload the bitstream on System p.

Therefore it would be great if someone could get the Ubuntu folks to pick up the latest version from kernel.org.

I know that Greg KH has a good version of the code in his tree. We need to check whether it has already made it into the mainline Linux tree (both should be the same).

== Comment: #3 - Frank Haverkamp <email address hidden> - 2014-11-12 07:58:03 ==
Christian has checked the linux.git version of the genwqe_card driver compiled against his Ubuntu kernel and found it working.

SHA is 206c5f60a3d902bc4b56dab2de3e88de5eb06108.
Patches affecting the driver are usually prefixed with GenWQE or genwqe.

7276883f1f98cd0a92fdc049f69bdc0912f7fc16 misc/GenWQE: fix pci_enable_msi usage
was the one which fixed the problem introduced by
a30d0108b09ae46d24594a2e699c4dad21bb4af4 Use pci_enable_msi_exact() instead of pci_enable_msi_block()
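Whether a given kernel tree already carries the fix can be checked with `git merge-base --is-ancestor`, which exits 0 when the first commit is reachable from the second. The wrapper below is a hypothetical convenience; its default commit is the fix SHA quoted above.

```shell
#!/bin/sh
# Hypothetical wrapper: succeed if COMMIT (default: the pci_enable_msi fix
# quoted above) is an ancestor of HEAD in the git tree at REPO.
FIX_SHA=7276883f1f98cd0a92fdc049f69bdc0912f7fc16

tree_has_commit() {
    repo=$1
    commit=${2:-$FIX_SHA}
    # Exits 0 if $commit is an ancestor of HEAD, non-zero otherwise
    # (including when the commit is not present in the tree at all).
    git -C "$repo" merge-base --is-ancestor "$commit" HEAD 2>/dev/null
}
```

Usage: `tree_has_commit ~/linux && echo "fix present"` in a checkout of the kernel being evaluated.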

So please have a look at how we can convince the Ubuntu folks to update their version of the code. Thanks.

== Comment: #5 - Christian Rund <email address hidden> - 2014-11-12 08:26:17 ==
Confirm that version 2.0.25 built against the mainline kernel is working for us.
I used the given SHA, which is based on Linux 3.18-rc4, making sure to use the new genwqe_card.h header file.

To make matters worse, the 2.0.15 version that was picked leads to a system hang soon after the dmesg messages described above (i.e. when the genwqe_card module is loaded).
Thus raising priority to P2 normal.

I hope Frank's and my comments answer the 'more info' request. Setting state back to open.

crund@tulg3:~/driver-core/drivers/misc/genwqe$ make -C /lib/modules/3.16.0-24-generic/build SUBDIRS=/home/crund/driver-core/drivers/misc/genwqe EXTRA_CFLAGS="-I/home/crund/driver-core/include/uapi -Wuninitialized -DCONFIG_GENWQE_PLATFORM_ERROR_RECOVERY=1" modules
make: Entering directory '/usr/src/linux-headers-3.16.0-24-generic'
  CC [M] /home/crund/driver-core/drivers/misc/genwqe/card_base.o
  CC [M] /home/crund/driver-core/drivers/misc/genwqe/card_dev.o
  CC [M] /home/crund/driver-core/drivers/misc/genwqe/card_ddcb.o
  CC [M] /home/crund/driver-core/drivers/misc/genwqe/card_sysfs.o
  CC [M] /home/crund/driver-core/drivers/misc/genwqe/card_debugfs.o
  CC [M] /home/crund/driver-core/drivers/misc/genwqe/card_utils.o
  LD [M] /home/crund/driver-core/drivers/misc/genwqe/genwqe_card.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC /home/crund/driver-core/drivers/misc/genwqe/genwqe_card.mod.o
  LD [M] /home/crund/driver-core/drivers/misc/genwqe/genwqe_card.ko
make: Leaving directory '/usr/src/linux-headers-3.16.0-24-generic'
crund@tulg3:~/driver-core/drivers/misc/genwqe$ su
Password:
root@tulg3:/home/crund/driver-core/drivers/misc/genwqe# cat /sys/kernel/debug/genwqe/genwqe0_card/info
genwqe driver version: 2.0.25
    Device Name/Type: 0001:00:02.0 Physical CardIdx: 0
    SLU/APP Config : 0x00000b0330342260/0x00000002475a4950
    Build Date : 2/26/2014
    Base Clock : 175 MHz
    Arch/SVN Release: 3/b
    Bitstream : 1
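The session above can be condensed into two hypothetical helpers. `build_genwqe` repeats the out-of-tree build but passes the module directory with `M=`, the form the kernel's Kbuild documentation gives for external modules (the `SUBDIRS=` form used above was still accepted by the 3.16 build system); setting `DRY_RUN` makes it print the command instead of running it. `genwqe_debugfs_version` extracts the version line from the debugfs `info` file shown above. The tree layout (`drivers/misc/genwqe` and `include/uapi` under one root) mirrors the paths in the session; the function names are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: build genwqe_card.ko out of tree against an installed
# kernel's build directory. TREE is the root of a kernel source checkout.
build_genwqe() {
    tree=$1
    kdir=${2:-/lib/modules/$(uname -r)/build}
    # With DRY_RUN set, ${DRY_RUN:+echo} expands to "echo" and the
    # command is printed instead of executed.
    ${DRY_RUN:+echo} make -C "$kdir" M="$tree/drivers/misc/genwqe" \
        EXTRA_CFLAGS="-I$tree/include/uapi -Wuninitialized -DCONFIG_GENWQE_PLATFORM_ERROR_RECOVERY=1" \
        modules
}

# Hypothetical helper: print the driver version from a genwqe debugfs info
# file (default path as in the session above; reading it usually needs root).
genwqe_debugfs_version() {
    info=${1:-/sys/kernel/debug/genwqe/genwqe0_card/info}
    awk -F': *' '/^genwqe driver version:/ { print $2; exit }' "$info"
}
```

Usage: `build_genwqe ~/driver-core`, load the resulting module, then `genwqe_debugfs_version` as root should print `2.0.25`.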

bugproxy (bugproxy) wrote : guest XML file

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-118810 severity-medium targetmilestone-inin1504
bugproxy (bugproxy) wrote : sysctl -a output

Default Comment by Bridge

Dave Heller (hellerda)
affects: ubuntu → linux (Ubuntu)
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2014-11-12 19:52 EDT-------
Hi Canonical,

The request is to pick up the latest upstream driver as described above. Thx.

tags: added: targetmilestone-inin---
removed: targetmilestone-inin1504
penalvch (penalvch)
tags: added: cherry-pick
Changed in linux (Ubuntu):
importance: Undecided → High
status: New → Triaged
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2014-11-19 15:21 EDT-------
Changed status, as the driver is available.

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Vivid):
status: Triaged → Fix Released
Changed in linux (Ubuntu Utopic):
assignee: nobody → Chris J Arges (arges)
status: New → In Progress
Chris J Arges (arges)
description: updated
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Utopic):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-utopic' to 'verification-done-utopic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!
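Enabling -proposed comes down to one extra APT source whose pocket name matches the release being tested. The hypothetical helper below prints that line for a given series, assuming a ppc64el machine, which Ubuntu serves from ports.ubuntu.com rather than archive.ubuntu.com; the wiki page above remains the authoritative procedure.

```shell
#!/bin/sh
# Hypothetical helper: print an APT source line for the -proposed pocket of
# SERIES (e.g. utopic). Assumes a ppc64el system using the Ubuntu ports mirror.
proposed_source_line() {
    if [ -z "$1" ]; then
        echo "usage: proposed_source_line SERIES" >&2
        return 1
    fi
    echo "deb http://ports.ubuntu.com/ubuntu-ports $1-proposed main restricted universe multiverse"
}
```

Usage: `proposed_source_line utopic | sudo tee /etc/apt/sources.list.d/proposed.list`, followed by `sudo apt-get update`. SERIES should be the release whose -proposed pocket carries the kernel under test.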

tags: added: verification-needed-utopic
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2014-12-02 18:22 EDT-------
Updated the system to 3.16.0-25 using the vivid-proposed repository, following the instructions at https://wiki.ubuntu.com/Testing/EnableProposed. The linux-image-extra-3.16.0-25-generic package was part of the update, but it still contains the faulty 2.0.15 version, which shows the I/O error below at boot time and after a manual rmmod genwqe_card / modprobe genwqe_card.

crund@tulg3:~$ uname -a
Linux tulg3 3.16.0-25-generic #33-Ubuntu SMP Tue Nov 4 12:05:54 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux

dmesg output:
[ 5.899584] genwqe 0001:00:02.0: [genwqe_start] err: could not setup servicelayer!
[ 5.899702] genwqe 0001:00:02.0: err: cannot start card services! (err=-5)
[ 5.900906] genwqe: probe of 0001:00:02.0 failed with error -5
[ 5.900982] genwqe 0002:00:03.0: enabling device (0140 -> 0142)
[ 5.902130] genwqe 0002:00:03.0: ibm,query-pe-dma-windows(2026) 1 8000000 20000009 returned 0
[ 5.909833] genwqe 0002:00:03.0: ibm,create-pe-dma-window(2027) 1 8000000 20000009 10 23 returned -1 (liobn = 0x0 starting addr = 0 0)
[ 6.777774] init: plymouth-upstart-bridge main process ended, respawning
[ 6.979612] genwqe 0002:00:03.0: [genwqe_start] err: could not setup servicelayer!
[ 6.979726] genwqe 0002:00:03.0: err: cannot start card services! (err=-5)
[ 6.980833] genwqe: probe of 0002:00:03.0 failed with error -5
[ 7.016438] init: plymouth-splash main process (1001) terminated with status 1
[ 32.081696] systemd-logind[1104]: New seat seat0.
[ 32.090494] systemd-logind[1104]: Failed to start user service: Unknown unit: user@1002.service
[ 32.098073] systemd-logind[1104]: New session 1 of user crund.
[ 251.204016] genwqe 0001:00:02.0: [genwqe_start] err: could not setup servicelayer!
[ 251.204127] genwqe 0001:00:02.0: err: cannot start card services! (err=-5)
[ 251.205240] genwqe: probe of 0001:00:02.0 failed with error -5
[ 252.275974] genwqe 0002:00:03.0: [genwqe_start] err: could not setup servicelayer!
[ 252.276083] genwqe 0002:00:03.0: err: cannot start card services! (err=-5)
[ 252.277177] genwqe: probe of 0002:00:03.0 failed with error -5
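With two cards passed through, it helps to summarize the probe failures per PCI address. The hypothetical awk-based helper below reads `dmesg` text on stdin and prints each address with its number of `probe of ... failed` messages and the last error code seen (-5 is -EIO, matching the I/O error mentioned above).

```shell
#!/bin/sh
# Hypothetical helper: summarize "genwqe: probe of <addr> failed with error <n>"
# lines from dmesg text on stdin as "<addr> <count> <last error>".
genwqe_probe_failures() {
    awk '/genwqe: probe of .* failed with error/ {
            # The PCI address is the field right after "of".
            for (i = 1; i < NF; i++)
                if ($i == "of") addr = $(i + 1)
            count[addr]++
            err[addr] = $NF
         }
         END { for (a in count) print a, count[a], err[a] }' | sort
}
```

Usage: `dmesg | genwqe_probe_failures`. For the log above this prints `0001:00:02.0 2 -5` and `0002:00:03.0 2 -5`.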

penalvch (penalvch) wrote :
Changed in linux (Ubuntu Vivid):
status: Fix Released → Triaged
Chris J Arges (arges) wrote :

Christian Rund,
Can you please test the -26 version in utopic-proposed? It should contain the patch set I mentioned above.
Thanks,

Here is a link:
https://launchpad.net/ubuntu/+source/linux/3.16.0-26.35

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2014-12-03 16:56 EDT-------
Successfully tested the 3.16.0-26-generic packages; the linux-image-extra package contains the correct version (2.0.25) of the genwqe_card module. I was able to rmmod and modprobe the module, and on reboot the system successfully loaded the genwqe device driver. I could run compression/decompression workloads successfully.

That said, I had to manually install the .deb packages found on https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/6612025 as 'apt' was not showing the 3.16.0-26 kernel updates from the 'trusty-proposed' repository for my vivid 15.04 system.

crund@tulg3:~$ uname -a
Linux tulg3 3.16.0-26-generic #35-Ubuntu SMP Tue Dec 2 16:38:01 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
crund@tulg3:~$ modinfo genwqe_card
filename: /lib/modules/3.16.0-26-generic/kernel/drivers/misc/genwqe/genwqe_card.ko
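The verification above amounts to checking that the running kernel is at least 3.16.0-26, the first utopic ABI reported here to ship genwqe_card 2.0.25. The hypothetical helper below, again relying on GNU `sort -V -C`, strips the flavour suffix (`-generic`) from a `uname -r` string and compares the rest. It is only meaningful for the 3.16 series discussed in this bug.

```shell
#!/bin/sh
# Hypothetical check: succeed when a `uname -r` style string such as
# 3.16.0-26-generic is at least MIN (default 3.16.0-26).
kernel_abi_at_least() {
    running=${1%%-[a-z]*}   # drop the flavour suffix, e.g. "-generic"
    min=${2:-3.16.0-26}
    # Sorted (exit 0) only if min <= running in version order.
    printf '%s\n%s\n' "$min" "$running" | sort -V -C
}
```

Usage: `kernel_abi_at_least "$(uname -r)" && echo "kernel should carry the GenWQE fixes"`.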

Luis Henriques (henrix) wrote :

As per comment #9, I'm tagging this as verified in utopic.

tags: added: verification-done-utopic
removed: verification-needed-utopic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.16.0-26.35

---------------
linux (3.16.0-26.35) utopic; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1398118

  [ Upstream Kernel Changes ]

  * Revert "drm/nouveau: punt fbcon resume out to a workqueue"
  * Revert "drm/nouveau/kms: take more care when pulling down accelerated
    fbcon"

linux (3.16.0-26.34) utopic; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1395892

  [ Chris J Arges ]

  * [Config] CONFIG_SCOM_DEBUGFS=y for powerpc/powerpc64-smp ppc64el/generic
    - LP: #1395855

  [ Tim Gardner ]

  * [Config] CONFIG_GENWQE_PLATFORM_ERROR_RECOVERY=1 for powerpc/ppc64el
    - LP: #1392021

  [ Upstream Kernel Changes ]

  * Revert "usb: dwc3: dwc3-omap: Disable/Enable only wrapper interrupts in
    prepare/complete"
    - LP: #1393401
  * Revert "iwlwifi: mvm: treat EAPOLs like mgmt frames wrt rate"
    - LP: #1393401
  * Revert "block: all blk-mq requests are tagged"
    - LP: #1393401
  * ACPI / blacklist: add Win8 OSI quirks for some Dell laptop models
    - LP: #1339456
  * PCI: Remove "no hotplug settings from platform" warning
    - LP: #1390182
  * drm/nouveau/kms: take more care when pulling down accelerated fbcon
    - LP: #1386695
  * drm/nouveau: punt fbcon resume out to a workqueue
    - LP: #1386695
  * drm/tilcdc: Fix the error path in tilcdc_load()
    - LP: #1393401
  * builddeb: put the dbg files into the correct directory
    - LP: #1393401
  * switch iov_iter_get_pages() to passing maximal number of pages
    - LP: #1393401
  * fuse: honour max_read and max_write in direct_io mode
    - LP: #1393401
  * usb: phy: return -ENODEV on failure of try_module_get
    - LP: #1393401
  * PM / clk: Fix crash in clocks management code if !CONFIG_PM_RUNTIME
    - LP: #1393401
  * rt2x00: support Ralink 5362.
    - LP: #1393401
  * wireless: rt2x00: add new rt2800usb devices
    - LP: #1393401
  * NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes
    - LP: #1393401
  * nfs: fix duplicate proc entries
    - LP: #1393401
  * ext4: check EA value offset when loading
    - LP: #1393401
  * jbd2: free bh when descriptor block checksum fails
    - LP: #1393401
  * ext4: don't check quota format when there are no quota files
    - LP: #1393401
  * target: Fix queue full status NULL pointer for SCF_TRANSPORT_TASK_SENSE
    - LP: #1393401
  * vfs: fix data corruption when blocksize < pagesize for mmaped data
    - LP: #1393401
  * ext4: fix mmap data corruption when blocksize < pagesize
    - LP: #1393401
  * ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT
    - LP: #1393401
  * qla_target: don't delete changed nacls
    - LP: #1393401
  * target: Fix APTPL metadata handling for dynamic MappedLUNs
    - LP: #1393401
  * iser-target: Disable TX completion interrupt coalescing
    - LP: #1393401
  * ext4: don't orphan or truncate the boot loader inode
    - LP: #1393401
  * ext4: add ext4_iget_normal() which is to be used for dir tree lookups
    - LP: #1393401
  * ext4: fix reservation overflow in ext4_da_write_begin
    - LP: #1393401
  * ext4: Replace open coded mdata csum feature to helper function
    - LP: #1393401
  * ext4: move error ...
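The 3.16.0-26.34 entry also sets CONFIG_GENWQE_PLATFORM_ERROR_RECOVERY=1, the same define that was passed by hand in the out-of-tree build earlier in this bug. Ubuntu installs each kernel's configuration as /boot/config-<version>, so a hypothetical check can look for that exact line there.

```shell
#!/bin/sh
# Hypothetical check: succeed when a kernel config file enables GenWQE
# platform error recovery (the option the changelog above sets to 1).
genwqe_recovery_enabled() {
    cfg=${1:-/boot/config-$(uname -r)}
    # -x: the whole line must match; -q: report only via exit status.
    grep -qx 'CONFIG_GENWQE_PLATFORM_ERROR_RECOVERY=1' "$cfg"
}
```

Usage: `genwqe_recovery_enabled && echo "platform error recovery built in"` on the running kernel.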

Changed in linux (Ubuntu Utopic):
status: Fix Committed → Fix Released
Chris J Arges (arges) wrote :

Marking Vivid as Fix Released because it will be rebased on 3.18 soon and will have these commits anyway.

Changed in linux (Ubuntu Vivid):
status: Triaged → Fix Released
bugproxy (bugproxy)
tags: added: targetmilestone-inin1504
removed: targetmilestone-inin---