Stratton: ISST-LTE:UbuntuKVM: Failed to hotplug virtual devices to guest running Ubuntu 16.04.1 on UbuntuKVM16.04.1 #179

Bug #1625986 reported by bugproxy
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
Xenial           Fix Released   Undecided    Tim Gardner
Yakkety          Fix Released   Undecided    Unassigned

Bug Description

== Comment: #0 - Frank P. Novak <email address hidden> - 2016-08-15 11:36:32 ==
---Problem Description---

Labels: Briggs&Stratton GA1 mustfix, IBM-ISST, KVM, Linux OS, Ubuntu
Milestone: none

@haochanh
haochanh commented 21 days ago

On the HOST, I ran this command and observed these errors on the guest:
root@micro:~# uname -a
Linux micro 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:05:18 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
root@micro:~# ls -l /microg1-g2-xfs
total 788760392
-rw-r--r-- 1 libvirt-qemu kvm 21474836480 Jul 25 10:28 microg1.hotplug.img
-rw-r--r-- 1 libvirt-qemu kvm 644245094400 Jul 25 10:53 microg1_lv.raw.img
-rw-r--r-- 1 libvirt-qemu kvm 107374182400 Jul 25 10:53 microg1.raw.img
-rw-r--r-- 1 libvirt-qemu kvm 370680332288 Jul 25 10:53 microg2_lv.qcow2.img
-rw-r--r-- 1 libvirt-qemu kvm 84473282560 Jul 25 10:53 microg2.qcow2.img

root@micro:~# virsh attach-disk microg1 --source /microg1-g2-xfs/microg1.hotplug.img --target vdd

On the GUEST, I got this error and NO disk was added.
root@microg1:~# cat /var/log/kern.log |tail -30
Jul 25 10:31:18 microg1 kernel: [242207.245064] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.245166] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.245241] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.245326] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.245413] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.245488] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.245564] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.245654] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.245732] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.245811] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.245898] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.245972] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.246046] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.246124] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.246198] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.246292] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.246363] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.246434] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.246508] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.246582] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.246656] rpaphp: pci_hp_register failed with error -16
Jul 25 10:31:18 microg1 kernel: [242207.520069] pci 0000:00:04.0: [1af4:1001] type 00 class 0x010000
Jul 25 10:31:18 microg1 kernel: [242207.520283] pci 0000:00:04.0: reg 0x10: [io 0x10000-0x1003f]
Jul 25 10:31:18 microg1 kernel: [242207.520339] pci 0000:00:04.0: reg 0x14: [mem 0x00000000-0x00000fff]
Jul 25 10:31:18 microg1 kernel: [242207.521180] iommu: Adding device 0000:00:04.0 to group 0
Jul 25 10:31:18 microg1 kernel: [242207.521309] pci 0000:00:04.0: BAR 1: assigned [mem 0x100a0000000-0x100a0000fff]
Jul 25 10:31:18 microg1 kernel: [242207.521391] pci 0000:00:04.0: BAR 0: assigned [io 0x10040-0x1007f]
Jul 25 10:31:18 microg1 kernel: [242207.521527] virtio-pci 0000:00:04.0: enabling device (0000 -> 0003)
Jul 25 10:31:18 microg1 kernel: [242207.522264] virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
Jul 25 10:39:35 microg1 kernel: [242704.508536] XFS (loop0): Unmounting Filesystem
@haochanh haochanh added the IBM-HST-ISST label 21 days ago
@mzipse mzipse added the Linux OS / KVM label 21 days ago
@garychengg garychengg was assigned by mzipse 21 days ago
@jackt-smc jackt-smc was assigned by garychengg 21 days ago
@dougmill-ibm dougmill-ibm added Ubuntu KVM labels 18 days ago
@haochanh
haochanh commented 14 days ago

We tried to add a virtual net device and got the same error, "pci_hp_register failed with error -16"; however, the virtual NIC was added.
@nadiafry nadiafry added the Briggs&Stratton GA1 mustfix label 14 days ago
@itskin
itskin commented 13 days ago

Approve System Test mustfix classification. Reason=While hotplug of both disk and network generates the same error message, the NIC at least succeeds but the disk consistently fails.
@dougmill-ibm
dougmill-ibm commented 13 days ago

Note, error -16 is EBUSY. I have not yet found the circumstance(s) under which EBUSY is returned from pci_hp_register().
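
The errno mapping can be checked from user space. A minimal Python sketch, assuming it is run on Linux, where the values Python exposes match the kernel's numbering:

```python
import errno
import os

code = 16  # magnitude of the "-16" in the rpaphp message

# Symbolic name for the numeric value on this platform.
name = errno.errorcode[code]

# Human-readable description supplied by the C library.
description = os.strerror(code)

print(f"{code} -> {name}: {description}")
```

This only confirms the meaning of the number; it says nothing about why pci_hp_register() returns it.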
@rogerc-smc
rogerc-smc commented 11 days ago

Not sure what I am missing, but I haven't been able to replicate the issue exactly:

root@104-173:/mnt/a# ls -l
total 527863828
drwx------ 2 root root 16384 Aug 3 16:19 lost+found
-rw-r--r-- 1 libvirt-qemu kvm 536953094144 Aug 3 18:28 data_disk.qcow2
-rw-r--r-- 1 libvirt-qemu kvm 1791688704 Aug 4 10:31 os_disk.qcow2

root@104-173:/mnt/a# virsh attach-disk Guest1_Xenial --source /mnt/a/data_disk.qcow2 --target vdd
Disk attached successfully

On the Guest OS, I am only getting this error:
Aug 4 10:39:46 105-214 kernel: [ 253.633490] RTAS: event: 3, Type: Unknown, Severity: 1
I also don't see any disk being added in until after I reboot the Guest.

Also, if I switch up the attach-disk command a little bit:
root@104-173:/mnt/a# virsh attach-disk Guest1_Xenial --source /mnt/a/data_disk.qcow2 --target sdc
Disk attached successfully
I no longer see an error in the Guest OS, although I still won't see any disk being added until I reboot the Guest.
@haochanh
haochanh commented 11 days ago

The purpose of hotplug is to let us use the disk/NIC live, without rebooting the guest.
Not sure why you do not see the "pci_hp_register failed with error -16" message.
@nadiafry
nadiafry commented 10 days ago

So, in a way, SuperMicro has recreated this bug as the disk wasn't showing up until after a guest reboot.
@drbrent
drbrent commented 6 days ago

ISST updated to newer kernel, -34. Problem appears to be gone. IBM to test again. Close if not recreated.
@haochanh
haochanh commented 3 days ago

I have verified that the disk is added without rebooting the guest on the -34 kernel.
However, I still see the error "rpaphp: pci_hp_register failed with error -16", but only on the first attempt. It is gone on subsequent attempts until the guest is rebooted; after a reboot it shows up again, only on the first attempt.

I ran "tail -f /var/log/kern.log" on the guest, made three add/remove attempts from the host, and collected the log below...
microg1-hotplug.txt
@haochanh
haochanh commented 3 days ago

The hotplug function is working. The error messages may be harmless but are annoying, and they still appear on the guest on the first hotplug attempt. Please advise whether we should close this or not. Thanks.

== Comment: #2 - TYREL N. DATWYLER <email address hidden> - 2016-08-16 11:29:20 ==
In the past we have built the rpaphp code as a module and not auto-loaded it to avoid these messages in qemu guests. There is now an upstream patch that fixes this altogether.

commit e2413a7dae52fab290b7a8d11ec8579657bab95b
Author: Tyrel Datwyler <email address hidden>
Date: Mon Jul 11 17:16:27 2016 -0500

    PCI: rpaphp: Fix slot registration for multiple slots under a PHB

    The underlying slot hotplug registration code assumed multiple slots, but
    the actual implementation is broken for multiple slots.

    This went unnoticed for years due to the fact that PowerVM seems to only
    ever provide a single hotplug slot per PHB.

    Under qemu/kvm the hotplug slot model aligns more with x86 where
    multiple slots are presented under a single PHB. As seen in the
    following each additional slot after the first fails to register due to
    each slot always being compared against the first child node of the PHB
    in the device tree.

      rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
      rpaphp: Slot [Slot 0] registered
      rpaphp: pci_hp_register failed with error -16
      rpaphp: pci_hp_register failed with error -16
      rpaphp: pci_hp_register failed with error -16
      rpaphp: pci_hp_register failed with error -16

    The registration logic is fixed so that each slot is compared
    against the existing child devices of the PHB in the device tree to
    determine present slots vs empty slots.

      rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
      rpaphp: Slot [C0] registered
      rpaphp: Slot [C1] registered
      rpaphp: Slot [C2] registered
      rpaphp: Slot [C3] registered
      rpaphp: Slot [C4] registered

    Signed-off-by: Tyrel Datwyler <email address hidden>
    Reviewed-by: Nathan Fontenot <email address hidden>
    [mpe: Massage changelog]
    Signed-off-by: Michael Ellerman <email address hidden>
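
The behaviour described in the commit message can be illustrated with a toy model. This is a sketch only, not the kernel code: the function name, the dictionary fields `index` and `number`, and the idea that a duplicate slot number is what produces EBUSY are all assumptions made for illustration.

```python
import errno


def register_all(slots, children, match_own_child):
    """Toy model: return one result code per slot (0 or -EBUSY)."""
    taken = set()  # slot numbers already claimed (assumed duplicate check)
    results = []
    for slot in slots:
        if match_own_child:
            # Fixed behaviour: look for the child that belongs to this slot.
            number = next(
                (c["number"] for c in children if c["index"] == slot["index"]),
                None,  # no matching child: the slot is empty
            )
        else:
            # Broken behaviour: every slot is resolved via the first child.
            number = children[0]["number"] if children else None

        if number is not None and number in taken:
            results.append(-errno.EBUSY)
            continue
        if number is not None:
            taken.add(number)
        results.append(0)
    return results


# Five slots under one PHB, only the first one occupied (hypothetical data).
slots = [{"index": i} for i in range(5)]
children = [{"index": 0, "number": 3}]

buggy = register_all(slots, children, match_own_child=False)
fixed = register_all(slots, children, match_own_child=True)
print(buggy)
print(fixed)
```

In this model the first slot registers and the remaining four fail when every slot is resolved through the first child, which mirrors the pattern in the log quoted in the commit message; matching each slot to its own child lets all five register.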

== Comment: #5 - Scott E. Garfinkle <email address hidden> - 2016-08-23 19:20:14 ==
Well, you can also see the patch at https://patchwork.kernel.org/patch/9224345/

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-145071 severity-medium targetmilestone-inin1604
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Yakkety):
assignee: Taco Screen team (taco-screen-team) → nobody
status: New → Fix Released
Changed in linux (Ubuntu Xenial):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Breno Leitão (breno-leitao) wrote :

Tim,

Per your previous comment, it seems that this bug is already in process of SRU for 4.4 (16.04) kernel, correct?

Tim Gardner (timg-tpi) wrote :

Breno - yes, the patch mentioned in the bug description has been submitted for review.

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Seth Forshee (sforshee) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!
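
For the verification step, the guest's kernel log can be summarised mechanically. The following is a minimal sketch, assuming the message formats match the rpaphp lines quoted in this report; `summarize_rpaphp` is a hypothetical helper name, and the sample text is copied from the logs posted here.

```python
import re

# Message formats are taken from the rpaphp lines quoted in this report.
FAILED = re.compile(r"rpaphp: pci_hp_register failed with error (-?\d+)")
REGISTERED = re.compile(r"rpaphp: Slot \[([^\]]+)\] registered")


def summarize_rpaphp(log_text):
    """Count registration failures and list the slots that registered."""
    codes = [int(m.group(1)) for m in FAILED.finditer(log_text)]
    return {
        "failures": len(codes),
        "error_codes": sorted(set(codes)),
        "registered_slots": REGISTERED.findall(log_text),
    }


affected = """\
[Fri Oct 21 10:55:45 2016] rpaphp: pci_hp_register failed with error -16
[Fri Oct 21 10:55:45 2016] rpaphp: pci_hp_register failed with error -16
"""

patched = """\
[ 86.273739] rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
[ 86.273981] rpaphp: Slot [C16] registered
[ 86.274076] rpaphp: Slot [C17] registered
"""

affected_summary = summarize_rpaphp(affected)
patched_summary = summarize_rpaphp(patched)
print(affected_summary)
print(patched_summary)
```

In practice the input text would be the guest's `dmesg` output captured before and after a hotplug attempt.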

tags: added: verification-needed-xenial
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-10-21 12:09 EDT-------
Making Chanh's comment external:

(In reply to comment #18)
> This bug is awaiting verification that the kernel in -proposed solves the
> problem. Please test the kernel and update this bug with the results. If the
> problem is solved, change the tag 'verification-needed-xenial' to
> 'verification-done-xenial'.
>
> If verification is not done by 5 working days from today, this fix will be
> dropped from the source code, and this bug will be closed.
>
> See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to
> enable and use -proposed. Thank you!

I tried this: the disk can be added and removed, but I still hit the error on kernel -45.
root@microg1:~# uname -r
4.4.0-45-generic
root@microg1:~# dmesg -T |grep "rpaphp" |tail
[Fri Oct 21 10:55:45 2016] rpaphp: pci_hp_register failed with error -16
[Fri Oct 21 10:55:45 2016] rpaphp: pci_hp_register failed with error -16
[Fri Oct 21 10:55:45 2016] rpaphp: pci_hp_register failed with error -16
[Fri Oct 21 10:55:45 2016] rpaphp: pci_hp_register failed with error -16
[Fri Oct 21 10:55:45 2016] rpaphp: pci_hp_register failed with error -16
[Fri Oct 21 10:55:45 2016] rpaphp: pci_hp_register failed with error -16
[Fri Oct 21 10:55:45 2016] rpaphp: pci_hp_register failed with error -16
[Fri Oct 21 10:55:45 2016] rpaphp: pci_hp_register failed with error -16
[Fri Oct 21 10:55:45 2016] rpaphp: pci_hp_register failed with error -16
[Fri Oct 21 10:55:45 2016] rpaphp: pci_hp_register failed with error -16
root@microg1:~# lsblk |grep "disk"
sda 8:0 0 40G 0 disk
sdb 8:16 0 80G 0 disk
root@microg1:~# lsblk |grep "disk"
sda 8:0 0 40G 0 disk
sdb 8:16 0 80G 0 disk
vda 253:0 0 10G 0 disk
root@microg1:~# lsblk |grep "disk"
sda 8:0 0 40G 0 disk
sdb 8:16 0 80G 0 disk

Which exact packages do I need to install from -proposed? Thanks

Here are the commands I ran on the host to hotplug the disk.
root@micro:~# virsh attach-disk microg1 /var/lib/libvirt/images/hotplug-disk.raw.img --target vdw
Disk attached successfully

root@micro:~# virsh detach-disk microg1 --target vdw
Disk detached successfully

------- Comment From <email address hidden> 2016-10-21 12:30 EDT-------
I used -proposed to upgrade the kernel to -46 and no longer see this issue. Thanks
root@microg1:~# uname -a
Linux microg1 4.4.0-46-generic #67-Ubuntu SMP Thu Oct 20 15:02:14 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
root@microg1:~# lsblk |grep "disk"
sda 8:0 0 40G 0 disk
sdb 8:16 0 80G 0 disk
root@microg1:~# lsblk |grep "disk"
sda 8:0 0 40G 0 disk
sdb 8:16 0 80G 0 disk
vda 253:0 0 10G 0 disk
root@microg1:~# dmesg |grep "rpaphp"
[ 86.273739] rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
[ 86.273981] rpaphp: Slot [C16] registered
[ 86.274076] rpaphp: Slot [C17] registered
[ 86.274174] rpaphp: Slot [C18] registered
[ 86.274290] rpaphp: Slot [C19] registered
*********
root@microg1:~# lsblk |grep "disk"
sda 8:0 0 40G 0 disk
sdb 8:16 0 80G 0 disk
vda 253:0 0 10G 0 disk
root@microg1:~# lsblk |grep...


Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-47.68

---------------
linux (4.4.0-47.68) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1636941

  * Add a driver for Amazon Elastic Network Adapters (ENA) (LP: #1635721)
    - lib/bitmap.c: conversion routines to/from u32 array
    - net: ethtool: add new ETHTOOL_xLINKSETTINGS API
    - net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
    - [config] enable CONFIG_ENA_ETHERNET=m (Amazon ENA driver)

  * unexpectedly large memory usage of mounted snaps (LP: #1636847)
    - [Config] switch squashfs to single threaded decode

 -- Kamal Mostafa <email address hidden> Wed, 26 Oct 2016 10:47:55 -0700

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released