CPU hot unplug fails after migrating a CPU hotplugged guest from source

Bug #1677552 reported by bugproxy
This bug affects 1 person
Affects: qemu (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Christian Ehrhardt
Milestone: (none)

Bug Description

== Comment: #0 - Balamuruhan S <email address hidden> - 2017-02-28 03:21:57 ==
---Problem Description---
CPU hot unplug fails after migrating a CPU hotplugged guest from source

Perform CPU hotplug before migration,

# virsh setvcpus avocado-vt-vm1-migration 64 --live

Hotplugged CPUs in source are available from guest XML and reflected from inside guest.

# virsh -c 'qemu:///system' migrate --live --domain avocado-vt-vm1-migration --desturi qemu+ssh://9.40.192.188/system --timeout 60

Migration succeeds without any issue.

# virsh -c 'qemu+ssh://9.40.192.188/system' setvcpus avocado-vt-vm1-migration 8 --live
error: operation failed: vcpu unplug request timed out

---uname output---
# uname -a
Linux c158f2u09os 4.10.0-9-generic #11-Ubuntu SMP Mon Feb 20 13:45:11 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

# qemu-img --version
qemu-img version 2.8.0(Debian 1:2.8+dfsg-2ubuntu1)
Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers

# dpkg -l | grep libvirt
ii libvirt-bin 2.5.0-3ubuntu2 ppc64el programs for the libvirt library
ii libvirt-clients 2.5.0-3ubuntu2 ppc64el Programs for the libvirt library
ii libvirt-daemon 2.5.0-3ubuntu2 ppc64el Virtualization daemon
ii libvirt-daemon-system 2.5.0-3ubuntu2 ppc64el Libvirt daemon configuration files
ii libvirt-dev:ppc64el 2.5.0-3ubuntu2 ppc64el development files for the libvirt library
ii libvirt-glib-1.0-0:ppc64el 1.0.0-1 ppc64el libvirt GLib and GObject mapping library
ii libvirt0:ppc64el 2.5.0-3ubuntu2 ppc64el library for interfacing with different virtualization systems
ii python-libvirt 3.0.0-2 ppc64el libvirt Python bindings

Machine Type = Tuleta

---Steps to Reproduce---
1. Created guest with shared storage in NFS
2. Enabled ports 49152:49216 in iptables, virt_use_nfs -> on
3. Mounted the image location in destination and started migration.
4. Perform CPU hotplug to guest in source before migration
5. Perform live migration to other host.
6. CPU Hot unplug fails with "error: operation failed: vcpu unplug request timed out"
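
The virsh sequence behind these steps can be sketched as a small helper that only builds the command lines; it does not run them. The function name `repro_commands` and its parameters are hypothetical. The virsh subcommands and flags are the ones quoted earlier in this report.

```python
import shlex
from typing import List


def repro_commands(domain: str, dest_host: str,
                   plugged: int, unplugged: int) -> List[List[str]]:
    """Return the three virsh invocations from the report, in order."""
    dest_uri = f"qemu+ssh://{dest_host}/system"
    return [
        # 1. Hotplug vCPUs on the source host.
        ["virsh", "setvcpus", domain, str(plugged), "--live"],
        # 2. Live-migrate the guest to the destination host.
        ["virsh", "-c", "qemu:///system", "migrate", "--live",
         "--domain", domain, "--desturi", dest_uri, "--timeout", "60"],
        # 3. Try to hot-unplug on the destination (the step that times out).
        ["virsh", "-c", dest_uri, "setvcpus", domain, str(unplugged), "--live"],
    ]


if __name__ == "__main__":
    for argv in repro_commands("avocado-vt-vm1-migration", "9.40.192.188", 64, 8):
        print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without libvirt.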

Contact Information = Balamuruhan S / <email address hidden>

Userspace tool common name: virsh (libvirt)

The userspace tool has the following bit modes: ppc64le

Userspace rpm: libvirt-bin, libvirt-daemon

Userspace tool obtained from project website: na

*Additional Instructions for Balamuruhan S / <email address hidden>:
-Post a private note with access information to the machine that the bug is occurring on.
-Attach ltrace and strace of userspace application.

== Comment: #4 - Balamuruhan S <email address hidden> - 2017-03-01 03:17:07 ==

== Comment: #5 - Balamuruhan S <email address hidden> - 2017-03-01 03:17:38 ==

== Comment: #9 - Shivaprasad G. Bhat <email address hidden> - 2017-03-24 05:26:57 ==
On new ubuntu kernel 4.10.0.13, with in-kernel hotplug/unplug code, the newly hotplugged core post migration can be unplugged. The cores hotplugged before the migration cannot be unplugged post migration.

Discussed with Bharata and he believes the issue belongs to qemu.

== Comment: #11 - BHARATA BHASKER RAO <email address hidden> - 2017-03-27 01:49:59 ==
Usually when a device hotplug is done at the source and the guest is migrated, the QEMU cmdline at the target is appended with the hot added device at the source. For example,

At the source:
qemu ... -smp 4,maxcpus=8
(qemu) device_add host-spapr-cpu-core,core-id=4,id=core4

At the target, QEMU is started like this before starting the migration:
qemu ... -smp 4,maxcpus=8 -device host-spapr-cpu-core,core-id=4,id=core4

Thus the hot added CPU at the source became a cold-plugged CPU at the target. This works.
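
The cold-plug convention described above can be sketched as follows. This is an illustration only: `target_cmdline` and the dictionary layout for a hot-added device are hypothetical, not libvirt's or QEMU's actual code.

```python
from typing import Dict, List


def target_cmdline(source_args: List[str], hot_added: List[Dict]) -> List[str]:
    """Append each device hot-added at the source as a -device argument.

    On the target the device is then cold-plugged at startup instead of
    being hotplugged through the monitor.
    """
    args = list(source_args)
    for dev in hot_added:
        props = ",".join(f"{key}={value}" for key, value in dev["props"].items())
        args += ["-device", f"{dev['driver']},{props}"]
    return args


if __name__ == "__main__":
    source = ["qemu", "-smp", "4,maxcpus=8"]
    added = [{"driver": "host-spapr-cpu-core",
              "props": {"core-id": 4, "id": "core4"}}]
    print(" ".join(target_cmdline(source, added)))
```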

What is done differently here is that libvirt is in fact doing a hotplug at the target via QEMU monitor before the migration. In this situation when the guest is migrated from source to target, the DRC state information for the added CPU at the target will be wrong. The hot added CPU at the target will never undergo the DRC state transitions via RTAS set allocation calls. Hence subsequent hot unplug fails at the target.

This situation can be fixed by migrating the DRC state information and updating the same for the hot added CPU at the target. This is what Jianjun Duan's DRC state migration patchset achieves. I have verified (using his old patchset, v5) that this problem disappears when DRC state migration is done.

So the fix for this is to get Jianjun's DRC migration patchset upstream and then to 1704.
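
The failure mode and the proposed fix can be illustrated with a toy model. Everything here is an assumption made for illustration: `ToyDrc`, `guest_configured` and `migrate` are hypothetical names, and the real DRC state machine in QEMU has several more states than this single flag.

```python
from dataclasses import dataclass


@dataclass
class ToyDrc:
    """Hypothetical, heavily simplified stand-in for a per-device DRC."""
    guest_configured: bool = False  # set once the guest has made its RTAS calls

    def guest_configure(self) -> None:
        self.guest_configured = True

    def unplug(self) -> bool:
        # In this toy, an unplug only completes if the hypervisor-side state
        # agrees that the guest had taken ownership of the device.
        if not self.guest_configured:
            return False
        self.guest_configured = False
        return True


def migrate(source: ToyDrc, carry_drc_state: bool) -> ToyDrc:
    """Model a target whose device was hotplugged via the monitor."""
    target = ToyDrc()  # fresh state: the guest never repeats its RTAS calls
    if carry_drc_state:
        target.guest_configured = source.guest_configured
    return target


if __name__ == "__main__":
    src = ToyDrc()
    src.guest_configure()
    print("without DRC state migration:", migrate(src, False).unplug())
    print("with DRC state migration:   ", migrate(src, True).unplug())
```

The point of the sketch is only that the target's view of the device has to be brought in line with what the guest already did at the source.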

Revision history for this message
bugproxy (bugproxy) wrote : guest_Sosreport

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-152065 severity-critical targetmilestone-inin1704
bugproxy (bugproxy) wrote : Host_Sosreport

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → qemu (Ubuntu)
Christian Ehrhardt  (paelzer) wrote :

I agree to your statement: "So the fix for this is to get Jianjun's DRC migration patchset upstream and then to 1704".

I checked and found that this has some pretty long history [1].
I looked at the context but you are so much more aware of the details that I have some questions for you below.

You wrote that you tested v5, which I found is the last one before v6, which stated: "Split from Power specific patches". I guess you mean you tried [2] then, right?

I couldn't find what happened to the ppc part after v6. The diffstat changed tremendously from v5 to v6. Is the ppc specific code no longer needed in the newer revisions, or was it just dropped (or were the ppc parts even accepted already) to make better progress on the generic content first?

The cover letter text always states about "spapr DRC state" (also RTAS) which sounds very PPC specific, yet the last revisions seemed to be more generic - but maybe only to implement the tailq bits in general. Do you happen to know if the negative effect is
a) only affecting ppc (I know x86 has no perfect hotplug history anyway)?
b) a regression that worked with old versions (maybe because libvirt did start qemu differently as you outlined)?

I found that the later v17 was queued and pulled in [3].
Changes are in v2.9.0-rc0 and later, does that now mean all you need is in 2.9?
Or are the ppc changes that got split v5->v6 still missing and need to be rebased on top on what went upstream with v17 just recently?

[1]: http://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg04084.html
[2]: http://lists.nongnu.org/archive/html/qemu-devel/2016-10/msg00270.html
[3]: https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg05234.html

Changed in qemu (Ubuntu):
status: New → Incomplete
importance: Undecided → Medium
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-03-30 09:24 EDT-------
(In reply to comment #17)
> I agree to your statement: "So the fix for this is to get Jianjun's DRC
> migration patchset upstream and then to 1704".
>
> I checked and found that this has some pretty long history [1].
> I looked at the context but you are so much more aware of the details that I
> have some questions for you below.
>
> You wrote you tested v5 which I found is the last one before v6 stated to:
> "Split from Power specific patches". I guess you mean you tried [2] then
> right?

Right, I tried with ppc specific patches from [2] on mainline QEMU which already has QTAILQ bits in it.

>
> I didn't find the future of the ppc part after v6. The diffstat changed
> tremendously v5 -> v6.. Is the ppc specific code no more needed in the newer
> revisions, or was it just dropped (or ppc parts even accepted already) to
> get some better progress on the generic content first?

Yes, I think so.

>
> The cover letter text always states about "spapr DRC state" (also RTAS)
> which sounds very PPC specific, yet the last revisions seemed to be more
> generic - but maybe only to implement the tailq bits in general. Do you
> happen to know if the negative effect is
> a) only affecting ppc (I know x86 has no perfect hotplug history anyway)?

Yes, this affects only hotplug on PPC.

> b) a regression that worked with old versions (maybe because libvirt did
> start qemu differently as you outlined)?

This is not a regression for CPU hotplug because this situation never worked right from the beginning.

>
> I found that the latter v17 was queued and pulled in [3].
> Changes are in v2.9.0-rc0 and later, does that now mean all you need is in
> 2.9?
> Or are the ppc changes that got split v5->v6 still missing and need to be
> rebased on top on what went upstream with v17 just recently?

2.9 has the generic changes (QTAILQ migration) in, but the PPC bits (DRC state migration) still need to be upstreamed.

Jianjun is working on this and he will be upstreaming the PPC bits soon. However I don't think we have enough time to get his changes into 2.9 now.

Christian Ehrhardt  (paelzer) wrote :

Thank you a lot for clarifying my questions!

Ok, for now I'll keep an eye open and you ping the bug once there is an update on the upstreaming of this.

Changed in qemu (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → ChristianEhrhardt (paelzer)
bugproxy (bugproxy)
tags: added: severity-high
removed: severity-critical
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-04-27 16:58 EDT-------
I've finally had the opportunity to test, using libvirt, the patch set I sent to the qemu mailing list ("[PATCH 0/4 v7] migration/ppc: migrating DRC, ccs_list and pending_events?") that fixes this issue. Until now I had tested using QEMU alone. virsh still reports the same error, but the hot unplug is successful in the VM after the migration. Apparently my QEMU patch set alone is not enough to fix this virsh behavior.

In my test I've used 2 Ubuntu 17.04 P8 hosts. I had to compile libvirt from scratch to make it work with the compiled upstream QEMU + my patch set:

- source host:

root@source:/home/danielhb/usr/bin#
root@source:/home/danielhb/usr/bin# ./virsh start dhb_ub1704_nfs
Domain dhb_ub1704_nfs started

root@source:/home/danielhb/usr/bin# ./virsh console dhb_ub1704_nfs
Connected to domain dhb_ub1704_nfs
Escape character is ^]
Password:
Last login: Thu Apr 27 13:48:41 CDT 2017 on hvc0
danielhb@Ub1704NFS:~$ lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Model: 2.0 (pvr 004d 0200)
Model name: POWER8 (architected), altivec supported
Hypervisor vendor: KVM
Virtualization type: para
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0
danielhb@Ub1704NFS:~$
root@source:/home/danielhb/usr/bin# ./virsh setvcpus dhb_ub1704_nfs 2 --live
root@source:/home/danielhb/usr/bin# ./virsh -c 'qemu:///system' migrate --live --domain dhb_ub1704_nfs --desturi qemu+ssh://<target_ip>/system --timeout 60 --verbose
Migration: [100 %]
root@source:/home/danielhb/usr/bin#

- In the destination host:

root@target:/home/danielhb/usr/bin# ./virsh console dhb_ub1704_nfs
Connected to domain dhb_ub1704_nfs
Escape character is ^]

Ub1704NFS login: danielhb
Password:
danielhb@Ub1704NFS:~$
danielhb@Ub1704NFS:~$ lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 1
Model: 2.0 (pvr 004d 0200)
Model name: POWER8 (architected), altivec supported
Hypervisor vendor: KVM
Virtualization type: para
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0,1

Migration was successful and the VM is reporting 2 CPUs, one of them was hotplugged before the migration.

- Hot unplugged one CPU using the source host:

root@source:/home/danielhb/usr/bin# ./virsh -c 'qemu+ssh://<target_ip>/system' setvcpus dhb_ub1704_nfs 1 --live
error: operation failed: vcpu unplug request timed out

Same error as reported in the bug.

- Back on the target host here is the message that appears on libvirtd log:

/home/danielhb/usr/sbin# 2017-04-27 19:58:31.779+0000: 52854: error : qemuDomainHotplugDelVcpu:5403 : operation failed: vcpu unplug request timed out

- However, the VM reports that the hot unplug was successful:

root@target:/home/danielhb/usr/bin# ./virsh console dhb_ub1704_nfs
Con...


Christian Ehrhardt  (paelzer) wrote :

Thanks for the update, Daniel. Are you now looking into libvirt as well, or what are the next steps?
I don't want to lose sync with everybody expecting the ball to be in the other's court, so I thought I'd better ask.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-04-28 10:27 EDT-------
> Thanks for the update, Daniel. Are you now looking into libvirt as well, or
> what are the next steps?
> I don't want to lose sync with everybody expecting the ball to be in the
> other's court, so I thought I'd better ask.

At the moment I am working on the reviews of the QEMU patch set. I can look further into the libvirt issue after the review is taken care of.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-05-03 09:11 EDT-------
Update: it turns out that Libvirt isn't at fault here. I followed up on Mike's suspicion about the situation:

----
MICHAEL D. ROTH 2017-05-02
One thing to verify would be whether or not unplug is completely done on the target. Even though the guest might show unplug successful, depending on how it sets the DRC states for the device QEMU may or may not have finalized the object, which would result in QEMU never emitting the device-deleted QMP event.
----

And he was right. The patch set wasn't allowing the hot unplug to complete as expected on the target system after the migration. The QMP event was never fired, so Libvirt would simply wait for a response until the timeout.
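
The behaviour described here, a management layer that reports a timeout even though the guest has already let go of the CPU, can be sketched with a plain event queue. This is not libvirt's implementation; `wait_for_event` is a hypothetical helper and the in-memory queue stands in for the QMP event stream. `DEVICE_DELETED` is the name QMP uses for the device-deleted event.

```python
import queue
import time


def wait_for_event(events: "queue.Queue[str]", wanted: str, timeout_s: float) -> bool:
    """Return True if `wanted` arrives on `events` before the deadline."""
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # caller would report "unplug request timed out"
        try:
            if events.get(timeout=remaining) == wanted:
                return True
        except queue.Empty:
            return False


if __name__ == "__main__":
    stream: "queue.Queue[str]" = queue.Queue()
    # The object is never finalized, so no event is queued: the wait times out.
    print(wait_for_event(stream, "DEVICE_DELETED", 0.1))
    # Once the event is emitted, the same wait succeeds.
    stream.put("DEVICE_DELETED")
    print(wait_for_event(stream, "DEVICE_DELETED", 0.1))
```

In this picture the guest-visible result and the management-visible result can disagree, which matches the symptom seen with the earlier patch set.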

I've fixed the issue on the QEMU side. Here is my latest test with my new patch set and upstream Libvirt:

-- source host:

# ./virsh start dhb_ub1704_nfs
Domain dhb_ub1704_nfs started
# ./virsh setvcpus dhb_ub1704_nfs 2 --live
#
# ./virsh -c 'qemu:///system' migrate --live --domain dhb_ub1704_nfs --desturi qemu+ssh://9.40.193.37/system --timeout 60 --verbose
Migration: [100 %]
# ./virsh -c 'qemu+ssh://9.40.193.37/system' setvcpus dhb_ub1704_nfs 1 --live

#

--- destination host after migration and remote hot unplug:

# ./virsh console dhb_ub1704_nfs
Connected to domain dhb_ub1704_nfs
Escape character is ^]

danielhb@Ub1704NFS:~$ lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Model: 2.0 (pvr 004d 0200)
Model name: POWER8 (architected), altivec supported
Hypervisor vendor: KVM
Virtualization type: para
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0
danielhb@Ub1704NFS:~$ dmesg | tail -n 5
[ 5.361640] audit: type=1400 audit(1493815576.200:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=651 comm="apparmor_parser"
[ 5.363306] audit: type=1400 audit(1493815576.200:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/lxc-start" pid=691 comm="apparmor_parser"
[ 5.880997] cgroup: new mount options do not match the existing superblock, will be ignored
[ 115.040357] pseries-hotplug-cpu: CPU with drc index 10000008 already exists
[ 115.058023] cpu 1 (hwid 8) Ready to die...
danielhb@Ub1704NFS:~$
#
# ./virsh qemu-monitor-command dhb_ub1704_nfs --hmp info cpus ; ./virsh qemu-monitor-command dhb_ub1704_nfs --hmp info hotpluggable-cpus
* CPU #0: nip=0xc00000000009f22c thread_id=183233

Hotpluggable CPUs:
type: "host-spapr-cpu-core"
vcpus_count: "1"
CPUInstance Properties:
core-id: "3"
type: "host-spapr-cpu-core"
vcpus_count: "1"
CPUInstance Properties:
core-id: "2"
type: "host-spapr-cpu-core"
vcpus_count: "1"
CPUInstance Properties:
core-id: "1"
type: "host-spapr-cpu-core"
vcpus_count: "1"
qom_path: "/machine/unattached/device[0]"
CPUInstance Properties:
core-id: "0"
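
To read this output mechanically, a small parser can list which cores are currently plugged. It is a sketch that assumes the layout shown above, where each entry starts with a `type:` line and only plugged cores carry a `qom_path:` line; `parse_hotpluggable_cpus` is a hypothetical helper, not part of virsh or QEMU.

```python
from typing import Dict, List


def parse_hotpluggable_cpus(text: str) -> List[Dict]:
    """Parse HMP 'info hotpluggable-cpus' text into one dict per core."""
    entries: List[Dict] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("type:"):
            # A 'type:' line opens a new entry.
            current = {"plugged": False}
            entries.append(current)
        elif current is None:
            continue  # header lines before the first entry
        elif line.startswith("qom_path:"):
            # Only cores that exist as objects report a QOM path.
            current["plugged"] = True
        elif line.startswith("core-id:"):
            current["core_id"] = int(line.split(":", 1)[1].strip().strip('"'))
    return entries


if __name__ == "__main__":
    sample = '''Hotpluggable CPUs:
type: "host-spapr-cpu-core"
vcpus_count: "1"
CPUInstance Properties:
core-id: "1"
type: "host-spapr-cpu-core"
vcpus_count: "1"
qom_path: "/machine/unattached/device[0]"
CPUInstance Properties:
core-id: "0"
'''
    print([e["core_id"] for e in parse_hotpluggable_cpus(sample) if e["plugged"]])
```

Applied to the output above, only core-id 0 has a `qom_path`, which is consistent with the guest reporting a single CPU after the unplug.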

I'll clean the code up and resend the patch set to the mailing list for approval. This alone will ...


bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-05-22 16:42 EDT-------
Update: the review process of the patch set is still ongoing in the QEMU mailing list. The reason is that we're taking the opportunity to do required clean-ups in the existing code, such as removing the code that implemented the event_scan interface RTAS stub, consolidating functions and so on. The patch set was also split in two.

The design choice that took considerable time was how to eliminate the opaque elements that couldn't be migrated. CPU unplug and PCI unplug are unaffected by it, but it directly impacts the memory (LMB) unplug code. LMB unplug requires the release of several DRCs for each DIMM, and the state of this removal must somehow be preserved across callback calls; today this is done by the opaque argument. The design went from doing a full LMB scan inside the callback every time, to migrating the information, and finally to the design we have today: recovering the state of the released DIMMs only if it got erased in the migration process.
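
The "recover only if erased" idea can be sketched as below. This is a simplified illustration, not the code in the patch set: `release_lmb`, the `pending` counter map and the `attached` flags are all hypothetical names.

```python
from typing import Dict, List


def release_lmb(pending: Dict[str, int], attached: Dict[str, List[bool]],
                dimm: str, index: int) -> bool:
    """Callback for one released LMB; True once the whole DIMM is released.

    `pending` holds the number of LMBs still to be released per DIMM and may
    have been lost in migration. `attached` records which LMBs of each DIMM
    are still plugged.
    """
    if dimm not in pending:
        # State was erased: rebuild it by counting the LMBs that are still
        # attached, including the one being released right now.
        pending[dimm] = sum(attached[dimm])
    attached[dimm][index] = False
    pending[dimm] -= 1
    if pending[dimm] == 0:
        del pending[dimm]
        return True
    return False


if __name__ == "__main__":
    attached = {"dimm0": [True, True, True]}
    pending = {"dimm0": 3}
    print(release_lmb(pending, attached, "dimm0", 0))  # one of three released
    pending.clear()                                    # simulate migration
    print(release_lmb(pending, attached, "dimm0", 1))  # state recovered first
    print(release_lmb(pending, attached, "dimm0", 2))  # DIMM fully released
```

The scan happens only on the first callback after the counter is lost, rather than on every callback.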

One of the new clean up patches (removing the RTAS event_scan stub) was already accepted in ppc-for-2.10 branch. I expect that this week will be the final adjustments for both patch sets.

Daniel

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-05-31 12:56 EDT-------
The patches were accepted upstream. Not all patches from the series were applied (the pending_events and ccs_list migration patches are still pending), but with the patches applied so far the bug is already solved. I'll keep working on these 2 patches nevertheless.

You can find them at:

https://github.com/qemu/qemu/commit/bff3063837a76b37a4bbbfe614324ca38e859f2b
https://github.com/qemu/qemu/commit/0cffce56ae3501c5783d779f97993ce478acf856
https://github.com/qemu/qemu/commit/318347234d7069b62d38391dd27e269a3107d668
https://github.com/qemu/qemu/commit/a50919dddf148b0a2008db4a0593dbe69e1059c0
https://github.com/qemu/qemu/commit/16ee99805e069601ba3ce9da524bab377ab03866

Let me know if you're still experiencing the problem after applying these 5 patches.

Daniel

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-07-06 15:27 EDT-------
Any news, Christian? Have you got any problems with the backport of the upstreamed patches I mentioned in my last message?

Christian Ehrhardt  (paelzer) wrote :

Daniel, thanks for bringing those upstream.
I haven't had any problems, since I plan to tackle them along with the qemu merge for Artful.
Aligning those with other tasks and PTO means it will take a bit longer to get to them.

OTOH adding the patches to current qemu just to throw them away a few weeks later when merging the newer version would be kind of pointless.

bugproxy (bugproxy)
tags: added: targetmilestone-inin1710
removed: targetmilestone-inin1704
Christian Ehrhardt  (paelzer) wrote :

I see the changes are part of the qemu 2.10 release candidates and hope to merge that if no big blocker shows up.

tags: added: p9-virt-stack
tags: added: virt-fixed-by-2.10
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:2.10~rc3+dfsg-0ubuntu1

---------------
qemu (1:2.10~rc3+dfsg-0ubuntu1) artful; urgency=medium

  * Merge with Debian unstable (2.8) and Upstream 2.10-rci3; This fixes
    a set of bugs
    - [FFE] Qemu 2.10 in Artful (LP: #1699968)
    - CPU hot unplug fails after migrating a CPU hotplugged guest
      from source (LP: #1677552)
    - [Feature] KNL/KNM: Numa Distance on KVM(LP: #1647902)
    - New KVM 288 Pass Through (LP: #1672447)
    - aarch64: MSI is not supported by interrupt controller (LP: #1706630)
  * Remaining changes:
    - qemu-kvm to systemd unit
      - d/qemu-kvm-init: script for QEMU KVM preparation modules, ksm,
        hugepages and architecture specifics
      - d/qemu-kvm.service: systemd unit to call qemu-kvm-init
      - d/qemu-system-common.install: install systemd unit and helper script
      - d/qemu-system-common.maintscript: clean old sysv and upstart scripts
      - d/qemu-system-common.qemu-kvm.default: defaults for
        /etc/default/qemu-kvm
      - d/rules: install /etc/default/qemu-kvm
    - Enable nesting by default
      - set nested=1 module option on intel. (is default on amd)
      - re-load kvm_intel.ko if it was loaded without nested=1
      - d/p/ubuntu/expose-vmx_qemu64cpu.patch: expose nested kvm by default
        in qemu64 cpu type.
      - d/p/ubuntu/enable-svm-by-default.patch: Enable nested svm by default
        in qemu64 on amd
    - libvirt/qemu user/group support
      - qemu-system-common.postinst: remove acl placed by udev, and add udevadm
        trigger.
      - qemu-system-common.preinst: add kvm group if needed
    - Distribution specific machine type
      - d/p/ubuntu/define-ubuntu-machine-types.patch: define distro machine
        types to ease future live vm migration.
      - d/qemu-system-x86.NEWS Info on fixed machine type defintions
    - improved dependencies
      - Make qemu-system-common depend on qemu-block-extra
      - Make qemu-utils depend on qemu-block-extra
      - let qemu-utils recommend sharutils
    - s390x support
      - Create qemu-system-s390x package
      - Include s390-ccw.img firmware
      - Enable numa support for s390x
    - ppc64[le] support
      - d/qemu-system-ppc.links provide usr/bin/qemu-system-ppc64le symlink
      - Enable seccomp for ppc64el
      - bump libseccomp-dev dependency, 2.3 is the minimum for ppc64
    - arch aware kvm wrappers
    - disable missing x32 architecture
    - update VCS links
  * Added changes
      - d/rules: or32 is now named or1k (since 4a09d0bb)
      - d/qemu-system-common.docs: new paths since (ac06724a)
      - d/qemu-system-common.install: qmp-commands.txt removed, but replaced
        by qapi-schema.json which is already packaged (since 4d8bb958)
      - Updates in debian/patches to match qemu 2.10
        - d/p/02_kfreebsd.patch: utimensat is no more optional upstream
        - d/p/ubuntu/enable-svm-by-default.patch: target-i386 -> target/i386
        - d/p/ubuntu/expose-vmx_qemu64cpu.patch: target-i386 -> target/i386
        - d/p/ubuntu/define-ubuntu-machine-types.patch: new 2.10 ubuntu types
        - update VCS-git to match the Artful branch
      - s390x pack...


Changed in qemu (Ubuntu):
status: Incomplete → Fix Released