[regression] libvirt cannot start guests on NUMA node 1

Bug #1404388 reported by Richard Laager
This bug affects 1 person
Affects           Status        Importance  Assigned to  Milestone
libvirt (Ubuntu)  Fix Released  High        Unassigned
Trusty            Fix Released  High        Unassigned

Bug Description

=====================================================
SRU Justification:
Impact: libvirt cannot start guests on NUMA nodes other than 0
Test case: Using the libvirt XML below, start a guest bound to NUMA node 1
Regression potential: this is a cherry-pick of an upstream patch. However,
it required some backporting, which increases the chance of error.
=====================================================

On Ubuntu Trusty, with libvirt 1.2.2, libvirt cannot start guests on NUMA node 1 (probably any node other than 0):
 $ virsh start pen2.office.wiktel.com
 error: Failed to start domain pen2.office.wiktel.com
 error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory

This worked on Ubuntu Precise with libvirt 0.9.8. Rebuilding libvirt 1.2.8 from vivid on Trusty works. It boots, and I've verified it allocates the memory from the correct NUMA node. Bisecting using released libvirt versions narrows it down to broken on 1.2.6 and working on 1.2.7.
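
The release-level bisection described above can be sketched as a binary search. This is only an illustration: the helper name `bracket_fix`, the list of intermediate release numbers, and the stand-in `works` predicate are assumptions. In practice `works` would mean rebuilding that libvirt release on Trusty and trying to start the guest.

```python
def bracket_fix(versions, works):
    """Return (last_broken, first_working) from an ordered release list.

    Assumes versions[0] is broken, versions[-1] works, and the behaviour
    flips exactly once in between.
    """
    lo, hi = 0, len(versions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(versions[mid]):
            hi = mid  # the fix landed at or before this release
        else:
            lo = mid  # still broken, so the fix landed later
    return versions[lo], versions[hi]


# Assumed release sequence between the Trusty and vivid versions.
RELEASES = ["1.2.2", "1.2.3", "1.2.4", "1.2.5", "1.2.6", "1.2.7", "1.2.8"]


def stub_works(version):
    # Stand-in for "rebuild this release and start the guest on node 1".
    return version in {"1.2.7", "1.2.8"}


last_broken, first_working = bracket_fix(RELEASES, stub_works)
```

With the stub above, the search brackets the change between 1.2.6 and 1.2.7, matching the result reported here. The same loop applies to a list of commits once the release range is known.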

There were a number of NUMA changes in 1.2.7, so it's not immediately clear which commit may be causing this. And it may take more than one commit being cherry-picked to address this.

So before I spend any more time bisecting, I'd like to know how you'd like to proceed on this. For example, if you just want to SRU libvirt 1.2.8 into Trusty, I won't bother narrowing it down further.

The relevant libvirt XML:
  <vcpu placement='static' cpuset='6-11'>2</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>

The NUMA topology from virsh capabilities:
    <topology>
      <cells num='2'>
        <cell id='0'>
          <memory unit='KiB'>24682480</memory>
          <cpus num='6'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
            <cpu id='1' socket_id='0' core_id='1' siblings='1'/>
            <cpu id='2' socket_id='0' core_id='2' siblings='2'/>
            <cpu id='3' socket_id='0' core_id='8' siblings='3'/>
            <cpu id='4' socket_id='0' core_id='9' siblings='4'/>
            <cpu id='5' socket_id='0' core_id='10' siblings='5'/>
          </cpus>
        </cell>
        <cell id='1'>
          <memory unit='KiB'>16513168</memory>
          <cpus num='6'>
            <cpu id='6' socket_id='1' core_id='0' siblings='6'/>
            <cpu id='7' socket_id='1' core_id='1' siblings='7'/>
            <cpu id='8' socket_id='1' core_id='2' siblings='8'/>
            <cpu id='9' socket_id='1' core_id='8' siblings='9'/>
            <cpu id='10' socket_id='1' core_id='9' siblings='10'/>
            <cpu id='11' socket_id='1' core_id='10' siblings='11'/>
          </cpus>
        </cell>
      </cells>
    </topology>
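
As a sanity check that the guest configuration itself is consistent, the following minimal sketch cross-checks the vcpu cpuset against the CPUs of the node named in numatune. The function names are hypothetical, the fragment is wrapped in a `<domain>` root so it parses, the topology is abbreviated to CPU ids only, and the `^` exclusion syntax that libvirt accepts in cpusets is not handled.

```python
import xml.etree.ElementTree as ET


def parse_range_list(spec):
    """Expand a list such as '6-11' or '0,2-3' into a set of ints."""
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            ids.update(range(int(low), int(high) + 1))
        elif part:
            ids.add(int(part))
    return ids


def cpus_by_cell(topology_xml):
    """Map each NUMA cell id to the set of host CPU ids it contains."""
    root = ET.fromstring(topology_xml)
    return {
        int(cell.get("id")): {int(cpu.get("id")) for cpu in cell.iter("cpu")}
        for cell in root.iter("cell")
    }


def pinning_matches_nodeset(domain_xml, topology_xml):
    """True if every CPU in the vcpu cpuset belongs to a numatune node."""
    dom = ET.fromstring(domain_xml)
    vcpu_cpus = parse_range_list(dom.find("vcpu").get("cpuset"))
    nodes = parse_range_list(dom.find("numatune/memory").get("nodeset"))
    cells = cpus_by_cell(topology_xml)
    allowed = set().union(*(cells.get(node, set()) for node in nodes))
    return vcpu_cpus <= allowed


DOMAIN_XML = """
<domain>
  <vcpu placement='static' cpuset='6-11'>2</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>
</domain>
"""

# Abbreviated copy of the topology above (CPU ids only).
TOPOLOGY_XML = "<topology><cells num='2'>%s%s</cells></topology>" % (
    "<cell id='0'><cpus>%s</cpus></cell>"
    % "".join("<cpu id='%d'/>" % i for i in range(0, 6)),
    "<cell id='1'><cpus>%s</cpus></cell>"
    % "".join("<cpu id='%d'/>" % i for i in range(6, 12)),
)

consistent = pinning_matches_nodeset(DOMAIN_XML, TOPOLOGY_XML)
```

For the configuration in this report the check passes (CPUs 6-11 all sit in cell 1), which suggests the failure is not a mismatch between the cpuset and the nodeset.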

Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1404388] [NEW] [regression] libvirt cannot start guests on NUMA node 1

Quoting Richard Laager (<email address hidden>):
> Public bug reported:

Thanks for reporting and looking into this bug.

> On Ubuntu Trusty, with libvirt 1.2.2, libvirt cannot start guests on NUMA node 1 (probably any node other than 0):
> $ virsh start pen2.office.wiktel.com
> error: Failed to start domain pen2.office.wiktel.com
> error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
>
> This worked on Ubuntu Precise with libvirt 0.9.8. Rebuilding libvirt
> 1.2.8 from vivid on Trusty works. It boots, and I've verified it
> allocates the memory from the correct NUMA node. Bisecting using
> released libvirt versions narrows it down to broken on 1.2.6 and working
> on 1.2.7.
>
> There were a number of NUMA changes in 1.2.7, so it's not immediately
> clear which commit may be causing this. And it may take more than one
> commit being cherry-picked to address this.
>
> So before I spend any more time bisecting, I'd like to know how you'd
> like to proceed on this. For example, if you just want to SRU libvirt
> 1.2.8 into Trusty, I won't bother narrowing it down further.

Unfortunately we can't just SRU 1.2.8 into Trusty. Hopefully the
actual fix will come down to a few patches we can cherrypick.

Richard Laager (rlaager) wrote :

I bisected this down to 7e72ac787848b7434c9359a57c1e2789d92350f8. I cherrypicked that for the 1.2.2 package in Trusty. This required cherrypicking aa668fccf078bf9833047776549a5a06435cf470 too. One hunk required a trivial manual reapplication. I've attached the refreshed patch.

Richard Laager (rlaager) wrote :

I can confirm that this patch, applied to 1.2.2-0ubuntu13.1.8, fixes the problem. The guest on my test system starts correctly (where it didn't before) and I've confirmed via /proc/PID/numa_maps that memory is allocated from node 1 as per the guest's configuration.
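
One way to do the /proc/PID/numa_maps check is to sum the per-node page counts. This is a rough sketch, not the exact procedure used here: the function names and the sample text are illustrative assumptions. It relies on the `N<node>=<pages>` fields that numa_maps prints for each mapping.

```python
import re
from collections import Counter

# Matches per-node page counts such as "N1=262144".
NODE_FIELD = re.compile(r"^N(\d+)=(\d+)$")


def pages_per_node(numa_maps_text):
    """Sum the N<node>=<pages> fields over all mappings."""
    totals = Counter()
    for line in numa_maps_text.splitlines():
        # Field 0 is the start address, field 1 the memory policy.
        for field in line.split()[2:]:
            match = NODE_FIELD.match(field)
            if match:
                totals[int(match.group(1))] += int(match.group(2))
    return dict(totals)


def read_numa_maps(pid):
    """Read the numa_maps file of a running process (Linux only)."""
    with open(f"/proc/{pid}/numa_maps") as handle:
        return handle.read()


# Illustrative sample, not output captured from the affected host.
SAMPLE = """\
7f2a40000000 bind:1 anon=262144 dirty=262144 N1=262144 kernelpagesize_kB=4
7f2a80000000 bind:1 anon=512 dirty=512 N1=512 kernelpagesize_kB=4
7f2a90000000 default file=/usr/bin/qemu-system-x86_64 mapped=300 N0=300 kernelpagesize_kB=4
"""

totals = pages_per_node(SAMPLE)
```

On a live system, `pages_per_node(read_numa_maps(pid))` with the qemu process ID should show the guest's anonymous memory under node 1. Counts are in pages of the size given by `kernelpagesize_kB` on each line, so mappings with different page sizes are not directly comparable.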

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Cherrypicked patches" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray. For any issues, please contact him.]

tags: added: patch
Changed in libvirt (Ubuntu):
status: New → Fix Released
Changed in libvirt (Ubuntu Trusty):
status: New → Confirmed
importance: Undecided → High
Changed in libvirt (Ubuntu):
importance: Undecided → High
description: updated
Chris J Arges (arges) wrote : Please test proposed package

Hello Richard, or anyone else affected,

Accepted libvirt into trusty-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/libvirt/1.2.2-0ubuntu13.1.9 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in libvirt (Ubuntu Trusty):
status: Confirmed → Fix Committed
tags: added: verification-needed
Richard Laager (rlaager) wrote :

Yes, that package fixes the problem.

tags: added: verification-done
removed: verification-needed
Richard Laager (rlaager) wrote :

What else is required before this package enters trusty-updates?

Serge Hallyn (serge-hallyn) wrote :

It has been in -proposed for > 7 days, so it should be promoted any day now.

Richard Laager (rlaager) wrote :

It's been another 7 days, and I don't see it in trusty-updates. Is some manual process required?

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 1.2.2-0ubuntu13.1.9

---------------
libvirt (1.2.2-0ubuntu13.1.9) trusty-proposed; urgency=medium

  * apparmor libvirt-qemu template: allow reading charm-specific ceph config
    and allow reading under /tmp and /var/tmp (for SRU only) (LP: #1403648)
  * numa-cgroups-fix-cpuset-mems-init.patch - cherrypicked, refreshed patch
    (by Richard Laager) to fix failure to start on numa node 1 (LP: #1404388)
  * libvirt-qemu: add r to sgabios.bin (LP: #1393548)
 -- Serge Hallyn <email address hidden> Tue, 06 Jan 2015 10:39:15 -0600

Changed in libvirt (Ubuntu Trusty):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for libvirt has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. If you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
