nova compute is crashing with the error TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'

Bug #1376307 reported by Numan Siddique
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Critical
Assigned to: Matt Riedemann

Bug Description

nova-compute crashes on startup with the error below:

2014-10-01 14:50:26.854 DEBUG nova.virt.libvirt.driver [-] Updating host stats from (pid=9945) update_status /opt/stack/nova/nova/virt/libvirt/driver.py:6361
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 449, in fire_timers
    timer()
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 58, in __call__
    cb(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 167, in _do_send
    waiter.switch(result)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 207, in main
    result = function(*args, **kwargs)
  File "/opt/stack/nova/nova/openstack/common/service.py", line 490, in run_service
    service.start()
  File "/opt/stack/nova/nova/service.py", line 181, in start
    self.manager.pre_start_hook()
  File "/opt/stack/nova/nova/compute/manager.py", line 1152, in pre_start_hook
    self.update_available_resource(nova.context.get_admin_context())
  File "/opt/stack/nova/nova/compute/manager.py", line 5946, in update_available_resource
    nodenames = set(self.driver.get_available_nodes())
  File "/opt/stack/nova/nova/virt/driver.py", line 1237, in get_available_nodes
    stats = self.get_host_stats(refresh=refresh)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5771, in get_host_stats
    return self.host_state.get_host_stats(refresh=refresh)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 470, in host_state
    self._host_state = HostState(self)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6331, in __init__
    self.update_status()
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6387, in update_status
    numa_topology = self.driver._get_host_numa_topology()
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4828, in _get_host_numa_topology
    for cell in topology.cells])
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
2014-10-01 14:50:26.989 ERROR nova.openstack.common.threadgroup [-] unsupported operand type(s) for /: 'NoneType' and 'int'

The commit https://github.com/openstack/nova/commit/6a374f21495c12568e4754800574e6703a0e626f
seems to be the cause.

Tags: libvirt
Numan Siddique (numansiddique) wrote :
Changed in nova:
assignee: nobody → Numan Siddique (numansiddique)
status: New → In Progress
Numan Siddique (numansiddique) wrote :

The issue is seen with libvirt 0.9.8 (I am using Ubuntu 12.04).
libvirt 0.9.8 does not return memory information in the topology when nova calls virConnect.getCapabilities().

For example:
<topology>
  <cells num='1'>
    <cell id='0'>
      <cpus num='4'>
        <cpu id='0'/>
        <cpu id='1'/>
        <cpu id='2'/>
        <cpu id='3'/>
      </cpus>
    </cell>
  </cells>
</topology>

Below is the topology information for libvirt 1.1.1:
<topology>
  <cells num='1'>
    <cell id='0'>
      <memory unit='KiB'>8388140</memory>
      <cpus num='4'>
        <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
        <cpu id='1' socket_id='1' core_id='0' siblings='1'/>
        <cpu id='2' socket_id='2' core_id='0' siblings='2'/>
        <cpu id='3' socket_id='3' core_id='0' siblings='3'/>
      </cpus>
    </cell>
  </cells>
</topology>

Because of this, LibvirtConfigCapsNUMACell.memory is None, so the division in _get_host_numa_topology fails with the TypeError shown in the traceback.
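
The effect of the missing <memory> element can be reproduced without libvirt. The sketch below is illustrative only: it parses trimmed copies of the two topology snippets with Python's standard xml.etree.ElementTree. The helper names cell_memory_kib and cell_memory_mib are hypothetical and are not nova's code, and it assumes the crash comes from dividing the missing memory value by an integer, as the traceback suggests.

```python
import xml.etree.ElementTree as ET

# Capabilities fragments as quoted in this report (trimmed to one CPU each).
OLD_LIBVIRT = """
<topology>
  <cells num='1'>
    <cell id='0'>
      <cpus num='1'><cpu id='0'/></cpus>
    </cell>
  </cells>
</topology>
"""

NEW_LIBVIRT = """
<topology>
  <cells num='1'>
    <cell id='0'>
      <memory unit='KiB'>8388140</memory>
      <cpus num='1'><cpu id='0' socket_id='0' core_id='0' siblings='0'/></cpus>
    </cell>
  </cells>
</topology>
"""


def cell_memory_kib(topology_xml):
    """Return per-cell memory in KiB, with None where <memory> is absent."""
    root = ET.fromstring(topology_xml)
    result = []
    for cell in root.iter("cell"):
        mem = cell.find("memory")
        result.append(int(mem.text) if mem is not None else None)
    return result


def cell_memory_mib(topology_xml):
    """Hypothetical guard: return None instead of dividing None by an int."""
    values = cell_memory_kib(topology_xml)
    if any(v is None for v in values):
        return None  # host does not report per-cell memory
    return [v // 1024 for v in values]
```

For the 0.9.8-style snippet, cell_memory_kib returns a list containing None, and an unguarded division of that value by an integer raises the same TypeError as in the traceback; the guarded helper returns None instead.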

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/125459

YAMAMOTO Takashi (yamamoto) wrote :

Ryu CI is hitting this bug.

Daniel Berrange (berrange) wrote :

Libvirt 0.9.8 is no longer a supported version for Juno or later; you must have libvirt >= 0.9.11. If you really must stay on such an old Ubuntu release, enable its cloud archive repository, which provides newer libvirt and QEMU.

Changed in nova:
status: In Progress → Won't Fix
Roman Bogorodskiy (novel) wrote :

I don't think 0.9.11 is new enough to fix this problem.

Libvirt support for including memory information for cells was introduced in this commit:

http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=d3092c60f79cda9af713ae5933aac380e09a157e

The first release that includes it seems to be 1.0.4:

(18:59) novel@kloomba:~/code/libvirt[master] %> git-describe --contains d3092c60
v1.0.4-rc1~131
(18:59) novel@kloomba:~/code/libvirt[master] %>

PS I have a similar fix for this bug with unit tests added. Not sure if I should submit it though.

Numan Siddique (numansiddique) wrote :

I installed libvirt 0.9.11 and got the output below when calling the virConnectGetCapabilities(conn) libvirt API (http://libvirt.org/guide/html-single/#Application_Development_Guide-Connections-Capability_Info). It does not report the memory element in the topology.

Roman - you can submit your tests on top of my patch (adding yourself as co-author) if Daniel thinks this bug needs to be reopened.

I couldn't test libvirt 0.9.11 with devstack. I will do it tomorrow to verify the crash.

Capabilities:
<capabilities>

  <host>
    <uuid>ce8aa809-65e1-41d2-a65d-b5bd3d1d590c</uuid>
    <cpu>
      <arch>x86_64</arch>
      <model>Westmere</model>
      <vendor>Intel</vendor>
      <topology sockets='4' cores='1' threads='1'/>
      <feature name='rdtscp'/>
      <feature name='pdpe1gb'/>
      <feature name='hypervisor'/>
      <feature name='avx'/>
      <feature name='osxsave'/>
      <feature name='xsave'/>
      <feature name='x2apic'/>
      <feature name='vmx'/>
      <feature name='pclmuldq'/>
      <feature name='ss'/>
      <feature name='vme'/>
    </cpu>
    <power_management>
      <suspend_mem/>
      <suspend_disk/>
      <suspend_hybrid/>
    </power_management>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>tcp</uri_transport>
      </uri_transports>
    </migration_features>
    <topology>
      <cells num='1'>
        <cell id='0'>
          <cpus num='4'>
            <cpu id='0'/>
            <cpu id='1'/>
            <cpu id='2'/>
            <cpu id='3'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>
...
..

sailajay (y-sailaja) wrote :

I am seeing the same error with 0.9.13

Changed in nova:
status: Won't Fix → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Numan Siddique (<email address hidden>) on branch: master
Review: https://review.openstack.org/125459

Matt Riedemann (mriedem) wrote :

I'm seeing the same thing on RHEL 6.5 with libvirt 0.10.2:

[root@rhel62 ~]# libvirtd --version
libvirtd (libvirt) 0.10.2
[root@rhel62 ~]# rpm -q libvirt
libvirt-0.10.2-29.el6.x86_64
[root@rhel62 ~]#

I've marked the bug as juno-rc-potential but I think this is a trigger for a juno-rc2 spin.

tags: added: juno-rc-potential
Changed in nova:
importance: Undecided → Critical
tags: added: libvirt
Matt Riedemann (mriedem)
Changed in nova:
status: In Progress → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/126181

Changed in nova:
assignee: Numan Siddique (numansiddique) → Matt Riedemann (mriedem)
status: Confirmed → In Progress
Matt Riedemann (mriedem) wrote :

I found this in the libvirt docs, although they point out a lot of different versions for when things work a certain way:

http://libvirt.org/formatdomain.html#elementsNUMATuning

So I'm not sure if this is a libvirt version issue only or also a qemu version issue, e.g.:

"memnode
Optional memnode elements can specify memory allocation policies per each guest NUMA node. For those nodes having no corresponding memnode element, the default from element memory will be used. Attribute cellid addresses guest NUMA node for which the settings are applied. Attributes mode and nodeset have the same meaning and syntax as in memory element. This setting is not compatible with automatic placement. QEMU Since 1.2.7"

And:

"Each cell element specifies a NUMA cell or a NUMA node. cpus specifies the CPU or range of CPUs that are part of the node. memory specifies the node memory in kibibytes (i.e. blocks of 1024 bytes). Since 1.2.7 all cells should have id attribute in case referring to some cell is necessary in the code, otherwise the cells are assigned ids in the increasing order starting from 0. Mixing cells with and without the id attribute is not recommended as it may result in unwanted behaviour. Since 1.2.9 the optional attribute memAccess can control whether the memory is to be mapped as "shared" or "private". This is valid only for hugepages-backed memory.

This guest NUMA specification is currently available only for QEMU/KVM."

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/126181
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8ba0d9188d492028fcf4e65f908aa2d3db571952
Submitter: Jenkins
Branch: master

commit 8ba0d9188d492028fcf4e65f908aa2d3db571952
Author: Matt Riedemann <email address hidden>
Date: Sun Oct 5 05:56:35 2014 -0700

    Disable libvirt NUMA topology support if libvirt < 1.0.4

    If you're not at a new enough version of libvirt, the compute service
    fails on startup because VirtNUMATopologyCellUsage is not fully
    populated.

    This add a min version check before trying to get host NUMA topology
    information.

    Closes-Bug: #1376307

    Change-Id: I00f6325cb554bc5e34d9f0fe651af39630f35b5d
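
A minimal sketch of what a minimum-version check of this kind could look like. It assumes the integer encoding that libvirt documents for its library version (major * 1,000,000 + minor * 1,000 + release, as returned by getLibVersion() on a connection). The constant and function names are hypothetical, and this is not the code from the merged change.

```python
# Hypothetical constant: the first libvirt release identified in this bug
# as including per-cell memory in the host capabilities.
MIN_LIBVIRT_NUMA_VERSION = (1, 0, 4)


def version_to_int(version):
    """Encode (major, minor, release) the way libvirt reports versions."""
    major, minor, release = version
    return major * 1000000 + minor * 1000 + release


def supports_host_numa_topology(lib_version_int):
    """Return True if the running libvirt is new enough for NUMA topology."""
    return lib_version_int >= version_to_int(MIN_LIBVIRT_NUMA_VERSION)
```

With a check like this, the versions reported in this bug (0.9.8, 0.9.11, 0.9.13, 0.10.2) would skip the host NUMA topology lookup, while 1.1.1 would perform it.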

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (proposed/juno)

Fix proposed to branch: proposed/juno
Review: https://review.openstack.org/126299

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Numan Siddique (<email address hidden>) on branch: master
Review: https://review.openstack.org/125459
Reason: https://review.openstack.org/#/c/126181/ addresses this bug

Thierry Carrez (ttx)
Changed in nova:
milestone: none → juno-rc2
Thierry Carrez (ttx)
tags: removed: juno-rc-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (proposed/juno)

Reviewed: https://review.openstack.org/126299
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0251b53966eaa9e724377a300ea247367fd778c7
Submitter: Jenkins
Branch: proposed/juno

commit 0251b53966eaa9e724377a300ea247367fd778c7
Author: Matt Riedemann <email address hidden>
Date: Sun Oct 5 05:56:35 2014 -0700

    Disable libvirt NUMA topology support if libvirt < 1.0.4

    If you're not at a new enough version of libvirt, the compute service
    fails on startup because VirtNUMATopologyCellUsage is not fully
    populated.

    This add a min version check before trying to get host NUMA topology
    information.

    Closes-Bug: #1376307

    Change-Id: I00f6325cb554bc5e34d9f0fe651af39630f35b5d
    (cherry picked from commit 8ba0d9188d492028fcf4e65f908aa2d3db571952)

Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-rc2 → 2014.2
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/128894

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/128894
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9825784742d010a902ff149765269ad32a8a0dfd
Submitter: Jenkins
Branch: master

commit 7c9aa6da92805f20083203a6ec8f93b1b592fc13
Author: He Jie Xu <email address hidden>
Date: Sun Oct 5 00:20:01 2014 +0800

    Fix pci_request_id break the upgrade from icehouse to juno

    commit a8a5d44c8aca218f00649232c2b8a46aee59b77e add pci_request_id
    as one item for the request_network tuple. But the icehouse code
    assume only three items in the tuple.

    This patch filters pci_request_id out from the tuple.

    Cherry-Pick from:
    https://review.openstack.org/#/c/126144/6

    Change-Id: I991e1c68324fe92fac647583f3ec8f6aec637913
    Closes-Bug: #1377447

commit 10a5eecd0973096b57efd31f8b27d7295a44ab89
Author: Andreas Jaeger <email address hidden>
Date: Thu Oct 9 12:22:36 2014 +0200

    Updated translations

    Commands run:-
    $ python setup.py extract_messages
    $ python setup.py update_catalog --no-fuzzy-matching \
      --ignore-obsolete=true
    $ source \
      ../openstack-infra/project-config/jenkins/scripts/common_translation_update.sh
    $ setup_loglevel_vars
    $ cleanup_po_file nova

    Change-Id: I64b2b468f7edd44dbb445b5b4e68b65c3fa53d9e

commit 3f9003270efd9ac036f3c229b36baa0bb05203bf
Author: Russell Bryant <email address hidden>
Date: Wed Oct 8 12:14:31 2014 +0000

    Fix broken cert revocation

    Cert revocation was broken by
    32b0adb591f80ad2c5c19519b4ffc2b55dbea672. os.chdir() never returns
    anything, so this method would always raise an exception. The proper
    way to handle an error from os.chdir() is to catch OSError.

    There were existing tests for this code, but they conveniently mocked
    os.chdir() to return values that are never actually returned. The
    tests were fixed to match the real behavior.

    Change-Id: I7549bb60a7d43d53d6f81eecea31cbb9720cc8b6
    Closes-bug: #1376368
    (cherry picked from commit c8538208da00c3b0d0646629c9d668aa69944b85)

commit 6ed57972093835f449ad645b3783bbb8b3c4245e
Author: Russell Bryant <email address hidden>
Date: Fri Oct 3 16:41:03 2014 -0400

    Update rpc version aliases for juno

    Update all of the rpc client API classes to include a version alias
    for the latest version implemented in Juno. This alias is needed when
    doing rolling upgrades from Juno to Kilo. With this in place, you can
    ensure all services only send messages that both Juno and Kilo will
    understand.

    Closes-bug: #1378786
    Change-Id: Ia81538130bf8530b70b5f55c7a3d565903ff54b4
    (cherry picked from commit f98d725103c53e767a1cddb0b7e2c3822309db17)

commit ee3594072a7ef1c3f5661021fb31118069cbd646
Author: Tristan Cacqueray <email address hidden>
Date: Fri Oct 3 19:53:42 2014 +0000

    Mask passwords in exceptions and error messages

    When a ProcessExecutionError is thrown by processutils.ssh_execute(),
    the exception may contain information such as password. Upstream
    applications that just log the message (as several appear to do)
    could inadvertently expose these passwords to a u...

