Nova loses ability to build instances

Bug #956313 reported by David Kranz
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
High
Vish Ishaya

Bug Description

This was running Tempest against the latest essex devstack on libvirt/kvm/precise. It is not entirely reproducible but kind of. Running the latest tempest tests ends up failing, usually in the floating ips test, and can no longer create new vms so the rest of the tests fail. Attempts to create new instances end up in the error state. There are these tracebacks in the logs.

Tempest transcript:

$ TEMPEST_CONFIG=tempest.essex.conf nosetests -vx tempest
All public and private addresses for ... ok
Providing a network type should filter ... ok
List addresses by network should fail if network name not valid ... ok
List addresses request should fail if server id not in system ... ok
List of all extensions ... ok
The expected flavor details should be returned ... ok
flavor details are not returned for non existant flavors ... ok
List of all flavors should contain the expected flavor ... ok
The detailed list of flavors should be filtered by disk space ... ok
The detailed list of flavors should be filtered by RAM ... ok
Only the expected number of flavors (detailed) should be returned ... ok
The list of flavors should start from the provided marker ... /usr/lib/python2.6/site-packages/unittest2-0.5.1-py2.6.egg/unittest2/case.py:349: DeprecationWarning: Use of a TestResult without an addExpectedFailure method is deprecated
  DeprecationWarning)
ok
The list of flavors should be filtered by disk space ... ok
The list of flavors should be filtered by RAM ... ok
Only the expected number of flavors should be returned ... ok
The list of flavors should start from the provided marker ... ok
Detailed list of all flavors should contain the expected flavor ... ok
Positive test:Allocation of a new floating IP to a project ... ok
Positive test:Associate the provided floating IP to a specific server ... ok
Negative test:Association of a non existant floating IP ... ok
Positive test:Deletion of valid floating IP from project ... ok
Negative test:Deletion of a nonexistant floating IP ... ok
Positive test:Dissociate the provided floating IP ... ok
Negative test:Dissociation of a non existant floating IP should fail ... ok
The metadata value/key pair should be deleted from the image ... ok
Negative test: Shouldnt be able to delete metadata ... ok
The value for a specific metadata key should be returned ... ok
Negative test: Get on nonexistant image should not happen ... ok
All metadata key/value pairs for an image should be returned ... ok
Negative test: List on nonexistant image ... ok
The metadata for the image should match the new values ... ok
The value provided for the given meta item should be set for the image ... ok
Negative test: Metadata should not be set to a nonexistant image ... ok
Negative test: Metadata item should not be set to a ... ok
The metadata for the image should match the updated values ... ok
Negative test:An update should not happen for a nonexistant image ... ok
An image for the provided server should be created ... ok
An image should not be created if the server instance is removed ... ok
Keypairs with duplicate names should not be created ... ok
Keypairs with name being an empty string should not be created ... ok
Keypairs with name longer than 255 chars should not be created ... ok
Keypair should be created, verified and deleted ... ok
Keypair should not be created with a non RSA public key ... ok
Keypair should be created with a given public key ... ok
Non-existant key deletion should throw a proper error ... ok
Keypairs created should be available in the response list ... ok
Positive test:Should be able to GET the details of floatingIP ... ok
Negative test:Should not be able to GET the details ... SKIP: Skipping until Nova Bug 940500 is fixed
Positive test:Should return the list of floating IPs ... ok
ERROR

======================================================================
ERROR: test suite for <class 'tempest.tests.test_list_images.ListImagesTest'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/nose-1.1.2-py2.6.egg/nose/suite.py", line 208, in run
    self.setUp()
  File "/usr/lib/python2.6/site-packages/nose-1.1.2-py2.6.egg/nose/suite.py", line 291, in setUp
    self.setupContext(ancestor)
  File "/usr/lib/python2.6/site-packages/nose-1.1.2-py2.6.egg/nose/suite.py", line 314, in setupContext
    try_run(context, names)
  File "/usr/lib/python2.6/site-packages/nose-1.1.2-py2.6.egg/nose/util.py", line 478, in try_run
    return func()
  File "/cygdrive/c/source/tempest/tempest/tests/test_list_images.py", line 34, in setUpClass
    cls.servers_client.wait_for_server_status(cls.server1['id'], 'ACTIVE')
  File "/cygdrive/c/source/tempest/tempest/services/nova/json/servers_client.py", line 147, in wait_for_server_status
    raise exceptions.BuildErrorException(server_id=server_id)
BuildErrorException: Server cf256252-528c-4d69-a4a5-e6889bccf380 failed to build and is in ERROR status

----------------------------------------------------------------------
Ran 49 tests in 319.152s

From nova-compute log:
(nova.rpc.common): TRACE: channel, method_sig, args, content = read_timeout(timeout)
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 94, in read_timeout
(nova.rpc.common): TRACE: return self.method_reader.read_method()
(nova.rpc.common): TRACE: File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/method_framing.py", line 221, in read_method
(nova.rpc.common): TRACE: raise m
(nova.rpc.common): TRACE: timeout: timed out
(nova.rpc.common): TRACE:
2012-03-15 14:57:24 ERROR nova.compute.manager [-] [instance: 2b8b0ddf-7850-4319-b151-10ff7d482287] Instance failed network setup
(nova.compute.manager): TRACE: Traceback (most recent call last):
(nova.compute.manager): TRACE: File "/opt/stack/nova/nova/compute/manager.py", line 559, in _allocate_network
(nova.compute.manager): TRACE: requested_networks=requested_networks)
(nova.compute.manager): TRACE: File "/opt/stack/nova/nova/network/api.py", line 170, in allocate_for_instance
(nova.compute.manager): TRACE: 'args': args})
(nova.compute.manager): TRACE: File "/opt/stack/nova/nova/rpc/__init__.py", line 68, in call
(nova.compute.manager): TRACE: return _get_impl().call(context, topic, msg, timeout)
(nova.compute.manager): TRACE: File "/opt/stack/nova/nova/rpc/impl_kombu.py", line 674, in call
(nova.compute.manager): TRACE: return rpc_amqp.call(context, topic, msg, timeout, Connection.pool)
(nova.compute.manager): TRACE: File "/opt/stack/nova/nova/rpc/amqp.py", line 338, in call
(nova.compute.manager): TRACE: rv = list(rv)
(nova.compute.manager): TRACE: File "/opt/stack/nova/nova/rpc/amqp.py", line 299, in __iter__
(nova.compute.manager): TRACE: self._iterator.next()
(nova.compute.manager): TRACE: File "/opt/stack/nova/nova/rpc/impl_kombu.py", line 572, in iterconsume
(nova.compute.manager): TRACE: yield self.ensure(_error_callback, _consume)
(nova.compute.manager): TRACE: File "/opt/stack/nova/nova/rpc/impl_kombu.py", line 503, in ensure
(nova.compute.manager): TRACE: error_callback(e)
(nova.compute.manager): TRACE: File "/opt/stack/nova/nova/rpc/impl_kombu.py", line 553, in _error_callback
(nova.compute.manager): TRACE: raise rpc_common.Timeout()
(nova.compute.manager): TRACE: Timeout: Timeout while waiting on RPC response.
(nova.compute.manager): TRACE:
2012-03-15 14:57:24 ERROR nova.rpc.amqp [-] Exception during message handling
(nova.rpc.amqp): TRACE: Traceback (most recent call last):
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/rpc/amqp.py", line 252, in _process_data
(nova.rpc.amqp): TRACE: rval = node_func(context=ctxt, **node_args)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/exception.py", line 114, in wrapped
(nova.rpc.amqp): TRACE: return f(*args, **kw)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/compute/manager.py", line 177, in decorated_function
(nova.rpc.amqp): TRACE: sys.exc_info())
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
(nova.rpc.amqp): TRACE: self.gen.next()
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/compute/manager.py", line 171, in decorated_function
(nova.rpc.amqp): TRACE: return function(self, context, instance_uuid, *args, **kwargs)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/compute/manager.py", line 648, in run_instance
(nova.rpc.amqp): TRACE: self._run_instance(context, instance_uuid, **kwargs)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/compute/manager.py", line 451, in _run_instance
(nova.rpc.amqp): TRACE: self._set_instance_error_state(context, instance_uuid)
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
(nova.rpc.amqp): TRACE: self.gen.next()
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/compute/manager.py", line 424, in _run_instance
(nova.rpc.amqp): TRACE: requested_networks)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/compute/manager.py", line 559, in _allocate_network
(nova.rpc.amqp): TRACE: requested_networks=requested_networks)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/network/api.py", line 170, in allocate_for_instance
(nova.rpc.amqp): TRACE: 'args': args})
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/rpc/__init__.py", line 68, in call
(nova.rpc.amqp): TRACE: return _get_impl().call(context, topic, msg, timeout)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/rpc/impl_kombu.py", line 674, in call
(nova.rpc.amqp): TRACE: return rpc_amqp.call(context, topic, msg, timeout, Connection.pool)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/rpc/amqp.py", line 338, in call
(nova.rpc.amqp): TRACE: rv = list(rv)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/rpc/amqp.py", line 299, in __iter__
(nova.rpc.amqp): TRACE: self._iterator.next()
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/rpc/impl_kombu.py", line 572, in iterconsume
(nova.rpc.amqp): TRACE: yield self.ensure(_error_callback, _consume)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/rpc/impl_kombu.py", line 503, in ensure
(nova.rpc.amqp): TRACE: error_callback(e)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/rpc/impl_kombu.py", line 553, in _error_callback
(nova.rpc.amqp): TRACE: raise rpc_common.Timeout()
(nova.rpc.amqp): TRACE: Timeout: Timeout while waiting on RPC response.
(nova.rpc.amqp): TRACE:
2012-03-15 14:57:51 DEBUG nova.manager [-] Running periodic task ComputeManager._publish_service_capabilities from (pid=3883) periodic_tasks /opt/stack/nova/nova/manager.py:152
2012-03-15 14:57:51 DEBUG nova.manager [-] Notifying Schedulers of capabilities ... from (pid=3883) _publish_service_capabilities /opt/stack/nova/nova/manager.py:203
2012-03-15 14:57:51 DEBUG nova.rpc.amqp [-] Making asynchronous fanout cast... from (pid=3883) fanout_cast /opt/stack/nova/nova/rpc/amqp.py:354
2012-03-15 14:57:51 DEBUG nova.manager [-] Running periodic task ComputeManager._poll_rescued_instances from (pid=3883) periodic_tasks /opt/stack/nova/nova/manager.py:152
2012-03-15 14:57:51 DEBUG nova.manager [-] Skipping ComputeManager._sync_power_states, 5 ticks left until next run from (pid=3883) periodic_tasks /opt/stack/nova/nova/manager.py:147
2012-03-15 14:57:51 DEBUG nova.manager [-] Running periodic task ComputeManager._poll_bandwidth_usage from (pid=3883) periodic_tasks /opt/stack/nova/nova/manager

From nova-network log:
nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/rpc/amqp.py", line 252, in _process_data
(nova.rpc.amqp): TRACE: rval = node_func(context=ctxt, **node_args)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/network/manager.py", line 258, in wrapped
(nova.rpc.amqp): TRACE: return func(self, context, *args, **kwargs)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/network/manager.py", line 313, in allocate_for_instance
(nova.rpc.amqp): TRACE: **kwargs)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/network/manager.py", line 258, in wrapped
(nova.rpc.amqp): TRACE: return func(self, context, *args, **kwargs)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/network/manager.py", line 899, in allocate_for_instance
(nova.rpc.amqp): TRACE: requested_networks=requested_networks)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/network/manager.py", line 214, in _allocate_fixed_ips
(nova.rpc.amqp): TRACE: vpn=vpn, address=address)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/network/manager.py", line 1190, in allocate_fixed_ip
(nova.rpc.amqp): TRACE: self._setup_network_on_host(context, network)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/network/manager.py", line 1663, in _setup_network_on_host
(nova.rpc.amqp): TRACE: self.l3driver.initialize_gateway(network)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/network/l3.py", line 98, in initialize_gateway
(nova.rpc.amqp): TRACE: gateway=(network_ref['gateway'] is not None))
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/network/linux_net.py", line 921, in plug
(nova.rpc.amqp): TRACE: return _get_interface_driver().plug(network, mac_address, gateway)
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/network/linux_net.py", line 969, in plug
(nova.rpc.amqp): TRACE: iptables_manager.apply()
(nova.rpc.amqp): TRACE: File "/opt/stack/nova/nova/utils.py", line 901, in inner
(nova.rpc.amqp): TRACE: retval = f(*args, **kwargs)
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/lockfile.py", line 230, in __exit__
(nova.rpc.amqp): TRACE: self.release()
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/lockfile.py", line 273, in release
(nova.rpc.amqp): TRACE: raise NotMyLock
(nova.rpc.amqp): TRACE: NotMyLock
(nova.rpc.amqp): TRACE:
2012-03-15 14:56:25 ERROR nova.rpc.amqp [-] Returning exception to caller
2012-03-15 14:56:25 ERROR nova.rpc.amqp [-] ['Traceback (most recent call last):\n', ' File "/opt/stack/nova/nova/rpc/amqp.py", line 252, in _process_data\n rval = node_func(context=ctxt, **node_args)\n', ' File "/opt/stack/nova/nova/network/manager.py", line 258, in wrapped\n return func(self, context, *args, **kwargs)\n', ' File "/opt/stack/nova/nova/network/manager.py", line 313, in allocate_for_instance\n **kwargs)\n', ' File "/opt/stack/nova/nova/network/manager.py", line 258, in wrapped\n return func(self, context, *args, **kwargs)\n', ' File "/opt/stack/nova/nova/network/manager.py", line 899, in allocate_for_instance\n requested_networks=requested_networks)\n', ' File "/opt/stack/nova/nova/network/manager.py", line 214, in _allocate_fixed_ips\n vpn=vpn, address=address)\n', ' File "/opt/stack/nova/nova/network/manager.py", line 1190, in allocate_fixed_ip\n self._setup_network_on_host(context, network)\n', ' File "/opt/stack/nova/nova/network/manager.py", line 1663, in _setup_network_on_host\n self.l3driver.initialize_gateway(network)\n', ' File "/opt/stack/nova/nova/network/l3.py", line 98, in initialize_gateway\n gateway=(network_ref[\'gateway\'] is not None))\n', ' File "/opt/stack/nova/nova/network/linux_net.py", line 921, in plug\n return _get_interface_driver().plug(network, mac_address, gateway)\n', ' File "/opt/stack/nova/nova/network/linux_net.py", line 969, in plug\n iptables_manager.apply()\n', ' File "/opt/stack/nova/nova/utils.py", line 901, in inner\n retval = f(*args, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/lockfile.py", line 230, in __exit__\n self.release()\n', ' File "/usr/lib/python2.7/dist-packages/lockfile.py", line 273, in release\n raise NotMyLock\n', 'NotMyLock\n']
2012-03-15 14:56:25 DEBUG nova.utils [-] Got semaphore "iptables" for method "apply"... from (pid=4342) inner /opt/stack/nova/nova/utils.py:

Revision history for this message
Patrick Hetu (patrick-hetu) wrote :

After restarting nova-compute, I'm having the same error:

(nova.rpc.amqp): TRACE: Traceback (most recent call last):
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 252, in _process_data
(nova.rpc.amqp): TRACE: rval = node_func(context=ctxt, **node_args)
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
(nova.rpc.amqp): TRACE: return f(*args, **kw)
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 177, in decorated_function
(nova.rpc.amqp): TRACE: sys.exc_info())
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
(nova.rpc.amqp): TRACE: self.gen.next()
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 171, in decorated_function
(nova.rpc.amqp): TRACE: return function(self, context, instance_uuid, *args, **kwargs)
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 648, in run_instance
(nova.rpc.amqp): TRACE: self._run_instance(context, instance_uuid, **kwargs)
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 451, in _run_instance
(nova.rpc.amqp): TRACE: self._set_instance_error_state(context, instance_uuid)
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
(nova.rpc.amqp): TRACE: self.gen.next()
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 424, in _run_instance
(nova.rpc.amqp): TRACE: requested_networks)
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 559, in _allocate_network
(nova.rpc.amqp): TRACE: requested_networks=requested_networks)
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/network/api.py", line 170, in allocate_for_instance
(nova.rpc.amqp): TRACE: 'args': args})
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/rpc/__init__.py", line 68, in call
(nova.rpc.amqp): TRACE: return _get_impl().call(context, topic, msg, timeout)
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 674, in call
(nova.rpc.amqp): TRACE: return rpc_amqp.call(context, topic, msg, timeout, Connection.pool)
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 338, in call
(nova.rpc.amqp): TRACE: rv = list(rv)
(nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 306, in __iter__
(nova.rpc.amqp): TRACE: raise result
(nova.rpc.amqp): TRACE: RemoteError: Remote error: NotMyLock

Changed in nova:
importance: Undecided → High
status: New → Triaged
milestone: none → essex-rc1
Revision history for this message
Travis Rhoden (trhoden) wrote :

I'm running into the same issue. With nova-compute on one node, and nova-network on a different one. Exact same log entries.

Revision history for this message
Chris Behrens (cbehrens) wrote :

Something is removing the lock file early. If the lock file was removed out from under whoever locked it... then NotMyLock is raised.

Interesting.

Changed in nova:
assignee: nobody → Anthony Young (sleepsonthefloor)
Revision history for this message
Vish Ishaya (vishvananda) wrote :

Ok found out why this is raising. Python-lockfile does not support multiple filelocks for the same thread:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=632857

Changed in nova:
assignee: Anthony Young (sleepsonthefloor) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/5462

Changed in nova:
assignee: nobody → Vish Ishaya (vishvananda)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/5462
Committed: http://github.com/openstack/nova/commit/eb42e7fcd7bb67ab951c9bc6c80a78cd23011458
Submitter: Jenkins
Branch: master

commit eb42e7fcd7bb67ab951c9bc6c80a78cd23011458
Author: Vishvananda Ishaya <email address hidden>
Date: Fri Mar 16 13:25:05 2012 -0700

    Workaround issue with greenthreads and lockfiles

     * Adds a GreenLockFile that always works in greenthreads
     * Adds test to verify that regular Lockfile is broken
     * Adds test to verify that GreenLockfile works
     * Adds note about limitation of external locks
     * Adds test showing limitation of nested locks
     * Fixes bug 956313

    Change-Id: I11cd1206611aa4862dadd2fcc077c4c2e0f798f6

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: essex-rc1 → 2012.1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.