Comment 11 for bug 973609

Vish Ishaya (vishvananda) wrote : Re: [Bug 973609] Re: Instances go to error state with RPC timeout

Yes. Should be resolved with the update then. That is good news.
On Apr 5, 2012 6:35 PM, "David Lawson" <email address hidden> wrote:

> I'm assuming you meant /usr/lib/python2.7/dist-packages/nova/utils.py.
> If so, it doesn't have a GreenFileLock class. I assume that means we
> need to update and this will be resolved.
>
> https://bugs.launchpad.net/bugs/973609
>
> Title:
> Instances go to error state with RPC timeout
>
> Status in OpenStack Compute (Nova):
> Incomplete
>
> Bug description:
> We've been seeing a trend of people being unable to start instances.
> Investigation yields the following tracebacks in nova-compute.log
> on the relevant compute nodes:
>
> 2012-04-04 14:21:10 ERROR nova.compute.manager [-] [instance:
> d86678fb-2a29-4b4e-84b4-c1cb2d81a6e2] Instance failed network setup
> (nova.compute.manager): TRACE: Traceback (most recent call last):
> (nova.compute.manager): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 559, in
> _allocate_network
> (nova.compute.manager): TRACE: requested_networks=requested_networks)
> (nova.compute.manager): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/network/api.py", line 170, in
> allocate_for_instance
> (nova.compute.manager): TRACE: 'args': args})
> (nova.compute.manager): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/rpc/__init__.py", line 68, in call
> (nova.compute.manager): TRACE: return _get_impl().call(context,
> topic, msg, timeout)
> (nova.compute.manager): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 674, in call
> (nova.compute.manager): TRACE: return rpc_amqp.call(context, topic,
> msg, timeout, Connection.pool)
> (nova.compute.manager): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 338, in call
> (nova.compute.manager): TRACE: rv = list(rv)
> (nova.compute.manager): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 299, in __iter__
> (nova.compute.manager): TRACE: self._iterator.next()
> (nova.compute.manager): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 572, in
> iterconsume
> (nova.compute.manager): TRACE: yield self.ensure(_error_callback,
> _consume)
> (nova.compute.manager): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 503, in
> ensure
> (nova.compute.manager): TRACE: error_callback(e)
> (nova.compute.manager): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 553, in
> _error_callback
> (nova.compute.manager): TRACE: raise rpc_common.Timeout()
> (nova.compute.manager): TRACE: Timeout: Timeout while waiting on RPC
> response.
> (nova.compute.manager): TRACE:
> 2012-04-04 14:21:10 ERROR nova.rpc.amqp [-] Exception during message
> handling
> (nova.rpc.amqp): TRACE: Traceback (most recent call last):
> (nova.rpc.amqp): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 252, in
> _process_data
> (nova.rpc.amqp): TRACE: rval = node_func(context=ctxt, **node_args)
> (nova.rpc.amqp): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
> (nova.rpc.amqp): TRACE: return f(*args, **kw)
> (nova.rpc.amqp): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 177, in
> decorated_function
> (nova.rpc.amqp): TRACE: sys.exc_info())
> (nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/contextlib.py", line
> 24, in __exit__
> (nova.rpc.amqp): TRACE: self.gen.next()
> (nova.rpc.amqp): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 171, in
> decorated_function
> (nova.rpc.amqp): TRACE: return function(self, context, instance_uuid,
> *args, **kwargs)
> (nova.rpc.amqp): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 648, in
> run_instance
> (nova.rpc.amqp): TRACE: self._run_instance(context, instance_uuid,
> **kwargs)
> (nova.rpc.amqp): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 451, in
> _run_instance
> (nova.rpc.amqp): TRACE: self._set_instance_error_state(context,
> instance_uuid)
> (nova.rpc.amqp): TRACE: File "/usr/lib/python2.7/contextlib.py", line
> 24, in __exit__
> (nova.rpc.amqp): TRACE: self.gen.next()
> (nova.rpc.amqp): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 424, in
> _run_instance
> (nova.rpc.amqp): TRACE: requested_networks)
> (nova.rpc.amqp): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 559, in
> _allocate_network
> (nova.rpc.amqp): TRACE: requested_networks=requested_networks)
> (nova.rpc.amqp): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/network/api.py", line 170, in
> allocate_for_instance
> (nova.rpc.amqp): TRACE: 'args': args})
> (nova.rpc.amqp): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/rpc/__init__.py", line 68, in call
> (nova.rpc.amqp): TRACE: return _get_impl().call(context, topic, msg,
> timeout)
> (nova.rpc.amqp): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 674, in call
> (nova.rpc.amqp): TRACE: return rpc_amqp.call(context, topic, msg,
> timeout, Connection.pool)
> (nova.rpc.amqp): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 338, in call
> (nova.rpc.amqp): TRACE: rv = list(rv)
> (nova.rpc.amqp): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 299, in __iter__
> (nova.rpc.amqp): TRACE: self._iterator.next()
> (nova.rpc.amqp): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 572, in
> iterconsume
> (nova.rpc.amqp): TRACE: yield self.ensure(_error_callback, _consume)
> (nova.rpc.amqp): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 503, in
> ensure
> (nova.rpc.amqp): TRACE: error_callback(e)
> (nova.rpc.amqp): TRACE: File
> "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 553, in
> _error_callback
> (nova.rpc.amqp): TRACE: raise rpc_common.Timeout()
> (nova.rpc.amqp): TRACE: Timeout: Timeout while waiting on RPC response.
> (nova.rpc.amqp): TRACE:
>
> Then the instance is destroyed because it couldn't come up cleanly.
> Restarting nova-network on the network manager, then nova-compute on
> the compute node seems to fix this for a time, but it recurs after a
> few hours. Is there further debugging information we can provide? I
> haven't found log messages that appear related in nova-network or the
> rabbitmq logs.
>
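
For anyone else who wants to check whether their installed nova/utils.py already has the GreenFileLock class discussed above, here is a minimal sketch. The helper names (defines_class, file_defines_class) are hypothetical and not part of nova. It only does a text search for a top-level class statement, so it does not need nova to be importable, and it will miss a class that is defined in some other way.

```python
import io
import re


# Hypothetical helper, for illustration only; not part of nova.
def defines_class(source_text, class_name):
    """Return True if source_text has a top-level `class <class_name>` statement."""
    # ^ with re.MULTILINE anchors at the start of each line, so indented
    # (nested) class statements are deliberately not matched.
    pattern = r"^class\s+" + re.escape(class_name) + r"\b"
    return re.search(pattern, source_text, re.MULTILINE) is not None


# Hypothetical helper, for illustration only; not part of nova.
def file_defines_class(path, class_name):
    """Read the file at `path` and look for a top-level class definition."""
    # io.open behaves the same on Python 2.7 and Python 3; undecodable
    # bytes are replaced instead of raising an error.
    with io.open(path, "r", encoding="utf-8", errors="replace") as handle:
        return defines_class(handle.read(), class_name)
```

On a compute node this could be called as file_defines_class("/usr/lib/python2.7/dist-packages/nova/utils.py", "GreenFileLock"). A False result would match what David describes in the quoted message, i.e. the package predates the update.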