quantum-server hung up its listening port

Bug #1189385 reported by Robert Collins
This bug affects 2 people
Affects: neutron
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

After running for a week or so under load, quantum-server hung up its listening socket, but it still had 173 other sockets open. This naturally caused everything to grind to a halt.
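
For anyone trying to catch this symptom from the outside (process still alive, but nothing accepting on the API port), a minimal probe might look like the sketch below. The helper name `is_listening` is hypothetical, and the host and port are assumptions: 9696 is the usual default API port for quantum/neutron, so adjust it for your deployment.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection raises OSError (including timeouts and
        # "connection refused") when nothing accepts on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed values: local server on the default API port.
    print(is_listening("127.0.0.1", 9696))
```

This only shows whether the port accepts connections; it says nothing about why the listening socket went away.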

Tags: neutron-core
Robert Collins (lifeless) wrote :

This repeated itself, correlated with a request to delete 30 or so VMs [but no evidence of causation at this stage].

Mark McClain (markmcclain) wrote :

Any errors in the log file that might indicate why the listening thread has died?

Changed in quantum:
status: New → Incomplete
tags: added: quantum-core
Jack McCann (jack-mccann) wrote :

We've seen this a few times but have not gotten to the root cause. When we've seen this, it appears to be related to an AMQPConnectionException (trace below).

2013-04-26 18:32:00 ERROR [quantum.api.v2.resource] create failed
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/quantum/api/v2/resource.py", line 82, in resource
    result = method(request=request, **args)
  File "/usr/lib/python2.7/dist-packages/quantum/api/v2/base.py", line 370, in create
    return notify({self._resource: self._view(obj)})
  File "/usr/lib/python2.7/dist-packages/quantum/api/v2/base.py", line 350, in notify
    notifier_method)
  File "/usr/lib/python2.7/dist-packages/quantum/api/v2/base.py", line 234, in _send_dhcp_notification
    self._dhcp_agent_notifier.notify(context, data, methodname)
  File "/usr/lib/python2.7/dist-packages/quantum/api/rpc/agentnotifiers/dhcp_rpc_agent_api.py", line 131, in notify
    self._notification(context, methodname, data, network_id)
  File "/usr/lib/python2.7/dist-packages/quantum/api/rpc/agentnotifiers/dhcp_rpc_agent_api.py", line 82, in _notification
    topic='%s.%s' % (topic, host))
  File "/usr/lib/python2.7/dist-packages/quantum/openstack/common/rpc/proxy.py", line 113, in cast
    rpc.cast(context, self._get_topic(topic), msg)
  File "/usr/lib/python2.7/dist-packages/quantum/openstack/common/rpc/__init__.py", line 158, in cast
    return _get_impl().cast(CONF, context, topic, msg)
  File "/usr/lib/python2.7/dist-packages/quantum/openstack/common/rpc/impl_kombu.py", line 805, in cast
    rpc_amqp.get_connection_pool(conf, Connection))
  File "/usr/lib/python2.7/dist-packages/quantum/openstack/common/rpc/amqp.py", line 625, in cast
    conn.topic_send(topic, rpc_common.serialize_msg(msg))
  File "/usr/lib/python2.7/dist-packages/quantum/openstack/common/rpc/amqp.py", line 152, in __exit__
    self._done()
  File "/usr/lib/python2.7/dist-packages/quantum/openstack/common/rpc/amqp.py", line 141, in _done
    self.connection.reset()
  File "/usr/lib/python2.7/dist-packages/quantum/openstack/common/rpc/impl_kombu.py", line 596, in reset
    self.channel = self.connection.channel()
  File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 159, in channel
    chan = self.transport.create_channel(self.connection)
  File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 235, in create_channel
    return connection.channel()
  File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 144, in channel
    return Channel(self, channel_id)
  File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 183, in __init__
    super(Channel, self).__init__(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py", line 82, in __init__
    self._x_open()
  File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py", line 471, in _x_open
    (20, 11), # Channel.open_ok
  File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/abstract_channel.py", line 95, in wait
    self.channel_id, allowed_methods)
  File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/connection.py", line 231, in _wait_method
    self.wait()
  File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/abstract_channel.py", line 97, in wait
    return self.dispatch_method(method_...


Robert Collins (lifeless) wrote :

Haven't seen that COMMAND_INVALID thing in the tripleo grizzly POC.
Here are the ERROR logs we've seen: http://paste.ubuntu.com/5756528/

Changed in quantum:
status: Incomplete → New
tags: added: neutron-core
removed: quantum-core
Changed in neutron:
assignee: nobody → Mark McClain (markmcclain)
Robert Collins (lifeless) wrote :

We're dropping this from tripleo as we haven't seen it in a while; I believe we've given all the info asked for - and if it happens again we'll certainly update the bug.

no longer affects: tripleo
Mark McClain (markmcclain) wrote :

Setting to incomplete unless there are easy steps to replicate this problem.

Changed in neutron:
status: New → Incomplete
Changed in neutron:
assignee: Mark McClain (markmcclain) → nobody
Eugene Nikanorov (enikanorov) wrote :

I've seen this behavior when disk space is low. It seems that the neutron server gets stuck on its AMQP connection when this happens. HTTP connections to neutron are still possible, but neutron doesn't respond until space is freed and AMQP resumes functioning.

Eugene Nikanorov (enikanorov) wrote :

Also, it's not specific to neutron: agents and some other services seem to get stuck as well when disk space is low (<1 GB).
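
The low-disk condition can be checked with a few lines of Python, as a rough sketch. The helper name `has_low_disk_space` is hypothetical, the 1 GB threshold comes from the figure quoted above, and the path is an assumption: the report does not say which filesystem matters, so point it at whichever one your AMQP broker or services write to.

```python
import shutil

ONE_GB = 1024 ** 3  # threshold taken from the "<1 GB" observation above


def has_low_disk_space(path="/", threshold_bytes=ONE_GB):
    """Return True if free space on the filesystem holding `path`
    is below `threshold_bytes`."""
    free = shutil.disk_usage(path).free
    return free < threshold_bytes


if __name__ == "__main__":
    # Assumed path: the root filesystem.
    if has_low_disk_space("/"):
        print("Free disk space is below 1 GB")
```

A check like this only flags the condition that was observed to coincide with the hang; it does not confirm that low disk space is the cause.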

Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired