regression in network performance causing large-ops jobs failures

Bug #1492249 reported by Sean Dague
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Confirmed
High
Unassigned

Bug Description

We're getting a pretty increased failure rate with large-ops jobs in Nova, and it looks like it's around message timeouts in tearing down networks. This very distinctly started showing up this week in the race to merge code.

Example failure: http://logs.openstack.org/29/220229/3/check/gate-tempest-dsvm-large-ops/4b9bd8f/logs/screen-n-cpu-1.txt.gz#_2015-09-04_10_47_08_361

2015-09-04 10:47:08.226 ERROR nova.compute.manager [req-bd86a3c8-121b-43c9-ac02-c1d3dcdb2136 tempest-TestLargeOpsScenario-1437588360 tempest-TestLargeOpsScenario-2081713896] [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] Failed to deallocate network for instance.
2015-09-04 10:47:08.329 DEBUG nova.compute.manager [req-08f02702-32ba-44f2-aef3-269a0eff3550 tempest-TestLargeOpsScenario-1437588360 tempest-TestLargeOpsScenario-2081713896] [instance: 2999f5a3-746b-4c7a-9dcd-0f82c031f540] terminating bdm BlockDeviceMapping(boot_index=0,connection_info=None,created_at=2015-09-04T10:42:57Z,delete_on_termination=True,deleted=False,deleted_at=None,destination_type='local',device_name='/dev/sda',device_type='disk',disk_bus=None,guest_format=None,id=281,image_id='4850537c-18cc-438b-b154-1aafb3c36ec8',instance=<?>,instance_uuid=2999f5a3-746b-4c7a-9dcd-0f82c031f540,no_device=False,snapshot_id=None,source_type='image',updated_at=2015-09-04T10:43:54Z,volume_id=None,volume_size=None) _cleanup_volumes /opt/stack/new/nova/nova/compute/manager.py:2299
2015-09-04 10:47:08.361 ERROR nova.compute.manager [req-bd86a3c8-121b-43c9-ac02-c1d3dcdb2136 tempest-TestLargeOpsScenario-1437588360 tempest-TestLargeOpsScenario-2081713896] [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] Setting instance vm_state to ERROR
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] Traceback (most recent call last):
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2396, in do_terminate_instance
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] self._delete_instance(context, instance, bdms, quotas)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/hooks.py", line 149, in inner
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] rv = f(*args, **kwargs)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2375, in _delete_instance
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] quotas.rollback()
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] six.reraise(self.type_, self.value, self.tb)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2338, in _delete_instance
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] self._shutdown_instance(context, instance, bdms)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2265, in _shutdown_instance
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] self._try_deallocate_network(context, instance, requested_networks)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2194, in _try_deallocate_network
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] self._set_instance_obj_error_state(context, instance)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] six.reraise(self.type_, self.value, self.tb)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2189, in _try_deallocate_network
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] self._deallocate_network(context, instance, requested_networks)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 1812, in _deallocate_network
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] context, instance, requested_networks=requested_networks)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/network/api.py", line 49, in wrapped
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] return func(self, context, *args, **kwargs)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/network/api.py", line 297, in deallocate_for_instance
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] requested_networks=requested_networks)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/network/rpcapi.py", line 175, in deallocate_for_instance
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] return cctxt.call(ctxt, 'deallocate_for_instance', **kwargs)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] retry=self.retry)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] timeout=timeout, retry=retry)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 431, in send
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] retry=retry)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 420, in _send
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] result = self._waiter.wait(msg_id, timeout)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 318, in wait
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] message = self.waiters.get(msg_id, timeout=timeout)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 223, in get
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] 'to message ID %s' % msg_id)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] MessagingTimeout: Timed out waiting for a reply to message ID 4ce92e0a720b4f7fb1b26e1af1eff6ae
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]

Tags: gate
Sean Dague (sdague)
Changed in nova:
importance: Undecided → High
status: New → Confirmed
tags: added: gate
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.