heat-tempest-xx already has an action (DELETE) in progress

Bug #1261433 reported by Sahid Orentino
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Heat
Fix Released
High
Jason Dunsmore

Bug Description

Hello,

Jenkins seems to fail sometime in the check 'check-tempest-dsvm-postgres-full' because a race condition:

2013-12-16 14:46:17.813 | Checking logs...
2013-12-16 14:46:20.831 | Log File: h-eng
2013-12-16 14:46:20.833 | 2013-12-16 14:41:19.669 25868 ERROR heat.openstack.common.rpc.common [req-69ba92a4-4129-401e-93bb-170f7d984475 None] Returning exception Stack heat-tempest-1152838063 already has an action (DELETE) in progress. to caller
2013-12-16 14:46:20.835 |
2013-12-16 14:46:20.836 | 2013-12-16 14:41:19.669 25868 ERROR heat.openstack.common.rpc.common [req-69ba92a4-4129-401e-93bb-170f7d984475 None] ['Traceback (most recent call last):\n', ' File "/opt/stack/new/heat/heat/openstack/common/rpc/amqp.py", line 462, in _process_data\n **args)\n', ' File "/opt/stack/new/heat/heat/openstack/common/rpc/dispatcher.py", line 172, in dispatch\n result = getattr(proxyobj, method)(ctxt, **kwargs)\n', ' File "/opt/stack/new/heat/heat/engine/service.py", line 62, in wrapped\n return func(self, ctx, *args, **kwargs)\n', ' File "/opt/stack/new/heat/heat/engine/service.py", line 511, in delete_stack\n self._start_thread_with_lock(cnxt, stack, stack.delete)\n', ' File "/opt/stack/new/heat/heat/engine/service.py", line 125, in _start_thread_with_lock\n lock.acquire()\n', ' File "/opt/stack/new/heat/heat/engine/stack_lock.py", line 61, in acquire\n action=self.stack.action)\n', 'ActionInProgress: Stack heat-tempest-1152838063 already has an action (DELETE) in progress.\n']
2013-12-16 14:46:20.837 |
2013-12-16 14:46:23.295 | Log File: h-api
2013-12-16 14:46:23.297 | 2013-12-16 14:41:19.672 25874 ERROR root [-] Unexpected error occurred serving API: Stack heat-tempest-1152838063 already has an action (DELETE) in progress.
2013-12-16 14:46:23.300 |
2013-12-16 14:46:23.683 | Logs have errors
2013-12-16 14:46:23.712 | FAILED

http://logs.openstack.org/17/61717/4/check/check-tempest-dsvm-postgres-full/3b776c3/

Jeremy Stanley (fungi)
affects: openstack-ci → heat
Changed in heat:
assignee: nobody → Jason Dunsmore (jasondunsmore)
Revision history for this message
Steve Baker (steve-stevebaker) wrote :

Currently any user-triggered exception is logged as an ERROR in amqp.py.

We should be decorating our service methods to wrap these exceptions in an ClientException

Changed in heat:
importance: Undecided → High
status: New → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/62703

Changed in heat:
status: Triaged → In Progress
Changed in heat:
milestone: none → icehouse-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/62703
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=c6bc01e658dd898c1a39f04d032e9c44f258b7e6
Submitter: Jenkins
Branch: master

commit c6bc01e658dd898c1a39f04d032e9c44f258b7e6
Author: Jason Dunsmore <email address hidden>
Date: Mon Dec 16 16:32:01 2013 -0600

    Prevent tempest from failing upon ActionInProgress

    Jenkins occasionally fails in the 'check-tempest-dsvm-postgres-full'
    test. The tempest tests sometimes try to delete the same stack twice
    and an ActionInProgress exception is raised. This results in ERROR
    messages in the log file, which causes the tempest test to fail.

    The fix is to decorate the StackLock.acquire method with
    client_exceptions, which "allows the declaration of expected exceptions
    that the RPC layer should not consider fatal, and not log as if they
    were generated in a real error scenario."

    Closes-Bug: #1261433
    Change-Id: I0692b9d5cf03501ba532b40abf4e567134cda057

Changed in heat:
status: In Progress → Fix Committed
Revision history for this message
Steven Hardy (shardy) wrote :

This appears to still be broken:

https://review.openstack.org/#/c/62436/

http://logs.openstack.org/36/62436/2/gate/gate-tempest-dsvm-postgres-full/c3efd61/logs/screen-h-eng.txt.gz

If we can't fix this quickly (and the migrations), I think we'll have to revert the stack lock stuff, again, as we don't want to be the new neutron of the gate :(

Changed in heat:
status: Fix Committed → Triaged
Revision history for this message
Russell Bryant (russellb) wrote :
Revision history for this message
Russell Bryant (russellb) wrote :
Revision history for this message
Jason Dunsmore (jasondunsmore) wrote :

The API error logging is fixed by: https://review.openstack.org/#/c/62080

Joe Gordon (jogo)
Changed in heat:
status: Triaged → Fix Committed
Thierry Carrez (ttx)
Changed in heat:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in heat:
milestone: icehouse-2 → 2014.1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.