Timeout trying to delete overcloud stack in CI

Bug #1534213 reported by Steven Hardy
Affects | Status | Importance | Assigned to | Milestone
tripleo | Expired | Undecided | Unassigned |

Bug Description

Seen here:

http://logs.openstack.org/13/260413/6/check-tripleo/gate-tripleo-ci-f22-ha/fe2b77a/console.html

2016-01-14 14:23:58.898 | #################
2016-01-14 14:23:58.898 | tripleo.sh -- Overcloud delete
2016-01-14 14:23:58.898 | #################
2016-01-14 14:24:02.016 | +--------------------------------------+------------+-----------------+---------------------+--------------+
2016-01-14 14:24:02.016 | | id | stack_name | stack_status | creation_time | updated_time |
2016-01-14 14:24:02.017 | +--------------------------------------+------------+-----------------+---------------------+--------------+
2016-01-14 14:24:02.017 | | 9a8a8a20-da8e-4de7-b9af-71e48e567aa5 | overcloud | CREATE_COMPLETE | 2016-01-14T13:17:29 | None |
2016-01-14 14:24:02.017 | +--------------------------------------+------------+-----------------+---------------------+--------------+
2016-01-14 14:29:19.168 | #################
2016-01-14 14:29:19.168 | tripleo.sh -- Overcloud 9a8a8a20-da8e-4de7-b9af-71e48e567aa5 delete failed or timed out:
2016-01-14 14:29:19.168 | #################
2016-01-14 14:29:19.168 | /tmp/tripleo.sh: line 447: Timing: command not found

The "Timing: command not found" error is odd: the string does not come from tripleo.sh itself, and we can't see where it actually originated (after the delete timeout, tripleo.sh just runs a heat stack-show).

Looking later in the logs we see:

2016-01-14 14:29:36.653 | | stack_status | DELETE_IN_PROGRESS

And the events show:

2016-01-14 14:31:44.207 | | Controller | c54bc713-f12f-41a0-a0dc-00e6ed606c00 | state changed | DELETE_IN_PROGRESS | 2016-01-14T14:28:01 |

That is the last event. It relates to the Controller resource, which I think is an OS::Nova::Server resource. We don't list resources recursively in the CI error path, so it's hard to be sure; we should fix that.

Looking at the heat logs:
2016-01-14 14:24:01.861 16286 INFO heat.engine.stack [-] Stack DELETE IN_PROGRESS (overcloud): Stack DELETE started
...
2016-01-14 14:35:00.891 16286 INFO heat.engine.stack [-] Stack DELETE COMPLETE (overcloud): Stack DELETE completed successfully

It appears the delete did actually succeed, but it took so long that the job timed out first.

11 minutes seems excessive - my local 2 node stacks delete in about 40 seconds, so we may have an issue to diagnose specific to the HA job/stack here.
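As a rough check on the figures above, the elapsed times can be computed directly from the timestamps quoted in this report. The sketch below is illustrative only: the helper name `elapsed_seconds` is hypothetical, and treating the gap between the "Overcloud delete" banner and the failure banner as the CI wait window is an assumption, not the timeout value configured in tripleo.sh.

```python
from datetime import datetime

# Timestamp layout used by both the console log and the heat engine log.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Heat engine log: "Stack DELETE started" to "Stack DELETE completed successfully".
heat_delete = elapsed_seconds("2016-01-14 14:24:01.861", "2016-01-14 14:35:00.891")

# Console log: "Overcloud delete" banner to "delete failed or timed out" banner.
# Assumption: this gap approximates how long the CI script waited.
ci_wait = elapsed_seconds("2016-01-14 14:23:58.898", "2016-01-14 14:29:19.168")

print(f"heat delete took {heat_delete / 60:.1f} min")
print(f"CI waited about {ci_wait / 60:.1f} min before giving up")
```

Under those assumptions, the stack delete ran for roughly twice as long as the CI script waited, which is consistent with the delete finishing only after the job had already reported failure.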

Emilien Macchi (emilienm) wrote :

This bug is > 365 days without activity. We are unsetting assignee and milestone and setting status to Incomplete in order to allow its expiry in 60 days.

If the bug is still valid, then update the bug status.

Changed in tripleo:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for tripleo because there has been no activity for 60 days.]

Changed in tripleo:
status: Incomplete → Expired