OpenStack Compute (nova)

baremetal driver needs a state between "building" and "deploying"

Bug #1184470 reported by aeva black on 2013-05-27

This bug affects 1 person

Affects                    Status         Importance   Assigned to      Milestone
Ironic                     Fix Released   Medium       aeva black       Ironic 2014.1 "icehouse"
OpenStack Compute (nova)   Fix Released   Medium       Sahid Orentino   OpenStack Compute (nova) 2014.1 "icehouse"

Bug Description

It is not possible to tell from the baremetal node status that a deployment has failed because a machine's BIOS hung or was improperly configured. This would be discernible with an additional state change between BUILDING and DEPLOYING.

Details
=====

During a baremetal deployment, the state is tracked in the nova_bm.bm_nodes table. The state is set to BUILDING when the driver's spawn() method in nova/virt/baremetal/driver.py acquires the node and begins preparing the deployment. After the power_driver's activate_node() method is called, the PXE driver goes into a wait loop until the deployment is done. The state is changed to DEPLOYING when baremetal-deploy-helper receives a connection from the deployment ramdisk, and is then set to either DEPLOYDONE or DEPLOYFAIL, depending on the outcome.

There is a middle step which is not currently represented. If the baremetal node powers on but never connects to the deploy-helper, it is impossible to tell from the database whether the deploy environment was not created or whether the machine is dead.

Proposed fix
==========

Add a PREPARED state to baremetal_states.py, and set the node to this state immediately after calling activate_node().
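
To make the proposal concrete, here is a minimal sketch of where the new state would be set and what it would let an operator infer. It is not the nova code: the state string values, the dict-based node record, FakePowerDriver, and explain_stall() are all assumptions made for illustration. Only the state names and activate_node() come from the description above.

```python
# Illustrative stand-ins for the constants in baremetal_states.py;
# the string values here are assumptions, not copied from nova.
BUILDING = "building"
PREPARED = "prepared"  # proposed: node powered on, awaiting the ramdisk
DEPLOYING = "deploying"
DEPLOYDONE = "deploy done"
DEPLOYFAIL = "deploy failed"


class FakePowerDriver:
    """Hypothetical power driver that only records the power-on call."""

    def __init__(self):
        self.powered_on = False

    def activate_node(self):
        self.powered_on = True


def spawn(node, power_driver):
    """Simplified spawn(): acquire the node, power it on, mark it PREPARED."""
    node["task_state"] = BUILDING
    # ... the deploy environment would be prepared here ...
    power_driver.activate_node()
    # Proposed change: record that power-on was requested before waiting.
    node["task_state"] = PREPARED
    return node


def explain_stall(task_state):
    """Map the state a node is stuck in to the most likely failure point."""
    hints = {
        BUILDING: "deploy environment was not created or power-on was never requested",
        PREPARED: "node was powered on but the ramdisk never contacted the deploy helper",
        DEPLOYING: "ramdisk connected; the deployment itself stalled",
    }
    return hints.get(task_state, "not a stalled state")
```

With this in place, a node that stays in PREPARED points at the machine (BIOS hang, bad boot configuration), while a node that stays in BUILDING points at the deploy environment.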

Tags: baremetal

aeva black (tenbrae) on 2013-05-27

Changed in nova:
status:	New → Triaged
importance:	Undecided → Medium
tags:	added: baremetal

aeva black (tenbrae) on 2013-05-30

Changed in nova:
milestone:	none → havana-2

Russell Bryant (russellb) on 2013-07-03

Changed in nova:
milestone:	havana-2 → havana-3

Russell Bryant (russellb) on 2013-08-27

Changed in nova:
milestone:	havana-3 → none

Sahid Orentino (sahid-ferdjaoui) on 2013-09-25

Changed in nova:
assignee:	nobody → sahid (sahid-ferdjaoui)

Sahid Orentino (sahid-ferdjaoui) wrote on 2013-09-26:

The code has changed since this was reported, but I think your proposal is still worth adding.

What do you think about setting this state just after the call here:
https://github.com/openstack/nova/blob/master/nova/virt/baremetal/driver.py#L250

OpenStack Infra (hudson-openstack) wrote on 2013-10-08: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/50348

Changed in nova:
status:	Triaged → In Progress

Sahid Orentino (sahid-ferdjaoui) on 2013-11-12

Changed in nova:
assignee:	sahid (sahid-ferdjaoui) → nobody

Tom Fifield (fifieldt) on 2013-12-06

Changed in nova:
status:	In Progress → Confirmed

Sahid Orentino (sahid-ferdjaoui) on 2013-12-17

Changed in nova:
assignee:	nobody → sahid (sahid-ferdjaoui)

OpenStack Infra (hudson-openstack) on 2013-12-17

Changed in nova:
status:	Confirmed → In Progress

aeva black (tenbrae) wrote on 2013-12-19:

I took a look at the Ironic PXE driver's handling of this sort of situation, and while I think it's OK and not affected by the precise circumstances described in this bug, I think there may be some similar difficulty in determining why a deploy failed part-way through.

I've tagged the bug as also-affecting and will look into it.

Changed in ironic:
assignee:	nobody → Devananda van der Veen (devananda)

OpenStack Infra (hudson-openstack) wrote on 2014-01-02: Fix merged to nova (master)

Reviewed: https://review.openstack.org/50348
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ce2d580106dc04315e71790c98afee062f87351b
Submitter: Jenkins
Branch: master

commit ce2d580106dc04315e71790c98afee062f87351b
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Tue Oct 8 13:25:49 2013 +0000

    Adds a PREPARED state after baremetal node power on.

    During a baremetal deployment there is a middle step which
    is not currently represented. If the baremetal node powers on
    but never connects to the deploy-helper, it is impossible to tell
    from the database whether the deploy environment was not created
    or whether the machine is dead.

    Change-Id: I6be3d45fee28970cbb02945c518be34b2bc74689
    Closes-Bug: #1184470

Changed in nova:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2014-01-06: Related fix merged to ironic (master)

Reviewed: https://review.openstack.org/63037
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=972855e7314c95a07c8483b33138a7a2de8c371c
Submitter: Jenkins
Branch: master

commit 972855e7314c95a07c8483b33138a7a2de8c371c
Author: Devananda van der Veen <email address hidden>
Date: Wed Dec 18 16:57:58 2013 -0800

    Improve error handling in PXE _continue_deploy

    Related to bug 1184470, there was a concern that the PXE driver
    may not be adequately handling errors and informing users when failures
    occur mid-deploy.

    This patch refactors the _continue_deploy() method to handle both errors
    POSTed from the ramdisk and errors that originate within deploy_utils.

    It also fixes an inconsistency in the final provisioning_state:
    ConductorManager.do_node_deploy() will set provisioning_state = ACTIVE,
    however the PXE driver was leaving nodes with state = DEPLOYDONE.

    Change-Id: I29cbff87cbaf85d95687ae094720f8b99f33b65f
    Related-bug: 1184470
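
For readers following along, the two error paths named in the commit message above can be sketched roughly as follows. This is not the Ironic patch: the function signature, the dict-based node, the "error" key in the ramdisk parameters, the last_error field, and the state string values are assumptions for illustration. The sketch also assumes the inconsistency was resolved by ending in ACTIVE, which the commit message implies but does not state outright.

```python
# Illustrative state values; the real constants live in Ironic's states module.
ACTIVE = "active"
DEPLOYFAIL = "deploy failed"


def continue_deploy(node, ramdisk_params, do_deploy):
    """Finish a deploy after the ramdisk calls back, recording any failure.

    do_deploy is a hypothetical callable standing in for the deploy_utils
    step that writes the image to the node.
    """
    # Case 1: the ramdisk itself reported an error in its callback.
    ramdisk_error = ramdisk_params.get("error")
    if ramdisk_error:
        node["provisioning_state"] = DEPLOYFAIL
        node["last_error"] = "ramdisk reported: %s" % ramdisk_error
        return node

    # Case 2: the deploy step on the conductor side raised an exception.
    try:
        do_deploy(ramdisk_params)
    except Exception as exc:
        node["provisioning_state"] = DEPLOYFAIL
        node["last_error"] = "deploy failed: %s" % exc
        return node

    # Success: finish in ACTIVE rather than leaving the node in DEPLOYDONE.
    node["provisioning_state"] = ACTIVE
    node["last_error"] = None
    return node
```

Either failure path leaves both a failed state and a human-readable reason on the node, which is what makes a mid-deploy failure diagnosable afterwards.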

Russell Bryant (russellb) on 2014-01-13

Changed in nova:
milestone:	none → icehouse-2

Thierry Carrez (ttx) on 2014-01-22

Changed in nova:
status:	Fix Committed → Fix Released

aeva black (tenbrae) wrote on 2014-03-21:

Ironic uses a "wait-callback" state as an optional intermediary state if a deploy driver needs to wait for a callback from the node / deploy agent. Closing this bug for Ironic now.

Changed in ironic:
status:	New → Fix Committed
importance:	Undecided → Medium

Thierry Carrez (ttx) on 2014-04-01

Changed in ironic:
milestone:	none → icehouse-rc1
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2014-04-17

Changed in nova:
milestone:	icehouse-2 → 2014.1

Thierry Carrez (ttx) on 2014-04-17

Changed in ironic:
milestone:	icehouse-rc1 → 2014.1
