Tempest failures on NodeLocked/Conflict errors need workaround

Bug #1316773 reported by Adam Gandelman
This bug affects 4 people
Affects: Ironic
Status: Fix Released
Importance: High
Assigned to: David Shrewsbury
Milestone:

Bug Description

Tempest seems to be periodically failing in the tearDown of tempest.api.baremetal.test_nodes. ir-api claims the node is locked and operations cannot complete:

2014-05-06 15:31:31.117 8282 ERROR wsme.api [-] Server-side error: "Node ad0119dc-ae45-422c-9573-2bad141e0f99 is locked by host localhost, please retry after the current operation is complete

http://logs.openstack.org/76/88476/8/gate/gate-tempest-dsvm-ironic/11cb20a/logs/

Revision history for this message
Dmitry Tantsur (divius) wrote :

I can confirm this problem (with the exception that the error is now Conflict). This is something that should be fixed by https://review.openstack.org/#/c/94923/ in the long term; for now it may need a workaround either in Tempest or in the client (just retrying should be enough).
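The short-term workaround described here (just retrying) could look roughly like the sketch below on the client side. The `Conflict` class, the attempt count, and the delay are illustrative stand-ins, not actual Tempest or python-ironicclient code:

```python
import time


class Conflict(Exception):
    """Stand-in for the HTTP 409 error returned while a node is locked."""


def call_with_retry(fn, attempts=5, delay=1.0):
    """Retry a client call that fails with Conflict while a node is locked.

    The attempt count and delay are illustrative values; a real workaround
    would tune them to the expected duration of conductor operations.
    """
    for i in range(attempts):
        try:
            return fn()
        except Conflict:
            if i == attempts - 1:
                raise  # exhausted the retry budget; surface the error
            time.sleep(delay)
```

A Tempest tearDown (or the client's delete path) would wrap the failing call, e.g. `call_with_retry(lambda: client.delete_node(uuid))`, instead of failing on the first Conflict.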

summary: - tempest failures on NodeLocked errors
+ Tempest failures on NodeLocked errors needs workaround
Changed in ironic:
status: New → Triaged
importance: Undecided → Medium
tags: added: low-hanging-fruit
summary: - Tempest failures on NodeLocked errors needs workaround
+ Tempest failures on NodeLocked/Conflict errors need workaround
aeva black (tenbrae)
Changed in ironic:
importance: Medium → High
Changed in ironic:
assignee: nobody → David Shrewsbury (dshrews)
Changed in ironic:
assignee: David Shrewsbury (dshrews) → nobody
aeva black (tenbrae)
Changed in ironic:
milestone: none → juno-3
Revision history for this message
aeva black (tenbrae) wrote :

Addressed by the following patch.

commit 428c24767575419d5b472b52f2b622588a3a7253
Author: David Shrewsbury <email address hidden>
Date: Thu Jul 17 09:41:00 2014 -0400

    Implement retry on NodeLocked exceptions

    This adds the capability to retry node locking when a node is already
    locked. Two conductor configuration variables are added to control this
    behavior, which can be totally disabled by setting the value for the
    node_locked_retry_attempts variable to zero.

    A new unit test is added to check that the retry will work after first
    getting a NodeLocked exception, then succeeding getting the lock on the
    next lock attempt.

    Change-Id: I60b099beea1bc3a954a5ab4699e623aaa71ba6c5
    Blueprint: add-nodelocked-retry
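The behavior the commit message describes (retry lock acquisition, controlled by conductor config, disabled when node_locked_retry_attempts is zero) can be sketched as follows. This is a minimal illustration, not the actual conductor code; the real implementation wires the retry through the `retrying` library, as the traceback later in this bug shows:

```python
import time


class NodeLocked(Exception):
    """Raised when another conductor already holds the node's reservation."""


# Illustrative config values mirroring the options the commit describes.
node_locked_retry_attempts = 3   # extra attempts; 0 disables retrying entirely
node_locked_retry_interval = 0   # seconds between attempts (0 here for demo)


def reserve_node(try_reserve):
    """Attempt to reserve a node, retrying on NodeLocked.

    `try_reserve` stands in for the DB-level reservation call
    (objects.Node.reserve in the traceback below).
    """
    attempts = 1 + node_locked_retry_attempts  # initial try plus retries
    for i in range(attempts):
        try:
            return try_reserve()
        except NodeLocked:
            if i == attempts - 1:
                raise  # retries exhausted; the lock is still held
            time.sleep(node_locked_retry_interval)
```

With node_locked_retry_attempts set to 0 the loop runs exactly once, so the first NodeLocked propagates immediately, which matches the "totally disabled" behavior described above.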

Changed in ironic:
assignee: nobody → David Shrewsbury (dshrews)
status: Triaged → Fix Committed
milestone: juno-3 → juno-2
Changed in ironic:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in ironic:
milestone: juno-2 → 2014.2
Revision history for this message
Li Shang (li-shang) wrote :

It seems this issue still occurs on Juno 2014.2; my environment is stable/juno. When I try to delete an Ironic node, ironic-api logs the related error, and even after waiting a long time the node still can't be deleted:

$ ironic node-delete 3645aead-1f74-418e-9e51-1cc6c31d3af4
'unicode' object has no attribute 'get'

2014-11-21 18:39:06.637 DEBUG oslo.messaging._drivers.amqp [-] UNIQUE_ID is e838df558ddc49fd9e21c61b8b5c9f36. from (pid=5382) _add_unique_id /usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqp.py:246
2014-11-21 18:39:08.695 WARNING wsme.api [-] Client-side error: Node 3645aead-1f74-418e-9e51-1cc6c31d3af4 is locked by host localhost, please retry after the current operation is completed.
Traceback (most recent call last):

  File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/server.py", line 139, in inner
    return func(*args, **kwargs)

  File "/opt/stack/ironic/ironic/conductor/manager.py", line 1053, in destroy_node
    with task_manager.acquire(context, node_id) as task:

  File "/opt/stack/ironic/ironic/conductor/task_manager.py", line 132, in acquire
    driver_name=driver_name)

  File "/opt/stack/ironic/ironic/conductor/task_manager.py", line 192, in __init__
    self.release_resources()

  File "/usr/local/lib/python2.7/dist-packages/oslo/utils/excutils.py", line 82, in __exit__
    six.reraise(self.type_, self.value, self.tb)

  File "/opt/stack/ironic/ironic/conductor/task_manager.py", line 184, in __init__
    reserve_node()

  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 68, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)

  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 229, in call
    raise attempt.get()

  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 261, in get
    six.reraise(self.value[0], self.value[1], self.value[2])

  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 217, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

  File "/opt/stack/ironic/ironic/conductor/task_manager.py", line 180, in reserve_node
    self.node = objects.Node.reserve(context, CONF.host, node_id)

  File "/opt/stack/ironic/ironic/objects/base.py", line 109, in wrapper
    result = fn(cls, context, *args, **kwargs)

  File "/opt/stack/ironic/ironic/objects/node.py", line 165, in reserve
    db_node = cls.dbapi.reserve_node(tag, node_id)

  File "/opt/stack/ironic/ironic/db/sqlalchemy/api.py", line 229, in reserve_node
    host=node['reservation'])

NodeLocked: Node 3645aead-1f74-418e-9e51-1cc6c31d3af4 is locked by host localhost, please retry after the current operation is completed.
