Bug #1330955 “Lock wait timeout exceeded while updating status f...” : Bugs : neutron

Eugene Nikanorov (enikanorov) on 2014-06-17

tags:	added: db
Changed in neutron:
importance:	Undecided → Medium

Salvatore Orlando (salvatore-orlando) on 2014-06-17

Changed in neutron:
assignee:	nobody → Salvatore Orlando (salvatore-orlando)

Salvatore Orlando (salvatore-orlando) wrote on 2014-06-17:

#1

One interesting thing about this bug is a potential situation of nested locking.

- delete_port will acquire a lock on a port resource
- this routine will call disassociate_floating_ips which will probably acquire a lock on a floating IP resource

If the lock on the floating IP is held by some other thread (e.g. an update fip status operation), then everything should be fine as soon as that lock is released. However, we need to rule out that something like the following might happen:

- thread A: update floating IP X, update Port Y
- thread B: delete port Y associated with floating IP X

thread A acquires floating IP X lock
thread B acquires delete port Y lock
thread A waits for the Y lock held by thread B, thread B waits for the X lock held by thread A --- deadlock!!!

And since we don't have any deadlock detection or resolution mechanism - hell will ensue.

Bottom line is: let's go on with this pattern of doing resource-level locks, but let's not get carried away by it. Let's keep in mind this is a workaround for an eventlet issue, and not a 'final' solution.
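
The interleaving described in this comment can be reproduced with plain Python locks. This is only a sketch under stated assumptions: `fip_x` and `port_y` are hypothetical stand-ins for the floating IP and port locks, the barriers exist purely to force the problematic ordering, and the acquire timeout is there so the demonstration terminates instead of hanging. It is not neutron code.

```python
import threading


def run_scenario(timeout=0.2):
    """Force the A/B interleaving and report whether each thread got its second lock."""
    fip_x = threading.Lock()   # hypothetical stand-in for the floating IP X lock
    port_y = threading.Lock()  # hypothetical stand-in for the port Y lock
    both_hold_first = threading.Barrier(2)    # both threads hold their first lock
    both_tried_second = threading.Barrier(2)  # both threads have tried the second lock
    results = {}

    def worker(name, first, second):
        with first:
            # Wait until the other thread also holds its first lock.
            both_hold_first.wait()
            # Without a timeout this call would block forever (the deadlock).
            got = second.acquire(timeout=timeout)
            results[name] = got
            if got:
                second.release()
            # Keep holding the first lock until the other thread has given up too.
            both_tried_second.wait()

    thread_a = threading.Thread(target=worker, args=("A", fip_x, port_y))
    thread_b = threading.Thread(target=worker, args=("B", port_y, fip_x))
    thread_a.start()
    thread_b.start()
    thread_a.join()
    thread_b.join()
    return results


if __name__ == "__main__":
    print(run_scenario())
```

Neither thread obtains its second lock, which is the circular wait described above. Acquiring the two locks in the same fixed order in every code path is the usual way to avoid this kind of cycle.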

OpenStack Infra (hudson-openstack) wrote on 2014-06-18: Fix proposed to neutron (master)

#2

Fix proposed to branch: master
Review: https://review.openstack.org/100724

Changed in neutron:
status:	New → In Progress

OpenStack Infra (hudson-openstack) wrote on 2014-06-18: Change abandoned on neutron (master)

#3

Change abandoned by Salvatore Orlando (<email address hidden>) on branch: master
Review: https://review.openstack.org/100724
Reason: w/e

OpenStack Infra (hudson-openstack) wrote on 2014-06-18: Fix proposed to neutron (master)

#4

Fix proposed to branch: master
Review: https://review.openstack.org/100934

Changed in neutron:
assignee:	Salvatore Orlando (salvatore-orlando) → Ihar Hrachyshka (ihar-hrachyshka)

OpenStack Infra (hudson-openstack) wrote on 2014-06-19: Related fix proposed to neutron (master)

#5

Related fix proposed to branch: master
Review: https://review.openstack.org/101218

OpenStack Infra (hudson-openstack) wrote on 2014-06-19: Related fix proposed to neutron (master)

#6

Related fix proposed to branch: master
Review: https://review.openstack.org/101219

OpenStack Infra (hudson-openstack) wrote on 2014-06-23: Change abandoned on neutron (master)

#7

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: master
Review: https://review.openstack.org/101218
Reason: We can't expect the same behaviour retained long term, so let's avoid the patch to reduce work later in the future when we remove lots of notification related hacks, including the one that returns routers set() from disassociate_floatingips().

Kyle Mestery (mestery) on 2014-06-30

Changed in neutron:
milestone:	none → juno-2

OpenStack Infra (hudson-openstack) wrote on 2014-07-08: Fix merged to neutron (master)

#8

Reviewed: https://review.openstack.org/100934
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=876c2c25e15806529f4e197cd265bc3b2184aa14
Submitter: Jenkins
Branch: master

commit 876c2c25e15806529f4e197cd265bc3b2184aa14
Author: Ihar Hrachyshka <email address hidden>
Date: Wed Jun 18 16:56:25 2014 +0200

Avoid notifying while inside transaction opened in delete_port()

    delete_port() calls to disassociate_floatingips() while in transaction.
    The latter method sends RPC notification which may result in eventlet
    yield. If yield switches a thread to another one that tries to access
    the same floating IP object in db as disassociate_floatingips() method
    does, we're locked and get db timeout.

    We should avoid calling to notifier while under transaction.

    To achieve this, I introduce a do_notify argument that controls whether
    notification is done by disassociate_floatingips() itself or delegated
    to caller. Callers that call to disassociate_floatingips() from under
    transactions should handle notifications on their own. For this,
    disassociate_floatingips() returns a set of routers that require
    notification.

    Updated drivers to reflect new behaviour. Added unit test.

    Change-Id: I2411f2aa778ea088be416d062c4816c16f49d2bf
    Closes-Bug: 1330955
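
The pattern described in this commit message can be sketched as follows. This is a minimal illustration, not the actual neutron implementation: `FakePlugin`, `FakeNotifier`, `transaction()`, `_notify()` and `routers_updated()` are hypothetical names invented for the example. Only `delete_port()`, `disassociate_floatingips()`, the `do_notify` argument and the returned set of routers come from the commit message.

```python
from contextlib import contextmanager


class FakeNotifier:
    """Hypothetical notifier that records what it was asked to send."""

    def __init__(self):
        self.sent = []

    def routers_updated(self, router_ids):
        self.sent.append(sorted(router_ids))


class FakePlugin:
    """Hypothetical plugin; floating IPs are dicts with 'port_id' and 'router_id'."""

    def __init__(self, floating_ips, notifier):
        self.floating_ips = floating_ips
        self.notifier = notifier
        self.in_transaction = False
        self.notified_in_transaction = False

    @contextmanager
    def transaction(self):
        # Stand-in for a DB transaction; it only tracks whether we are inside one.
        self.in_transaction = True
        try:
            yield
        finally:
            self.in_transaction = False

    def _notify(self, router_ids):
        if self.in_transaction:
            # This is the situation the fix is meant to avoid.
            self.notified_in_transaction = True
        if router_ids:
            self.notifier.routers_updated(router_ids)

    def disassociate_floatingips(self, port_id, do_notify=True):
        router_ids = set()
        for fip in self.floating_ips:
            if fip["port_id"] == port_id:
                router_ids.add(fip["router_id"])
                fip["port_id"] = None
                fip["router_id"] = None
        if do_notify:
            self._notify(router_ids)
        # Callers that pass do_notify=False notify these routers themselves.
        return router_ids

    def delete_port(self, port_id):
        with self.transaction():
            router_ids = self.disassociate_floatingips(port_id, do_notify=False)
        # The notification is sent only after the transaction has ended.
        self._notify(router_ids)
```

With this shape, a caller that is already inside a transaction passes `do_notify=False`, collects the returned set, and sends the notification once the transaction has closed, so a yield during the RPC call cannot happen while the rows are still locked.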

Changed in neutron:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2014-07-08: Fix proposed to neutron (stable/icehouse)

#9

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/105392

Russell Bryant (russellb) on 2014-07-23

Changed in neutron:
status:	Fix Committed → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2014-07-25: Related fix merged to neutron (master)

#10

Reviewed: https://review.openstack.org/101219
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=eafebec2d3f6d0bd0ecc6e6d7c4f1ae421a44dfe
Submitter: Jenkins
Branch: master

commit eafebec2d3f6d0bd0ecc6e6d7c4f1ae421a44dfe
Author: Ihar Hrachyshka <email address hidden>
Date: Thu Jun 19 13:58:48 2014 +0200

VMWare: don't notify on disassociate_floatingips()

    L3 agent notifications don't make sense for NSX VMWare plugin since
    there is no L3 agent in such setup, so disabling them here.

    Updated a unit test to check that notification is indeed not requested.

    Change-Id: I9c7c32d02d466098d22df8f10448361c3d99174c
    Related-Bug: 1330955

OpenStack Infra (hudson-openstack) wrote on 2014-07-25: Related fix proposed to neutron (stable/icehouse)

#11

Related fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/109607

OpenStack Infra (hudson-openstack) wrote on 2014-08-05: Fix merged to neutron (stable/icehouse)

#12

Reviewed: https://review.openstack.org/105392
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9c94d960310811b428abd0a48f37cf35a2e96940
Submitter: Jenkins
Branch: stable/icehouse

commit 9c94d960310811b428abd0a48f37cf35a2e96940
Author: Ihar Hrachyshka <email address hidden>
Date: Wed Jun 18 16:56:25 2014 +0200

Avoid notifying while inside transaction opened in delete_port()

    delete_port() calls to disassociate_floatingips() while in transaction.
    The latter method sends RPC notification which may result in eventlet
    yield. If yield switches a thread to another one that tries to access
    the same floating IP object in db as disassociate_floatingips() method
    does, we're locked and get db timeout.

    We should avoid calling to notifier while under transaction.

    To achieve this, I introduce a do_notify argument that controls whether
    notification is done by disassociate_floatingips() itself or delegated
    to caller. Callers that call to disassociate_floatingips() from under
    transactions should handle notifications on their own. For this,
    disassociate_floatingips() returns a set of routers that require
    notification.

    Updated drivers to reflect new behaviour. Added unit test.

    Conflicts:
     neutron/db/l3_db.py
     neutron/plugins/bigswitch/plugin.py
     neutron/plugins/nuage/plugin.py

    Change-Id: I2411f2aa778ea088be416d062c4816c16f49d2bf
    Closes-Bug: 1330955
    (cherry picked from commit 876c2c25e15806529f4e197cd265bc3b2184aa14)

OpenStack Infra (hudson-openstack) wrote on 2014-08-12: Related fix merged to neutron (stable/icehouse)

#13

Reviewed: https://review.openstack.org/109607
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4a0210ebc87fe716aa2aa441bc48ee2908bb2a2a
Submitter: Jenkins
Branch: stable/icehouse

commit 4a0210ebc87fe716aa2aa441bc48ee2908bb2a2a
Author: Ihar Hrachyshka <email address hidden>
Date: Thu Jun 19 13:58:48 2014 +0200

VMWare: don't notify on disassociate_floatingips()

    L3 agent notifications don't make sense for NSX VMWare plugin since
    there is no L3 agent in such setup, so disabling them here.

    Updated a unit test to check that notification is indeed not requested.

    Conflicts:
     neutron/tests/unit/vmware/test_nsx_plugin.py

    Change-Id: I9c7c32d02d466098d22df8f10448361c3d99174c
    Related-Bug: 1330955
    (cherry picked from commit eafebec2d3f6d0bd0ecc6e6d7c4f1ae421a44dfe)

tags:	added: in-stable-icehouse

Thierry Carrez (ttx) on 2014-10-16

Changed in neutron:
milestone:	juno-2 → 2014.2

Affects		Status	Importance	Assigned to	Milestone
	neutron	Fix Released	Medium	Ihar Hrachyshka	neutron 2014.2 "juno"
	Icehouse	Fix Released	Medium	Ihar Hrachyshka	neutron 2014.1.3

Lock wait timeout exceeded while updating status for floatingips
