Lock wait timeout exceeded while updating status for floatingips

Bug #1330955 reported by Ihar Hrachyshka
This bug affects 4 people
Affects    Status         Importance  Assigned to      Milestone
neutron    Fix Released   Medium      Ihar Hrachyshka
Icehouse   Fix Released   Medium      Ihar Hrachyshka

Bug Description

A lock wait timeout occurred while updating the status of a floating IP.

2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp OperationalError: (OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction') 'UPDATE floatingips SET status=%s WHERE floatingips.id = %s' ('ACTIVE', 'a030bb1e-31f0-42d7-84fc-520856f0ee66')

This was probably introduced in Icehouse by: https://review.openstack.org/#/c/66866/

More info at Red Hat bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1109577

tags: added: db
Changed in neutron:
importance: Undecided → Medium
Changed in neutron:
assignee: nobody → Salvatore Orlando (salvatore-orlando)
Salvatore Orlando (salvatore-orlando) wrote :

One interesting thing about this bug is a potential situation of nested locking.

- delete_port will acquire a lock on a port resource
- this routine will call disassociate_floating_ips which will probably acquire a lock on a floating IP resource

If the lock on the floating IP is held by some other thread (e.g. an update FIP status operation), then everything should be fine as soon as that lock is released. However, we need to rule out that something like the following might happen:

- thread A: update floating IP X, update Port Y
- thread B: delete port Y associated with floating IP X

thread A acquires the floating IP X lock
thread B acquires the port Y lock (for the delete)
thread A waits for the Y lock held by thread B, thread B waits for the X lock held by thread A --- deadlock!!!

And since we don't have any deadlock detection or resolution mechanism - hell will ensue.

Bottom line is: let's go on with this pattern of doing resource-level locks, but let's not get carried away by it. Let's keep in mind this is a workaround for an eventlet issue, and not a 'final' solution.
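
The lock-ordering hazard described in this comment can be reproduced outside of neutron. The following is only a minimal sketch in Python 3, not neutron code: the two threading.Lock objects are hypothetical stand-ins for the locks on floating IP X and port Y, run_pair() is an invented helper, and the acquire timeout plays the role of the database lock wait timeout.

```python
import threading


def run_pair(order_a, order_b, timeout=0.2):
    """Run threads A and B, each taking two locks in the given order.

    Returns the names of the threads whose second acquire timed out.
    """
    # If the threads start from different locks, make sure both first
    # locks are held before either thread goes for its second one.
    opposite = order_a[0] is not order_b[0]
    barrier = threading.Barrier(2)
    timed_out = []

    def worker(name, first, second):
        with first:
            if opposite:
                barrier.wait()
            # The timeout stands in for the database lock wait timeout.
            if second.acquire(timeout=timeout):
                second.release()
            else:
                timed_out.append(name)

    threads = [
        threading.Thread(target=worker, args=("A",) + tuple(order_a)),
        threading.Thread(target=worker, args=("B",) + tuple(order_b)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(timed_out)


# Hypothetical stand-ins for the locks on floating IP X and port Y.
fip_x = threading.Lock()
port_y = threading.Lock()

# Opposite order (A: X then Y, B: Y then X): at least one thread times out.
print(run_pair((fip_x, port_y), (port_y, fip_x)))
# Same order in both threads: nobody times out.
print(run_pair((fip_x, port_y), (fip_x, port_y)))
```

When both threads take the locks in the same order, the second lock is only ever requested by whoever already holds the first, so the circular wait cannot form.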

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/100724

Changed in neutron:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Salvatore Orlando (<email address hidden>) on branch: master
Review: https://review.openstack.org/100724
Reason: w/e

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/100934

Changed in neutron:
assignee: Salvatore Orlando (salvatore-orlando) → Ihar Hrachyshka (ihar-hrachyshka)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/101218

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/101219

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: master
Review: https://review.openstack.org/101218
Reason: We can't expect the same behaviour retained long term, so let's avoid the patch to reduce work later in the future when we remove lots of notification related hacks, including the one that returns routers set() from disassociate_floatingips().

Kyle Mestery (mestery)
Changed in neutron:
milestone: none → juno-2
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/100934
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=876c2c25e15806529f4e197cd265bc3b2184aa14
Submitter: Jenkins
Branch: master

commit 876c2c25e15806529f4e197cd265bc3b2184aa14
Author: Ihar Hrachyshka <email address hidden>
Date: Wed Jun 18 16:56:25 2014 +0200

    Avoid notifying while inside transaction opened in delete_port()

    delete_port() calls to disassociate_floatingips() while in transaction.
    The latter method sends RPC notification which may result in eventlet
    yield. If yield switches a thread to another one that tries to access
    the same floating IP object in db as disassociate_floatingips() method
    does, we're locked and get db timeout.

    We should avoid calling to notifier while under transaction.

    To achieve this, I introduce a do_notify argument that controls whether
    notification is done by disassociate_floatingips() itself or delegated
    to caller. Callers that call to disassociate_floatingips() from under
    transactions should handle notifications on their own. For this,
    disassociate_floatingips() returns a set of routers that require
    notification.

    Updated drivers to reflect new behaviour. Added unit test.

    Change-Id: I2411f2aa778ea088be416d062c4816c16f49d2bf
    Closes-Bug: 1330955
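
The pattern described in this commit message can be sketched roughly as follows. This is not the merged neutron code: only the names delete_port, disassociate_floatingips and do_notify come from the commit message, while FakeSession, RecordingNotifier, L3Sketch and the in-memory floating IP table are hypothetical stand-ins used to show that the notification fires only after the outer transaction has closed.

```python
import contextlib


class FakeSession:
    """Hypothetical stand-in for a DB session; tracks transaction depth."""

    def __init__(self):
        self.depth = 0

    @contextlib.contextmanager
    def begin(self):
        self.depth += 1
        try:
            yield
        finally:
            self.depth -= 1


class RecordingNotifier:
    """Hypothetical notifier; records whether it ran inside a transaction."""

    def __init__(self, session):
        self.session = session
        self.calls = []

    def routers_updated(self, router_ids):
        self.calls.append((sorted(router_ids), self.session.depth > 0))


class L3Sketch:
    def __init__(self):
        self.session = FakeSession()
        self.notifier = RecordingNotifier(self.session)
        # Hypothetical in-memory table: floating IP id -> association.
        self.floatingips = {
            "fip-1": {"port_id": "port-1", "router_id": "router-1"},
        }

    def disassociate_floatingips(self, port_id, do_notify=True):
        """Clear associations for port_id; return routers needing notification."""
        router_ids = set()
        with self.session.begin():
            for fip in self.floatingips.values():
                if fip["port_id"] == port_id:
                    router_ids.add(fip["router_id"])
                    fip["port_id"] = None
                    fip["router_id"] = None
        if do_notify and router_ids:
            self.notifier.routers_updated(router_ids)
        return router_ids

    def delete_port(self, port_id):
        with self.session.begin():
            # Inside a transaction: ask the callee not to notify.
            router_ids = self.disassociate_floatingips(port_id, do_notify=False)
            # ... the port row itself would be deleted here ...
        # The transaction is closed; a yield during notification is now harmless.
        if router_ids:
            self.notifier.routers_updated(router_ids)


plugin = L3Sketch()
plugin.delete_port("port-1")
print(plugin.notifier.calls)
```

Each recorded call carries a flag telling whether a transaction was open when the notifier ran; with the caller handling notification after its own transaction, that flag stays False.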

Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/105392

Changed in neutron:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/101219
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=eafebec2d3f6d0bd0ecc6e6d7c4f1ae421a44dfe
Submitter: Jenkins
Branch: master

commit eafebec2d3f6d0bd0ecc6e6d7c4f1ae421a44dfe
Author: Ihar Hrachyshka <email address hidden>
Date: Thu Jun 19 13:58:48 2014 +0200

    VMWare: don't notify on disassociate_floatingips()

    L3 agent notifications don't make sense for NSX VMWare plugin since
    there is no L3 agent in such setup, so disabling them here.

    Updated a unit test to check that notification is indeed not requested.

    Change-Id: I9c7c32d02d466098d22df8f10448361c3d99174c
    Related-Bug: 1330955

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/icehouse)

Related fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/109607

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/icehouse)

Reviewed: https://review.openstack.org/105392
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9c94d960310811b428abd0a48f37cf35a2e96940
Submitter: Jenkins
Branch: stable/icehouse

commit 9c94d960310811b428abd0a48f37cf35a2e96940
Author: Ihar Hrachyshka <email address hidden>
Date: Wed Jun 18 16:56:25 2014 +0200

    Avoid notifying while inside transaction opened in delete_port()

    delete_port() calls to disassociate_floatingips() while in transaction.
    The latter method sends RPC notification which may result in eventlet
    yield. If yield switches a thread to another one that tries to access
    the same floating IP object in db as disassociate_floatingips() method
    does, we're locked and get db timeout.

    We should avoid calling to notifier while under transaction.

    To achieve this, I introduce a do_notify argument that controls whether
    notification is done by disassociate_floatingips() itself or delegated
    to caller. Callers that call to disassociate_floatingips() from under
    transactions should handle notifications on their own. For this,
    disassociate_floatingips() returns a set of routers that require
    notification.

    Updated drivers to reflect new behaviour. Added unit test.

    Conflicts:
     neutron/db/l3_db.py
     neutron/plugins/bigswitch/plugin.py
     neutron/plugins/nuage/plugin.py

    Change-Id: I2411f2aa778ea088be416d062c4816c16f49d2bf
    Closes-Bug: 1330955
    (cherry picked from commit 876c2c25e15806529f4e197cd265bc3b2184aa14)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/icehouse)

Reviewed: https://review.openstack.org/109607
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4a0210ebc87fe716aa2aa441bc48ee2908bb2a2a
Submitter: Jenkins
Branch: stable/icehouse

commit 4a0210ebc87fe716aa2aa441bc48ee2908bb2a2a
Author: Ihar Hrachyshka <email address hidden>
Date: Thu Jun 19 13:58:48 2014 +0200

    VMWare: don't notify on disassociate_floatingips()

    L3 agent notifications don't make sense for NSX VMWare plugin since
    there is no L3 agent in such setup, so disabling them here.

    Updated a unit test to check that notification is indeed not requested.

    Conflicts:
     neutron/tests/unit/vmware/test_nsx_plugin.py

    Change-Id: I9c7c32d02d466098d22df8f10448361c3d99174c
    Related-Bug: 1330955
    (cherry picked from commit eafebec2d3f6d0bd0ecc6e6d7c4f1ae421a44dfe)

tags: added: in-stable-icehouse
Thierry Carrez (ttx)
Changed in neutron:
milestone: juno-2 → 2014.2