cluster action dependency list can't be updated correctly

Bug #1495449 reported by Yanyan Hu
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
senlin
Fix Released
Critical
Yanyan Hu

Bug Description

In cases where cluster has large desired_capacity, some node_create/delete actions could have finished before other node_create/delete actions are created. In this case, action_del_dependency and action_add_dependency method could be invoked in different DB sessions at the same time to update the cluster action's depends_on list and thus cause desynchrony. This issue prevents cluster action's depends_on list being cleaned correctly after all node_create/delete actions are done.

The following is an example log of this kind of situation. We can see at 05:24:06.831, the first node was created successfully before the third node's creation was started. Although the node_create action a7228666-b759-4fc9-a83b-5e3756b3589b has been removed from cluster action's depends_on list at 05:24:06.831, it still appeared in the depends_on list when the third node_create action was created at 05:24:07.038.

2015-09-14 05:24:06.345 17767 INFO senlin.engine.event [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] cluster_create_6d699ed0[5b8ba486-ae44-451a-bf7a-078bc06dbaf4] CLUSTER_CREATE - START: None
2015-09-14 05:24:06.482 17767 INFO senlin.engine.event [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] node-6d699ed0-001[None] create - None: None
2015-09-14 05:24:06.509 17767 ERROR senlin.db.sqlalchemy.api [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] add new depenedency to action 5b8ba486-ae44-451a-bf7a-078bc06dbaf4
2015-09-14 05:24:06.510 17767 ERROR senlin.db.sqlalchemy.api [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] before add: []
2015-09-14 05:24:06.510 17767 ERROR senlin.db.sqlalchemy.api [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] after add: ['dab8404e-b369-4fac-a75f-9f91506a12fc']
2015-09-14 05:24:06.545 17767 ERROR senlin.engine.event [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] node_create_16a8ace5[dab8404e-b369-4fac-a75f-9f91506a12fc] NODE_CREATE - READY: None
2015-09-14 05:24:06.546 17767 DEBUG oslo_messaging._drivers.amqpdriver [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] MSG_ID is e85d39d14abd478c8dc99b840446ce68 _send /home/stack/huyanyan/python/virtualenv/SENLINENV/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:392
2015-09-14 05:24:06.578 17767 INFO senlin.engine.event [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] node_create_16a8ace5[dab8404e-b369-4fac-a75f-9f91506a12fc] NODE_CREATE - START: None
2015-09-14 05:24:06.677 17767 INFO senlin.engine.event [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] node-6d699ed0-001[None] create - None: None
2015-09-14 05:24:06.682 17767 INFO senlin.engine.event [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] node-6d699ed0-001[16a8ace5-0663-47ba-863e-5cea6f6b502c] create - None: None
2015-09-14 05:24:06.715 17767 ERROR senlin.db.sqlalchemy.api [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] add new depenedency to action 5b8ba486-ae44-451a-bf7a-078bc06dbaf4
2015-09-14 05:24:06.715 17767 ERROR senlin.db.sqlalchemy.api [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] before add: ['dab8404e-b369-4fac-a75f-9f91506a12fc']
2015-09-14 05:24:06.716 17767 ERROR senlin.db.sqlalchemy.api [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] after add: ['dab8404e-b369-4fac-a75f-9f91506a12fc', 'a7228666-b759-4fc9-a83b-5e3756b3589b']
2015-09-14 05:24:06.737 17767 INFO senlin.profiles.os.nova.server [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] Creating server: {'name': u'node-6d699ed0-001-FBspDyMS', 'imageRef': '70a599e0-31e7-49b7-b260-868f441e862b', 'key_name': 'oskey', 'flavorRef': '1', 'timeout': 120, 'auto_disk_config': True, 'networks': [{'uuid': 'd32019d3-bc6e-4319-9c1d-6722fc136a22', 'fixed-ip': None, 'port': None}], 'metadata': {'cluster': u'6d699ed0-6e91-4de6-910c-42f3943fd13b'}}
2015-09-14 05:24:06.796 17767 ERROR senlin.engine.event [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] node_create_bf7c5c1c[a7228666-b759-4fc9-a83b-5e3756b3589b] NODE_CREATE - READY: None
2015-09-14 05:24:06.796 17767 DEBUG oslo_messaging._drivers.amqpdriver [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] MSG_ID is 0331243aaa80402f9c5bbaca6f2a358d _send /home/stack/huyanyan/python/virtualenv/SENLINENV/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:392
2015-09-14 05:24:06.799 17767 INFO senlin.engine.event [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] node-6d699ed0-001[16a8ace5-0663-47ba-863e-5cea6f6b502c] update - None: None
2015-09-14 05:24:06.831 17767 ERROR senlin.db.sqlalchemy.api [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] Remove depenedency from action 5b8ba486-ae44-451a-bf7a-078bc06dbaf4
2015-09-14 05:24:06.832 17767 ERROR senlin.db.sqlalchemy.api [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] before remove: [u'dab8404e-b369-4fac-a75f-9f91506a12fc', u'a7228666-b759-4fc9-a83b-5e3756b3589b']
2015-09-14 05:24:06.832 17767 ERROR senlin.db.sqlalchemy.api [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] after remove: [u'dab8404e-b369-4fac-a75f-9f91506a12fc']
2015-09-14 05:24:06.832 17767 ERROR senlin.db.sqlalchemy.api [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] [u'a7228666-b759-4fc9-a83b-5e3756b3589b']
2015-09-14 05:24:06.890 17767 INFO senlin.engine.event [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] node_create_16a8ace5[dab8404e-b369-4fac-a75f-9f91506a12fc] NODE_CREATE - SUCCEEDED: Node created successfully
2015-09-14 05:24:06.907 17767 INFO senlin.engine.event [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] node_create_bf7c5c1c[a7228666-b759-4fc9-a83b-5e3756b3589b] NODE_CREATE - START: None
2015-09-14 05:24:07.005 17767 INFO senlin.engine.event [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] node-6d699ed0-001[None] create - None: None
2015-09-14 05:24:07.010 17767 INFO senlin.engine.event [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] node-6d699ed0-001[bf7c5c1c-39aa-44e5-9934-44f15570fe03] create - None: None
2015-09-14 05:24:07.038 17767 ERROR senlin.db.sqlalchemy.api [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] add new depenedency to action 5b8ba486-ae44-451a-bf7a-078bc06dbaf4
2015-09-14 05:24:07.038 17767 ERROR senlin.db.sqlalchemy.api [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] before add: ['dab8404e-b369-4fac-a75f-9f91506a12fc', 'a7228666-b759-4fc9-a83b-5e3756b3589b']
2015-09-14 05:24:07.039 17767 ERROR senlin.db.sqlalchemy.api [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] after add: ['dab8404e-b369-4fac-a75f-9f91506a12fc', 'c2530564-2cba-4b4b-94f1-9271cc39a206', 'a7228666-b759-4fc9-a83b-5e3756b3589b']
2015-09-14 05:24:07.051 17767 INFO senlin.profiles.os.nova.server [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] Creating server: {'name': u'node-6d699ed0-001-KP6UWWxC', 'imageRef': '70a599e0-31e7-49b7-b260-868f441e862b', 'key_name': 'oskey', 'flavorRef': '1', 'timeout': 120, 'auto_disk_config': True, 'networks': [{'uuid': 'd32019d3-bc6e-4319-9c1d-6722fc136a22', 'fixed-ip': None, 'port': None}], 'metadata': {'cluster': u'6d699ed0-6e91-4de6-910c-42f3943fd13b'}}
2015-09-14 05:24:07.113 17767 ERROR senlin.engine.event [req-16a2a93f-5558-49e1-bf56-a189444454bc admin] node_create_ec52baeb[c2530564-2cba-4b4b-94f1-9271cc39a206] NODE_CREATE - READY: None

Yanyan Hu (yanyanhu)
description: updated
description: updated
Revision history for this message
Yanyan Hu (yanyanhu) wrote :
Changed in senlin:
assignee: nobody → Yanyan Hu (yanyanhu)
Changed in senlin:
status: New → In Progress
Yanyan Hu (yanyanhu)
description: updated
description: updated
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to senlin (master)

Reviewed: https://review.openstack.org/223033
Committed: https://git.openstack.org/cgit/stackforge/senlin/commit/?id=232825ea94017085a00b666dd4e77bf6774bbd30
Submitter: Jenkins
Branch: master

commit 232825ea94017085a00b666dd4e77bf6774bbd30
Author: yanyanhu <email address hidden>
Date: Mon Sep 14 05:37:10 2015 -0400

    Fix a bug about add/del action dependency

    This patch uses session.refresh to get the lastest action record
    before each time updating action's dependency list to avoid
    possible desynchrony between different DB sessions. It also
    replaces the 'context' input parameter of _action_dependency_add
    method by 'query' parameter to unify the implementation of
    _action_dependency_del and _action_dependency_add methods.

    Change-Id: I126c72c2b28fe5ab18bd652d8106eb4ae070ff0e
    Closes-Bug: #1495449

Changed in senlin:
status: In Progress → Fix Committed
Revision history for this message
Yanyan Hu (yanyanhu) wrote :

Reopen it since we meet it again.

Changed in senlin:
status: Fix Committed → Confirmed
status: Confirmed → New
Changed in senlin:
status: New → In Progress
Qiming Teng (tengqim)
Changed in senlin:
importance: Undecided → Critical
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/258308
Committed: https://git.openstack.org/cgit/openstack/senlin/commit/?id=8d7c4902c8bd24c74890b509732440e143492a21
Submitter: Jenkins
Branch: master

commit 8d7c4902c8bd24c74890b509732440e143492a21
Author: tengqm <email address hidden>
Date: Wed Dec 16 02:55:48 2015 -0500

    Fix action dependency problem

    This is an attempt to fix action dependency update problem.

    Closes-Bug: #1495449
    Change-Id: I690b9c55394dc693c205c8d6914bfbf22bdd6fc7

Changed in senlin:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/259315
Committed: https://git.openstack.org/cgit/openstack/senlin/commit/?id=bd5091b0d75f2c21d185ff5add0ebcfd7fc1544b
Submitter: Jenkins
Branch: master

commit bd5091b0d75f2c21d185ff5add0ebcfd7fc1544b
Author: tengqm <email address hidden>
Date: Fri Dec 18 02:51:04 2015 -0500

    Add 'dependency' table

    Add a new db table for tracking action dependencies. Previously, we were
    using two list fields (note that a list is ordered) for this purpose. It
    has caused quite some problems:
     - Binding List datatype to sqlalchemy.Text is causing DB confusion
       about the mutable Dict data type.
     - A List column cannot be checked and/or set in a single SQL statement.
     - SqlAlchemy ORM layer has very limited support to transactions and
       even that has been compromised by the oslo.db package.

    Change-Id: I4b97fcc9fcf7684adbfd48a2f908a0f864691310
    Partial-Bug: #1495449

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to senlin (master)

Fix proposed to branch: master
Review: https://review.openstack.org/259686

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to senlin (master)

Reviewed: https://review.openstack.org/259686
Committed: https://git.openstack.org/cgit/openstack/senlin/commit/?id=be826cace408087a9a98f69e36e97263ba6f0f05
Submitter: Jenkins
Branch: master

commit be826cace408087a9a98f69e36e97263ba6f0f05
Author: tengqm <email address hidden>
Date: Sat Dec 19 02:12:40 2015 -0500

    Rework action dependency

    This patch reworks the action dependency handling.

    Closes-Bug: #1495449
    Change-Id: Ie302853aeddfb3fc80010746f2f9633e7bf9e086

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/senlin 1.0.0.0b2

This issue was fixed in the openstack/senlin 1.0.0.0b2 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on senlin (master)

Change abandoned by Yanyan Hu (<email address hidden>) on branch: master
Review: https://review.openstack.org/252231

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.