Scheduler Race Condition at high volume

Bug #1073956 reported by Jon Proulx
This bug affects 2 people
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  High        Vish Ishaya
  Folsom                  Fix Released  High        Vish Ishaya
nova (Ubuntu)             Fix Released  Undecided   Unassigned

Bug Description

When trying to rapidly schedule hundreds of instances, the nova scheduler will not honor cpu_allocation_ratio and ram_allocation_ratio.

Using 'euca-run-instances -n' or a shell loop around 'nova boot' will reproduce this. I've seen it with numbers as low as 100, though others report it taking >300 to reproduce.

On a cloud with a single nova-scheduler host configured with cpu_allocation_ratio=1.0 and targeting a single compute node with 24 VCPUs (named nova-1), this command will schedule 100 instances where only 24 should be allowed:

for i in `seq 1 100`;do nova boot --image someimage --availability-zone nova:nova-1 --flavor 1 chaff-$i > /dev/null & done

Operating system is Ubuntu 12.04, using package 2012.2-0ubuntu5~cloud0 from ubuntu-cloud.archive.canonical.com
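
The failure mode described above is a check-then-act race: every request in the burst filters against the same stale host usage, because the compute node's usage update only arrives after the instances boot. A simplified Python sketch of that suspected mechanism (hypothetical names, not actual nova code):

```python
# Sketch of the race (hypothetical names, not nova code): every
# scheduling request reads the same stale usage value, so a 24-vCPU
# host with cpu_allocation_ratio=1.0 accepts far more than 24.

HOST_VCPUS = 24
CPU_ALLOCATION_RATIO = 1.0

def read_host_usage_from_db(db):
    # Each request reads usage as last persisted by the compute node.
    return db["used_vcpus"]

def passes_core_filter(used, requested=1):
    return used + requested <= HOST_VCPUS * CPU_ALLOCATION_RATIO

def schedule_burst(n_requests):
    db = {"used_vcpus": 0}
    placed = 0
    for _ in range(n_requests):
        # The compute node has not reported updated usage yet, so
        # every request sees the same stale value and passes.
        used = read_host_usage_from_db(db)
        if passes_core_filter(used):
            placed += 1
            # The usage update only lands later, after boot.
    return placed

print(schedule_burst(100))  # all 100 pass, though only 24 should fit
```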

Revision history for this message
Vish Ishaya (vishvananda) wrote :

A few comments here:

a) the availability-zone scheduling uses forced_host which skips the scheduler logic.

b) euca-run-instances -n actually uses a different code path than nova boot with a shell loop. I have not been able to reproduce your issue in this case.

c) The issue with the shell loop is reproducible and I have a patch which appears to fix it in the case of one scheduler. Multiple schedulers could be trickier.

Changed in nova:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Vish Ishaya (vishvananda)
tags: added: folsom-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/15215

Changed in nova:
status: Triaged → In Progress
Revision history for this message
Jon Proulx (jproulx) wrote :

After writing this I considered that availability-zone scheduling might just bypass scheduler logic, since there is perforce only one choice.

I am definitely seeing this with the euca-run-instances -n case (I believe this is a single API request rather than the N API requests the shell loop gives?).

Using the ChanceScheduler, euca-run-instances -n 700 gives me 700 instances in active state in about 8 minutes with no over-scheduling on compute nodes. I had about 800 unallocated VCPUs over 43 nodes when I ran this, so it's a significant fraction of my available capacity.

Switching back to the FilterScheduler with the same command, 230 instances got scheduled on one node that only had 17 available.

I've seen similar behaviour with numbers as low as 100 (using packaged code, not your patch).

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/15215
Committed: http://github.com/openstack/nova/commit/94560ab57d9fc23673f42017e6f2a78cb2b66b7a
Submitter: Jenkins
Branch: master

commit 94560ab57d9fc23673f42017e6f2a78cb2b66b7a
Author: Vishvananda Ishaya <email address hidden>
Date: Wed Oct 31 19:41:04 2012 -0700

    Eliminates simultaneous schedule race.

    Keeps host state in memory so multiple schedule attempts use the
    up-to-date values that may have been modified by another greenthread.

    Fixes bug 1073956

    Change-Id: I69fdd9b46bde6b7408c501c42a6ef3b6dd92bbc2
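
The approach described in the commit message, consuming resources from host state kept in memory at schedule time, can be sketched as follows (a simplified illustration with hypothetical names, not the actual nova HostState implementation):

```python
# Sketch of the fix (hypothetical names, not the actual nova classes):
# the scheduler keeps one host-state object per host in memory and
# deducts resources from it immediately, so later attempts in the same
# burst see up-to-date values before the compute node reports back.

class HostState:
    def __init__(self, total_vcpus, cpu_allocation_ratio=1.0):
        self.limit = total_vcpus * cpu_allocation_ratio
        self.used_vcpus = 0

    def passes_core_filter(self, requested):
        return self.used_vcpus + requested <= self.limit

    def consume(self, requested):
        # Deducted synchronously, before the boot RPC is even cast,
        # so another greenthread cannot see a stale value.
        self.used_vcpus += requested

def schedule_burst(host, n_requests, vcpus_per_instance=1):
    placed = 0
    for _ in range(n_requests):
        if host.passes_core_filter(vcpus_per_instance):
            host.consume(vcpus_per_instance)
            placed += 1
    return placed

host = HostState(total_vcpus=24)
print(schedule_burst(host, 100))  # only 24 fit now
```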

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu:
status: New → Confirmed
Revision history for this message
Joe Breu (breu) wrote :

I've verified that this affects Folsom as well and the patch does resolve the issue.

Revision history for this message
Joe Breu (breu) wrote :

Hey Vish,

I'm still seeing the behavior happen when scheduling at least 52 instances nearly simultaneously. I am running with the patch mentioned above. We are running a single instance of the scheduler.

The scheduler placed the instances in the following manner:
compute-node07 : 4
compute-node09 : 3
compute-node10 : 3
compute-node11 : 3
compute-node22 : 18
compute-node26 : 13
compute-node36 : 4
compute-node37 : 4

all nodes have 96GB of RAM.

nova.conf:
compute_scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
scheduler_available_filters=nova.scheduler.filters.standard_filters
# which filter class names to use for filtering hosts when not specified in the request.
#scheduler_default_filters=AvailabilityZoneFilter,RamFilter,ComputeFilter,CoreFilter,SameHostFilter,DifferentHostFilter,RetryFilter
scheduler_default_filters=RamFilter,ComputeFilter,RetryFilter
node_availability_zone=nova
default_schedule_zone=nova
compute_fill_first_cost_fn_weight=-1.0
scheduler_max_attempts=5
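
The RetryFilter in the config above, combined with scheduler_max_attempts=5, excludes hosts that already failed the current request so each retry lands somewhere new. A simplified sketch of that behavior (hypothetical names, not the actual nova filter class):

```python
# Sketch (simplified, hypothetical names) of RetryFilter behavior:
# hosts that already failed this scheduling request are filtered out,
# so each of the scheduler_max_attempts retries tries a new host.

def retry_filter(host, filter_properties):
    retry = filter_properties.get("retry")
    if not retry:
        # First attempt: no retry info, all hosts are eligible.
        return True
    # Exclude any host that was already tried for this request.
    return host not in retry.get("hosts", [])

props = {"retry": {"num_attempts": 2, "hosts": ["compute-node22"]}}
print(retry_filter("compute-node22", props))  # False
print(retry_filter("compute-node07", props))  # True
```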

Here is a snippet from the nova-scheduler.log

2012-11-15 22:28:16 DEBUG nova.openstack.common.rpc.amqp [-] Making asynchronous cast on compute.compute-node22... from (pid=10327) cast /usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py:376
2012-11-15 22:28:16 DEBUG nova.openstack.common.rpc.amqp [-] Making asynchronous cast on compute.compute-node22... from (pid=10327) cast /usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py:376
2012-11-15 22:28:16 DEBUG nova.openstack.common.rpc.amqp [-] Making asynchronous cast on compute.compute-node22... from (pid=10327) cast /usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py:376
2012-11-15 22:28:16 DEBUG nova.openstack.common.rpc.amqp [-] Making asynchronous cast on compute.compute-node22... from (pid=10327) cast /usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py:376
2012-11-15 22:28:17 DEBUG nova.openstack.common.rpc.amqp [-] Making asynchronous cast on compute.compute-node22... from (pid=10327) cast /usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py:376
2012-11-15 22:28:18 DEBUG nova.openstack.common.rpc.amqp [-] Making asynchronous cast on compute.compute-node22... from (pid=10327) cast /usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py:376
2012-11-15 22:28:18 DEBUG nova.openstack.common.rpc.amqp [-] Making asynchronous cast on compute.compute-node22... from (pid=10327) cast /usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py:376
2012-11-15 22:28:18 DEBUG nova.openstack.common.rpc.amqp [-] Making asynchronous cast on compute.compute-node22... from (pid=10327) cast /usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py:376
2012-11-15 22:28:18 DEBUG nova.openstack.common.rpc.amqp [-] Making asynchronous cast on compute.compute-node22... from (pid=10327) cast /usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py:376
2012-11-15 22:28:18 DEBUG nova.openstack.common.rpc.amqp [-] Making asynchronous cast on compute.compute-node22... from (pid=10327) cast /usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py:376
2012-11-15 22:28:19 DEBUG nova.openstack.common.rpc.amqp [-] Maki...


Chuck Short (zulcss)
tags: removed: folsom-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/folsom)

Fix proposed to branch: stable/folsom
Review: https://review.openstack.org/16430

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/folsom
Review: https://review.openstack.org/16587

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/folsom)

Reviewed: https://review.openstack.org/16587
Committed: http://github.com/openstack/nova/commit/b874d21fcdd804ba8f92f06fbcf4b09c634995a4
Submitter: Jenkins
Branch: stable/folsom

commit b874d21fcdd804ba8f92f06fbcf4b09c634995a4
Author: Vishvananda Ishaya <email address hidden>
Date: Wed Oct 31 19:41:04 2012 -0700

    Eliminates simultaneous schedule race.

    Keeps host state in memory so multiple schedule attempts use the
    up-to-date values that may have been modified by another greenthread.

    Fixes bug 1073956

    Change-Id: I69fdd9b46bde6b7408c501c42a6ef3b6dd92bbc2
    (cherry picked from commit 94560ab57d9fc23673f42017e6f2a78cb2b66b7a)

Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-1
status: Fix Committed → Fix Released
affects: ubuntu → nova (Ubuntu)
Changed in nova (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Clint Byrum (clint-fewbar) wrote : Please test proposed package

Hello Jon, or anyone else affected,

Accepted nova into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/nova/2012.2.1+stable-20121212-a99a802e-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-1 → 2013.1