NUMATopologyFilter doesn't account for CPU/RAM overcommit

Bug #1484742 reported by Chris Friesen
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Undecided
Assigned to: Chris Friesen

Bug Description

There seems to be a bug in the NUMATopologyFilter where it doesn't properly account for cpu_allocation_ratio or ram_allocation_ratio. (Detected on stable/kilo, not sure if it applies to current master.)

To reproduce:

1) Create a flavor with a moderate number of CPUs (5, for example) and enable hugepages by setting "hw:mem_page_size=2048" in the flavor extra specs. Do not specify dedicated CPUs on the flavor.

2) Ensure that the available compute nodes have fewer CPUs free than the number of CPUs in the flavor above.

3) Ensure that the "cpu_allocation_ratio" is big enough that "num_free_cpus * cpu_allocation_ratio" is more than the number of CPUs in the flavor above.

4) Enable the NUMATopologyFilter for the nova filter scheduler.

5) Try to boot an instance with the specified flavor.

This should pass, because we're not using dedicated CPUs and so the "cpu_allocation_ratio" should apply. However, the NUMATopologyFilter returns 0 hosts.

It seems like the NUMATopologyFilter is failing to properly account for the cpu_allocation_ratio when checking whether an instance can fit onto a given host.
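The expectation in the steps above can be written out as arithmetic. This is only a sketch of the check the report describes, not nova's scheduler code; the function names and the example values (4 free CPUs, a ratio of 16.0) are assumptions chosen for illustration.

```python
def fits_without_overcommit(flavor_vcpus, num_free_cpus):
    """Strict check: one free host CPU per requested vCPU."""
    return flavor_vcpus <= num_free_cpus


def fits_with_overcommit(flavor_vcpus, num_free_cpus, cpu_allocation_ratio):
    """Check from step 3: free CPUs scaled by the allocation ratio."""
    return flavor_vcpus <= num_free_cpus * cpu_allocation_ratio


# Hypothetical values: a 5-vCPU flavor on a host with 4 free CPUs.
flavor_vcpus = 5
num_free_cpus = 4
cpu_allocation_ratio = 16.0

strict = fits_without_overcommit(flavor_vcpus, num_free_cpus)
relaxed = fits_with_overcommit(flavor_vcpus, num_free_cpus, cpu_allocation_ratio)
print(strict, relaxed)
```

With these values the strict check fails and the overcommit check passes, which is the outcome the report expects from the filter for a flavor without dedicated CPUs.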

Chris Friesen (cbf123)
description: updated
Chris Friesen (cbf123)
summary: - NUMATopologyFilter doesn't account for cpu_allocation_ratio
+ NUMATopologyFilter doesn't account for CPU/RAM overcommit
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/213268

Changed in nova:
assignee: nobody → Chris Friesen (cbf123)
status: New → In Progress
Nikola Đipanov (ndipanov) wrote :

As commented on the code review:

The idea was that it's OK to have overcommit, but an instance larger than a NUMA node should _never_ land on that NUMA node, as it would effectively be overcommitting against itself.

This is not how overcommit on host level works - but it should probably get fixed there as it is questionable whether overcommitting an instance against itself makes sense. So maybe we want to have a new bug for that and close this one?

If you are seeing the opposite, that the instance is not larger than the whole of the NUMA node itself but still won't get considered for CPU overcommit with non-pinned NUMA requested, then that's a different bug, your patch won't fix it, and we should investigate more.
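The rule described in this comment can be sketched as a per-node check with two conditions. This is a hypothetical illustration, not the NUMATopologyFilter implementation: `HostCell`, `cell_fits`, and the field names are invented here, and the way usage is compared against the overcommitted capacity is an assumption.

```python
from dataclasses import dataclass


@dataclass
class HostCell:
    """Hypothetical model of one host NUMA node."""
    total_cpus: int  # logical CPUs physically present in the node
    used_vcpus: int  # vCPUs of instances already placed on the node


def cell_fits(cell, instance_vcpus, cpu_allocation_ratio):
    """Return True if the instance may be placed on this NUMA node."""
    # Condition 1: an instance larger than the node itself is rejected,
    # because it would be overcommitting against itself.
    if instance_vcpus > cell.total_cpus:
        return False
    # Condition 2: overcommit against other instances is allowed, up to
    # the node's capacity scaled by the allocation ratio.
    capacity = cell.total_cpus * cpu_allocation_ratio
    return cell.used_vcpus + instance_vcpus <= capacity


ratio = 16.0
print(cell_fits(HostCell(total_cpus=4, used_vcpus=0), 5, ratio))   # larger than the node
print(cell_fits(HostCell(total_cpus=4, used_vcpus=10), 4, ratio))  # shares an overcommitted node
```

Under this reading, a 5-vCPU instance is refused by a 4-CPU node regardless of the ratio, while a 4-vCPU instance is accepted on a node that is already overcommitted, as long as the scaled capacity is not exceeded.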

Chris Friesen (cbf123) wrote :

I've been testing the case where a single instance is larger than the number of host logical CPUs, so that would fit with your explanation. I can see why one might choose to implement that, though as you say it's not great to have different overcommit behaviour depending on whether or not the NUMA filter is involved.

I may do as you suggest and open up a separate bug specifically addressing the behaviour difference.

Chris Friesen (cbf123)
Changed in nova:
status: In Progress → Invalid
Chris Friesen (cbf123) wrote :

Closing as "invalid" based on Nikola's comments above. Bug 1485631 has been opened to unify the logic between the two cases.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Chris Friesen on branch: master
Review: https://review.openstack.org/213268
Reason: Abandoning change based on Nikola's comments. Bug 1485631 has been opened to unify the logic between the NUMA-topology case and the no-NUMA-topology case.
