local_gb_used wrong in compute_nodes table when using Dell Cinder backend

Bug #1508907 reported by Tobias Urdin
This bug affects 4 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Confirmed
Undecided
Unassigned

Bug Description

We have compute nodes with a very small amount of local disk, so we deploy our instances to Cinder volumes on a Dell backend.
The issue is that when creating instances that boot from a Cinder volume, the volume size still gets counted towards local storage used (local_gb_used in the compute_nodes table of the nova database), which results in faulty information about what is actually stored on local disk.

Before:

nova hypervisor-stats
+----------------------+--------+
| Property | Value |
+----------------------+--------+
| local_gb | 425 |
| local_gb_used | 80 |
+----------------------+--------+

cinder list
+--------------------------------------+-----------+---------------------------------------+------+-------------+----------+-------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+---------------------------------------+------+-------------+----------+-------------+
+--------------------------------------+-----------+---------------------------------------+------+-------------+----------+-------------+

nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

After booting a new instance with a 40 GB Cinder volume:

nova hypervisor-stats
+----------------------+--------+
| Property | Value |
+----------------------+--------+
| local_gb | 425 |
| local_gb_used | 120 |
+----------------------+--------+

cinder list
+--------------------------------------+-----------+---------------------------------------+------+-------------+----------+--------------------------------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+---------------------------------------+------+-------------+----------+--------------------------------------+
| 15345aa2-efc5-4a02-924a-963c0572a399 | in-use | None | 40 | None | true | 29cbe001-4eca-4b2c-972e-c19121a7cc31 |
+--------------------------------------+-----------+---------------------------------------+------+-------------+----------+--------------------------------------+

nova list
+--------------------------------------+--------+--------+------------+-------------+--------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+--------+--------+------------+-------------+--------------------+
| 29cbe001-4eca-4b2c-972e-c19121a7cc31 | tester | ACTIVE | - | Running | test=192.168.28.25 |
+--------------------------------------+--------+--------+------------+-------------+--------------------+

So the volume is counted as local storage, which is wrong, and it prevents us from knowing whether an instance has been booted on local disk; we need to know this since we don't have any local disk available for instance storage.
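
For what it's worth, the same inflated numbers show up per host as well (nova hypervisor-stats above is the aggregate across all hypervisors), e.g.:

nova hypervisor-show <hypervisor-hostname> | grep local_gb

which reports local_gb and local_gb_used straight from that node's compute_nodes row.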

Anybody got any clues?
Best regards

Tags: cinder volumes
tags: added: volumes
Changed in nova:
status: New → Confirmed
Revision history for this message
Tobias Urdin (tobias-urdin) wrote :

This bug is still active. It will be interesting to see whether this has any negative impact (such as failing to spawn instances) when the amount of used volume space exceeds the total amount of local disk space.

Revision history for this message
Tobias Urdin (tobias-urdin) wrote :

Ok, so I spawned enough instances to cover all of the "local disk space" (which is actually Cinder volumes counted as local hypervisor disk), and now that I have filled it up, all newly created instances fail because of "not enough disk space".

2016-01-07 15:34:13.642 23722 INFO nova.filters [req-0a2e28cd-07f6-4dfc-8d7c-7f925996b6c6 212f451de64b4ae89c853f1430510037 e47ebdf3f3934025b37df3b85bdfd565 - - -] Filter DiskFilter returned 0 hosts
2016-01-07 15:34:13.643 23722 INFO nova.filters [req-0a2e28cd-07f6-4dfc-8d7c-7f925996b6c6 212f451de64b4ae89c853f1430510037 e47ebdf3f3934025b37df3b85bdfd565 - - -] Filtering removed all hosts for the request with reservation ID 'r-k90hlv14' and instance ID '5fb489a3-03ab-494f-87fa-af0d86b61544'. Filter results: ['RetryFilter: (start: 4, end: 4)', 'AvailabilityZoneFilter: (start: 4, end: 4)', 'RamFilter: (start: 4, end: 4)', 'DiskFilter: (start: 4, end: 0)']

This needs to be escalated, since it corners a deployment into a critical state where no new instances can be spawned.

Revision history for this message
Tobias Urdin (tobias-urdin) wrote :

Temporary fix: remove DiskFilter from scheduler_default_filters in nova.conf, where it is included as a default filter.
I have not yet confirmed that this actually resolves it, but it should.
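
Concretely, the change looks like this (the filter list mirrors the filters visible in the scheduler log above minus DiskFilter; adjust it to whatever your deployment already has configured, this is only a sketch of my workaround):

# nova.conf on the scheduler node(s)
[DEFAULT]
# DiskFilter removed so volume-backed instances are not rejected for "missing" local disk
scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter

followed by a restart of nova-scheduler.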

Revision history for this message
Tobias Urdin (tobias-urdin) wrote :

I'm investigating the resource_tracker.py file in the nova/compute folder.
As one can see, the passed Instance objects contain a "root_gb" value, which is the size of the Cinder volume, since we boot our instances with a Cinder volume as the root volume/disk.

And in _update_usage in resource_tracker.py, local_gb_used is calculated using:

self.compute_node.local_gb_used += sign * usage.get('root_gb', 0)

(where sign defaults to 1 in the function signature)
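
To illustrate what I mean, here is a simplified sketch of that accounting (this is not the actual Nova code, and the is_volume_backed key at the end is made up to show where a fix could go):

# Simplified sketch of the local_gb_used accounting in
# nova/compute/resource_tracker.py _update_usage(); not the real implementation.
def _update_usage(compute_node, usage, sign=1):
    # 'usage' is built from the Instance object; for our boot-from-volume
    # instances root_gb still carries the flavor's root disk size, so the
    # Cinder volume gets charged against local disk.
    root_gb = usage.get('root_gb', 0)

    # Hypothetical change: do not charge the root disk when the instance is
    # volume-backed ('is_volume_backed' is a made-up key for illustration).
    if not usage.get('is_volume_backed', False):
        compute_node.local_gb_used += sign * root_gb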

This means we effectively add the Cinder volume's size as local storage on the hypervisor, which gives us wrong scheduling decisions, wrong "graphs" in Horizon, and a faulty view of what is actually stored on local disk.

As a quick example, on one of our compute nodes we have an instance with a 20 GB disk (a Cinder volume attached to /dev/vda), and when dumping the Instance object in the for loop in _update_usage_from_instances (in resource_tracker.py) we can see that root_gb is set to 20, so this is counted as local disk, which is wrong.

A note: we deploy instances from our own control panel using the Nova API. Perhaps there are some parameters that should not be passed along, to make sure root_gb is not set?
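
One workaround I am considering (untested, just a sketch): since root_gb is taken from the flavor's disk size, booting from a volume with a flavor that has a 0 GB root disk should keep these instances out of local_gb_used. Something like:

# Flavor with 0 GB root disk (name/RAM/vCPU values are only examples)
nova flavor-create volume-backed.2048 auto 2048 0 2

# Boot from an existing Cinder volume using that flavor
nova boot --flavor volume-backed.2048 \
    --block-device source=volume,id=<volume-id>,dest=volume,bootindex=0 \
    tester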
