local_gb_used wrong in compute_nodes table when using Dell Cinder backend

Bug #1508907 reported by Tobias Urdin
This bug affects 4 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Confirmed
Undecided
Unassigned

Bug Description

We have compute nodes with a very small amount of local disk, so we deploy our instances to Cinder volumes on a Dell backend.
The issue is that when creating instances that boot from a Cinder volume, the volume size still gets counted towards local storage used (local_gb_used in the compute_nodes table of the nova database), which results in faulty information about what is actually stored on local disk.

Before:

nova hypervisor-stats
+----------------------+--------+
| Property | Value |
+----------------------+--------+
| local_gb | 425 |
| local_gb_used | 80 |
+----------------------+--------+

cinder list
+--------------------------------------+-----------+---------------------------------------+------+-------------+----------+-------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+---------------------------------------+------+-------------+----------+-------------+
+--------------------------------------+-----------+---------------------------------------+------+-------------+----------+-------------+

nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

After booting a new instance with a 40 GB Cinder volume:

nova hypervisor-stats
+----------------------+--------+
| Property | Value |
+----------------------+--------+
| local_gb | 425 |
| local_gb_used | 120 |
+----------------------+--------+

cinder list
+--------------------------------------+-----------+---------------------------------------+------+-------------+----------+--------------------------------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+---------------------------------------+------+-------------+----------+--------------------------------------+
| 15345aa2-efc5-4a02-924a-963c0572a399 | in-use | None | 40 | None | true | 29cbe001-4eca-4b2c-972e-c19121a7cc31 |
+--------------------------------------+-----------+---------------------------------------+------+-------------+----------+--------------------------------------+

nova list
+--------------------------------------+--------+--------+------------+-------------+--------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+--------+--------+------------+-------------+--------------------+
| 29cbe001-4eca-4b2c-972e-c19121a7cc31 | tester | ACTIVE | - | Running | test=192.168.28.25 |
+--------------------------------------+--------+--------+------------+-------------+--------------------+

So the volume is counted as local storage, which is wrong, and it prevents us from knowing whether an instance has been booted on local disk; we need to know this since we don't have any local disk available for instance storage.
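
For what it's worth, the same inflated numbers show up per host as well (nova hypervisor-stats above is the aggregate across all hypervisors), e.g.:

nova hypervisor-show <hypervisor-hostname> | grep local_gb

which reports local_gb and local_gb_used straight from that node's compute_nodes row.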

Anybody got any clues?
Best regards

Tags: cinder volumes
tags: added: volumes
Changed in nova:
status: New → Confirmed
Revision history for this message
Tobias Urdin (tobias-urdin) wrote :

This bug is still active. It will be interesting to see whether this has any negative impact (such as failing to spawn instances) when the amount of used volume space exceeds the total amount of local disk space.

Revision history for this message
Tobias Urdin (tobias-urdin) wrote :

Ok, so I spawned enough instances to cover all of the "local disk space" (which is actually Cinder volumes counted as local hypervisor disk), and now that I have filled it up, all newly created instances fail because of "not enough disk space".

2016-01-07 15:34:13.642 23722 INFO nova.filters [req-0a2e28cd-07f6-4dfc-8d7c-7f925996b6c6 212f451de64b4ae89c853f1430510037 e47ebdf3f3934025b37df3b85bdfd565 - - -] Filter DiskFilter returned 0 hosts
2016-01-07 15:34:13.643 23722 INFO nova.filters [req-0a2e28cd-07f6-4dfc-8d7c-7f925996b6c6 212f451de64b4ae89c853f1430510037 e47ebdf3f3934025b37df3b85bdfd565 - - -] Filtering removed all hosts for the request with reservation ID 'r-k90hlv14' and instance ID '5fb489a3-03ab-494f-87fa-af0d86b61544'. Filter results: ['RetryFilter: (start: 4, end: 4)', 'AvailabilityZoneFilter: (start: 4, end: 4)', 'RamFilter: (start: 4, end: 4)', 'DiskFilter: (start: 4, end: 0)']

This needs to be escalated, since it corners a deployment into a critical state where no new instances can be spawned.

Revision history for this message
Tobias Urdin (tobias-urdin) wrote :

Temporary fix: remove DiskFilter from scheduler_default_filters in nova.conf, where it is included as a default filter.
I have not yet confirmed that this actually resolves it, but it should.
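
Concretely, the change looks like this (the filter list mirrors the filters visible in the scheduler log above minus DiskFilter; adjust it to whatever your deployment already has configured, this is only a sketch of my workaround):

# nova.conf on the scheduler node(s)
[DEFAULT]
# DiskFilter removed so volume-backed instances are not rejected for "missing" local disk
scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter

followed by a restart of nova-scheduler.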

Revision history for this message
Tobias Urdin (tobias-urdin) wrote :

I'm investigating the resource_tracker.py file in the nova/compute folder.
As one can see, the passed Instance objects contain a "root_gb" value, which is the size of the Cinder volume, since we boot our instances with a Cinder volume as the root volume/disk.

And in _update_usage in resource_tracker.py, local_gb_used is calculated using:

self.compute_node.local_gb_used += sign * usage.get('root_gb', 0)

(where sign defaults to 1 in the function signature)
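
To illustrate what I mean, here is a simplified sketch of that accounting (this is not the actual Nova code, and the is_volume_backed key at the end is made up to show where a fix could go):

# Simplified sketch of the local_gb_used accounting in
# nova/compute/resource_tracker.py _update_usage(); not the real implementation.
def _update_usage(compute_node, usage, sign=1):
    # 'usage' is built from the Instance object; for our boot-from-volume
    # instances root_gb still carries the flavor's root disk size, so the
    # Cinder volume gets charged against local disk.
    root_gb = usage.get('root_gb', 0)

    # Hypothetical change: do not charge the root disk when the instance is
    # volume-backed ('is_volume_backed' is a made-up key for illustration).
    if not usage.get('is_volume_backed', False):
        compute_node.local_gb_used += sign * root_gb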

This means we effectively add the Cinder volume's size as local storage on the hypervisor, which gives us wrong scheduling decisions, wrong "graphs" in Horizon, and a faulty view of what is actually stored on local disk.

As a quick example, on one of our compute nodes we have an instance with a 20 GB disk (a Cinder volume attached to /dev/vda), and when dumping the Instance object in the for loop in _update_usage_from_instances (in resource_tracker.py) we can see that root_gb is set to 20, so this is counted as local disk, which is wrong.

A note: we deploy instances from our own control panel using the Nova API. Perhaps there are some parameters that should not be passed along, to make sure root_gb is not set?
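
One workaround I am considering (untested, just a sketch): since root_gb is taken from the flavor's disk size, booting from a volume with a flavor that has a 0 GB root disk should keep these instances out of local_gb_used. Something like:

# Flavor with 0 GB root disk (name/RAM/vCPU values are only examples)
nova flavor-create volume-backed.2048 auto 2048 0 2

# Boot from an existing Cinder volume using that flavor
nova boot --flavor volume-backed.2048 \
    --block-device source=volume,id=<volume-id>,dest=volume,bootindex=0 \
    tester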
