Server builds that fail due to an incorrect imageRef being used still consume resources towards quota

Bug #1041581 reported by Daryl Walleck
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Nikola Đipanov
Milestone: 2013.1

Bug Description

Due to a typo, I was trying to create servers with an imageRef that is not in Glance when I suddenly started getting 413s back instead of 404s. At that point I could no longer create servers. Here's an example of me checking my limits and then trying a create:

200
{"limits": {"rate": [], "absolute": {"maxServerMeta": 128, "maxTotalInstances": 33, "maxPersonality": 5, "maxImageMeta": 128, "maxPersonalitySize": 10240, "totalVolumesUsed": 0, "totalCoresUsed": 0, "maxTotalKeypairs": 100, "totalRAMUsed": 0, "maxTotalVolumes": 10, "totalInstancesUsed": 0, "totalVolumeGigabytesUsed": 0, "maxTotalCores": 20, "totalSecurityGroupsUsed": 0, "maxTotalFloatingIps": 10, "totalKeyPairsUsed": 0, "maxTotalVolumeGigabytes": 1000, "maxTotalRAMSize": 51200}}}
413
{"overLimit": {"message": "Quota exceeded for cores: Requested 2, but already used 20 of 20 cores", "code": 413, "retryAfter": 0}}

This shows I had no cores or instances used, but was still being rejected. I searched around in the nova database, and I think I see what happened:

mysql> select * from quota_usages;
+---------------------+---------------------+------------+---------+----+----------------------------------+-----------+--------+----------+---------------+
| created_at | updated_at | deleted_at | deleted | id | project_id | resource | in_use | reserved | until_refresh |
+---------------------+---------------------+------------+---------+----+----------------------------------+-----------+--------+----------+---------------+
| 2012-08-25 09:49:08 | 2012-08-25 10:03:43 | NULL | 0 | 1 | b530c880117d404ebfbc3f93470f31f6 | instances | 0 | 17 | NULL |
| 2012-08-25 09:49:08 | 2012-08-25 10:03:43 | NULL | 0 | 2 | b530c880117d404ebfbc3f93470f31f6 | ram | 0 | 36352 | NULL |
| 2012-08-25 09:49:08 | 2012-08-25 10:03:43 | NULL | 0 | 3 | b530c880117d404ebfbc3f93470f31f6 | cores | 0 | 20 | NULL |
+---------------------+---------------------+------------+---------+----+----------------------------------+-----------+--------+----------+---------------+

It seems like the reserved values in this table are being incremented regardless of whether the create server request was successful.

Michael Still (mikal)
Changed in nova:
importance: Undecided → High
status: New → Triaged
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

Hi,

I spent some time looking into this and the problem seems to be in the nova.compute.api.API._create_instance method.

What is happening is that this method calls another method of the same class, _check_num_instances_quota, which in turn calls QUOTAS.reserve to reserve the instance quota. However, not all possible paths after this point result in the reservation being committed or rolled back, and one of those paths is exactly the case where the image lookup fails in Glance: it raises an ImageNotFound exception that is only caught several levels up, in nova.api.openstack.compute.servers.create, which of course does not roll back the reservation.
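
To illustrate, here is a simplified sketch of the problematic shape of the code (method names and signatures are approximations of the current nova/compute/api.py, not the exact source):

class API(object):
    def _create_instance(self, context, instance_type, image_href,
                         min_count, max_count, **kwargs):
        # Reserves instances/cores/ram quota and returns the reservation ids.
        num_instances, reservations = self._check_num_instances_quota(
            context, instance_type, min_count, max_count)

        # If the imageRef does not exist, this raises ImageNotFound. The
        # exception propagates straight up to the servers API extension, so
        # neither QUOTAS.commit() nor QUOTAS.rollback() is ever called and
        # the reserved cores/ram/instances stay counted against the tenant.
        (image_service, image_id) = glance.get_remote_image_service(
            context, image_href)
        image = image_service.show(context, image_id)

        # ... only the later, successful paths commit the reservation ...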

(Two things worth mentioning:
* If you try to create an instance with a non-existent image using python-novaclient or the dashboard, the error is caught before the request ever hits the nova API, so it will not cause the stray reservations.
* Since reservations do expire, they will eventually be refreshed and freed up in a real system. The default expiry is 24 hours (the reservation_expire config option in nova.quota); a sample override is shown below.)
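
For example, an operator could shorten that expiry in nova.conf so leaked reservations clear sooner (86400 seconds is the default; the value below is just an illustration):

[DEFAULT]
# Seconds until an uncommitted quota reservation is expired by the
# periodic task (default: 86400, i.e. 24 hours).
reservation_expire = 3600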

The quick fix is of course to make sure all reservations are rolled back or committed inside the _create_instance method. I will propose a fix for this soon.
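
Roughly, the fix would have this shape (only a sketch; the two helper calls below are hypothetical placeholders for the existing image-lookup and provisioning code, while QUOTAS.rollback/QUOTAS.commit are the real nova.quota calls):

num_instances, reservations = self._check_num_instances_quota(
    context, instance_type, min_count, max_count)
try:
    # Everything that can fail before the instances exist, including the
    # Glance lookup that raises ImageNotFound on a bad imageRef.
    image = self._lookup_image(context, image_href)            # hypothetical helper
    instances = self._provision_instances(context, image,      # hypothetical helper
                                           num_instances)
except Exception:
    # Release the reserved instances/cores/ram on any failure path.
    QUOTAS.rollback(context, reservations)
    raise
else:
    QUOTAS.commit(context, reservations)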

However, this makes me think there may be more places where quota reservations are handled in such an asymmetric way, which suggests we may want to provide a more straightforward method of dealing with them.
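
One possible shape for that, purely as an illustration and not the actual patch, is a small context manager that guarantees the commit or rollback happens on every exit path:

import contextlib

@contextlib.contextmanager
def reserved_quota(quotas, context, **deltas):
    """Reserve quota deltas, commit on success, roll back on any exception."""
    reservations = quotas.reserve(context, **deltas)
    try:
        yield reservations
    except Exception:
        quotas.rollback(context, reservations)
        raise
    else:
        quotas.commit(context, reservations)

# Hypothetical usage inside _create_instance:
# with reserved_quota(QUOTAS, context, instances=1, cores=2, ram=2048):
#     image = image_service.show(context, image_id)
#     ...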

Changed in nova:
assignee: nobody → Nikola Đipanov (ndipanov)
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

The following change actually fixes the issue: https://review.openstack.org/#/c/14289/

This can be considered closed.

Changed in nova:
status: Triaged → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-1 → 2013.1