When a KVM host does not have enough RAM, "nova show" does not show a fault message

Bug #1019017 reported by Jon McCormick
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Vish Ishaya

Bug Description

Steps to reproduce:
1. Use a KVM host that is running as a VM on an ESX host with 2048 MB of RAM.
2. On the controller, run "nova boot --flavor 1 --image <valid-image-id> vm1".
3. Repeat the same nova boot command to create vm2, vm3, vm4, and so on.
4. At some point the nova boot command will fail because there is not enough memory. You will see a message about the RAM filter in the nova-scheduler log:

2012-04-03 12:37:20 DEBUG nova.scheduler.host_manager [req-49ea5605-9e31-4db6-955c-8acd59cf0f82 baaa0b3c108b4626a083007a72d37b01 4eabbf34471d469a91234b769b5b2af3] Host filter function <bound method RamFilter.host_passes of <nova.scheduler.filters.ram_filter.RamFilter object at 0x4217150>> failed for mymachine-host from (pid=4567) passes_filters /home/myuser/openstack/nova/app/nova/scheduler/host_manager.py:160

2012-04-03 12:37:20 WARNING nova.scheduler.manager [req-49ea5605-9e31-4db6-955c-8acd59cf0f82 baaa0b3c108b4626a083007a72d37b01 4eabbf34471d469a91234b769b5b2af3] Failed to schedule_run_instance: No valid host was found.

5. Run "nova show <vm name that failed>" and notice that there is no "fault" line in the output.
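
The failure in step 4 can be illustrated with a minimal sketch of a RAM-based host filter. This is a simplified model and not nova's actual RamFilter code: the function names, the dictionary key `free_ram_mb`, and the 512 MB per-instance figure used in the usage note are assumptions made for illustration only.

```python
def host_passes(host_state, requested_ram_mb):
    """Return True if the host still has enough free RAM for the request."""
    return host_state["free_ram_mb"] >= requested_ram_mb


def boot_until_full(total_ram_mb, requested_ram_mb):
    """Count how many instances fit before the filter rejects the host."""
    if requested_ram_mb <= 0:
        raise ValueError("requested_ram_mb must be positive")
    # Hypothetical host record; a real host would also reserve RAM for itself.
    host_state = {"free_ram_mb": total_ram_mb}
    booted = 0
    while host_passes(host_state, requested_ram_mb):
        host_state["free_ram_mb"] -= requested_ram_mb
        booted += 1
    return booted
```

Under these assumptions, `boot_until_full(2048, 512)` accepts four instances and rejects the fifth, which corresponds to the point where the scheduler reports "No valid host was found." A real host would likely reject earlier, since the host operating system also consumes memory.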

The fault line should show why a VM was not created. The nova show command shows a fault line if the instance_faults table has a row for that instance. Here the last "nova boot" command failed to create the VM, so the OpenStack code should have added an entry to the instance_faults table with information about why the VM was not created.

To see a fault created correctly, do this:
1. Set auto_assign_floating_ip to true in /etc/nova/nova.conf on either the controller or the KVM host (I am not sure which one is the right one).
2. Make sure you do NOT have any floating IP pool configured.
3. Run "nova boot --flavor 1 --image <valid-image-id> <vm-name>".
4. Run "nova show <vm-name>".
Notice that there is a "fault" line in the output with a message about why the VM was not created.

The same thing should happen in all cases where a vm is not created.

Russell Bryant (russellb) wrote :

Looks right to me. It looks like compute/manager.py is the only place where instance faults are recorded. If the instance fails to be scheduled, it won't make it to the compute manager at all.
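
The gap described in this comment can be sketched roughly as follows. This is an illustrative model rather than nova's code: `NoValidHost`, `compute_run_instance`, `schedule_run_instance`, the `faults` list, and the dictionary keys are all hypothetical stand-ins chosen for the example.

```python
class NoValidHost(Exception):
    """Raised when no host passes the scheduler filters."""


def compute_run_instance(instance, faults, fail_with=None):
    """Compute-manager stand-in: the only place that records faults."""
    try:
        if fail_with is not None:
            raise RuntimeError(fail_with)
        instance["vm_state"] = "active"
    except RuntimeError as exc:
        instance["vm_state"] = "error"
        faults.append({"instance_uuid": instance["uuid"], "message": str(exc)})


def schedule_run_instance(instance, hosts, faults):
    """Scheduler stand-in: a scheduling failure records no fault."""
    fits = [h for h in hosts if h["free_ram_mb"] >= instance["memory_mb"]]
    if not fits:
        # The request never reaches the compute manager, so nothing is
        # appended to faults and "nova show" has no fault to display.
        instance["vm_state"] = "error"
        raise NoValidHost("No valid host was found.")
    compute_run_instance(instance, faults)
```

In this model a failure inside `compute_run_instance` leaves a fault record behind, while a `NoValidHost` raised by `schedule_run_instance` leaves `faults` empty, which matches the behaviour reported above.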

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Russell Bryant (russellb) wrote :

A patch just came up that should fix this: https://review.openstack.org/#/c/13159/

Changed in nova:
assignee: nobody → Vish Ishaya (vishvananda)
status: Confirmed → In Progress
milestone: none → folsom-rc1
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/13159
Committed: http://github.com/openstack/nova/commit/91734bad9139555294fe088d2c2d77a9712652ab
Submitter: Jenkins
Branch: master

commit 91734bad9139555294fe088d2c2d77a9712652ab
Author: Vishvananda Ishaya <email address hidden>
Date: Mon Sep 17 16:02:06 2012 -0700

    Fixes error handling during schedule_run_instance

    If there are not enough hosts available during a multi-instance launch,
    every failing instance should be updated to error state, instead of
    just the first instance. Currently only the first instance is set
    to Error and the rest stay in building.

    This patch makes a number of fixes to error handling during scheduling.

     * Moves instance faults into compute utils so they can be created
       from the scheduler.
     * Moves error handling into the driver so that each instance can be
       updated separately.
     * Sets an instance fault for failed scheduling
     * Sets task state back to none if there is a scheduling failure
     * Modifies chance scheduler to stop returning a list of instances
       as it is not used.
     * Modifies tests to check for these states.

    In addition to the included tests, the code was manually verified on
    a devstack install

    Fixes bug 1051066
    Fixes bug 1019017

    Change-Id: I49267ce4a21e2f7cc7a996fb2ed5d625f6794730
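
The per-instance error handling listed in the commit message can be sketched as below. This is a minimal illustration under stated assumptions, not the merged patch: `add_instance_fault`, `pick_host`, `schedule_run_instances`, and the dictionary layout are hypothetical names that only mirror the bullet points above.

```python
class NoValidHost(Exception):
    """Raised when no host can satisfy the request."""


def add_instance_fault(faults, instance, exc):
    """Shared helper stand-in: usable from the scheduler as well as compute."""
    faults.append({"instance_uuid": instance["uuid"], "message": str(exc)})


def pick_host(hosts, memory_mb):
    """Return the first host with enough free RAM, or raise NoValidHost."""
    for host in hosts:
        if host["free_ram_mb"] >= memory_mb:
            return host
    raise NoValidHost("No valid host was found.")


def schedule_run_instances(instances, hosts, faults):
    """Handle each instance separately so every failure is recorded."""
    for instance in instances:
        try:
            host = pick_host(hosts, instance["memory_mb"])
        except NoValidHost as exc:
            # Every failing instance goes to error, not just the first one.
            instance["vm_state"] = "error"
            instance["task_state"] = None
            add_instance_fault(faults, instance, exc)
            continue
        host["free_ram_mb"] -= instance["memory_mb"]
        instance["host"] = host["name"]
```

With one hypothetical host offering 1024 MB and three 512 MB requests, the first two instances are placed and the third ends in the error state with a fault record, so a later "nova show" would have a message to display.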

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-rc1 → 2012.2