Loss of rabbit MQ connection should surely trigger nova-compute XXX?

Bug #1095533 reported by Bharath Kumar Kobagana
This bug affects 1 person
Affects                   Status   Importance  Assigned to  Milestone
OpenStack Compute (nova)  Invalid  Undecided   Unassigned

Bug Description

Even though the RabbitMQ service is down, the nova-compute service status still shows as alive, so VMs keep being scheduled to this node and get stuck.

Steps to reproduce:

  1. Stop the RabbitMQ service:
       > sudo service rabbitmq-server stop
  2. Check the service status:
       > sudo nova-manage service list
  3. Boot an instance.
  4. Check whether the instance reaches the ACTIVE state.

Proposed Solution:

Nova needs to detect compute managers that have failed to contact a service they depend on, such as RabbitMQ or libvirt. Once the problem is confirmed (after a configurable number of retries), the compute service should set its "disabled" flag to 1 in the database, together with a reason describing the failure, and set the flag back to 0 once the problem is resolved.

With the proposed solution, when the RabbitMQ service is down, the service list shows:

nova-network nv-aw1st21-compute0001 nova disabled (unable to contact RabbitMQ) enabled 2013-01-02 05:33:00
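The proposal is described only in prose, so here is a minimal Python sketch of the retry-then-disable logic, assuming a periodic poll of each dependency. Everything in it is hypothetical: `ServiceRecord` is an in-memory stand-in for the service's row in the database (the "disabled" flag plus a reason), and `DependencyMonitor`, `poll` and the probe callables are illustrative names, not Nova APIs. Wiring it into nova-compute's periodic tasks and DB layer is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServiceRecord:
    """Hypothetical in-memory stand-in for a service's row in the DB."""
    host: str
    binary: str
    disabled: bool = False
    disabled_reason: str = ""


@dataclass
class DependencyMonitor:
    """Hypothetical monitor: disables the service after `max_retries`
    consecutive failed probes of a dependency, re-enables on recovery."""
    record: ServiceRecord
    checks: Dict[str, Callable[[], bool]]  # dependency name -> probe
    max_retries: int = 3
    failures: Dict[str, int] = field(default_factory=dict)
    disabled_by_monitor: bool = False

    def poll(self) -> None:
        failing: List[str] = []
        for name, check in self.checks.items():
            try:
                ok = check()
            except Exception:
                ok = False  # a probe that raises counts as a failure
            if ok:
                self.failures[name] = 0
            else:
                self.failures[name] = self.failures.get(name, 0) + 1
                if self.failures[name] >= self.max_retries:
                    failing.append(name)

        if failing:
            # Leave a service that was already disabled by someone else alone.
            if not self.record.disabled or self.disabled_by_monitor:
                self.record.disabled = True
                self.record.disabled_reason = (
                    "unable to contact " + ", ".join(failing))
                self.disabled_by_monitor = True
        elif self.disabled_by_monitor:
            # All dependencies are healthy again: undo our own disable only.
            self.record.disabled = False
            self.record.disabled_reason = ""
            self.disabled_by_monitor = False


rabbit_up = False
rec = ServiceRecord(host="compute0001", binary="nova-compute")
mon = DependencyMonitor(rec, {"RabbitMQ": lambda: rabbit_up}, max_retries=3)
for _ in range(3):
    mon.poll()
print(rec.disabled, rec.disabled_reason)  # True unable to contact RabbitMQ
rabbit_up = True
mon.poll()
print(rec.disabled)  # False
```

One design choice in this sketch is that the monitor only re-enables a service it disabled itself, so a host an operator disabled by hand (for maintenance, say) is not silently re-enabled when the dependency recovers.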

Please comment on the proposed approach.

description: updated
Ivan-Zhu (ivan-zhu) wrote :

I think setting the nova-compute service to disabled when one dependent service is down would be confusing: users will think that nova-compute itself can't boot instances or that something is wrong with nova-compute. When a dependent service is down, the nova-compute log shows why nova-compute cannot work, and that should be enough.

Russell Bryant (russellb) wrote :

I think I disagree here. If rabbitmq goes down, the whole nova deployment is in trouble, and nothing is going to work anyway. You can't even send a message to the scheduler asking it to schedule something, so setting a compute node to disabled is a moot point.

Changed in nova:
status: New → Invalid