Loss of RabbitMQ connection should surely trigger nova-compute XXX?
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Invalid | Undecided | Unassigned |
Bug Description
Even though the RabbitMQ service is down, the nova-compute service status is still shown as alive, so VMs continue to be scheduled to this node and get stuck.
Steps to reproduce:
1. Stop the RabbitMQ service:
> sudo service rabbitmq-server stop
2. Check the service status:
> sudo nova-manage service list
3. Boot an instance.
4. Check whether the instance reaches the ACTIVE state.
Proposed Solution:
Detect compute managers that have failed to contact a dependent service such as RabbitMQ or libvirt. Once the problem has been identified (after a configurable number of retries), the compute service should set the "disabled" flag to 1 in the database with a proper reason, and set it back to 0 once the detected problem is resolved.
With the proposed solution, when the RabbitMQ service is down, the service list shows the following:
nova-network nv-aw1st21-
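A minimal sketch of the proposed detect/disable/re-enable logic, assuming a simple polling loop. `ServiceRecord`, `DependencyMonitor`, `poll` and `max_retries` are hypothetical names, not part of nova's API; a real implementation would update the services table through nova's own database layer, and the `check` callable stands in for whatever probe is used for RabbitMQ or libvirt.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ServiceRecord:
    """Hypothetical stand-in for the compute service's row in the services table."""
    host: str
    disabled: int = 0
    disabled_reason: Optional[str] = None


class DependencyMonitor:
    """Hypothetical monitor: disables the service after max_retries consecutive
    failed checks of one dependency, re-enables it on the next successful check."""

    def __init__(self, record: ServiceRecord, name: str,
                 check: Callable[[], bool], max_retries: int = 3):
        self.record = record
        self.name = name
        self.check = check
        self.max_retries = max_retries
        self.failures = 0
        self.disabled_by_us = False

    def poll(self) -> None:
        if self.check():
            self.failures = 0
            # Only clear a disable this monitor set itself, so an
            # operator's manual disable is left untouched.
            if self.disabled_by_us:
                self.record.disabled = 0
                self.record.disabled_reason = None
                self.disabled_by_us = False
            return
        self.failures += 1
        # Disable only after the configured number of consecutive failures,
        # and only if the service is not already disabled for another reason.
        if (self.failures >= self.max_retries and not self.disabled_by_us
                and self.record.disabled == 0):
            self.record.disabled = 1
            self.record.disabled_reason = (
                f"{self.name} unreachable after {self.failures} attempts")
            self.disabled_by_us = True


if __name__ == "__main__":
    results = iter([False, False, False, True])
    rec = ServiceRecord(host="compute-1")
    mon = DependencyMonitor(rec, "RabbitMQ", lambda: next(results), max_retries=3)
    for _ in range(3):
        mon.poll()
    print(rec.disabled, rec.disabled_reason)  # 1 RabbitMQ unreachable after 3 attempts
    mon.poll()
    print(rec.disabled, rec.disabled_reason)  # 0 None
```

Tracking whether the monitor itself set the flag is an added assumption: without it, a recovered dependency could silently re-enable a node that an operator had disabled on purpose.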
Please comment on the proposed approach.
I think setting the nova-compute service to disabled when one dependent service is down would be confusing: users would assume that nova-compute cannot boot instances or that something is wrong in nova-compute itself. When a dependent service is down, the reason nova-compute cannot work can be found in the nova-compute log, and that should be sufficient.