Asynchronous errors can only be reported if instance is in ERROR

Bug #1061062 reported by Johannes Erdfelt
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Won't Fix
Importance: Medium
Assigned to: Jim Jiang

Bug Description

The API currently states that asynchronous errors can only be reported if the instance is in ERROR state. Nova currently implements that too.

However, this implies that all problems that occur are fatal.

One instance where an error may not necessarily be fatal is when communicating with the agent. During a spawn of a new instance, the agent is used (in the case of the xenapi driver at least) to set the root password, possibly inject files and set the network config.

Often these problems leave the instance running, but maybe in a partially configured state. Moving the instance to ERROR should shut off the instance, leaving it unavailable to troubleshoot. Logging an asynchronous error doesn't make it visible to the user unless the instance is in ERROR.

Right now the xenapi driver just logs agent failures and continues. This can leave the user confused as to why the instance doesn't appear to be configured correctly since there is no user accessible way to obtain non-fatal errors.
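As a rough illustration of the gap described above, the following is a minimal sketch of recording agent failures as non-fatal faults instead of only logging them. It is not Nova code: `AgentError`, `InstanceFault`, and `run_agent_steps` are hypothetical names invented for this example, and the in-memory list stands in for whatever storage a real implementation would use.

```python
import logging
from dataclasses import dataclass, field
from datetime import datetime, timezone

LOG = logging.getLogger(__name__)


class AgentError(Exception):
    """Hypothetical error raised when a call to the guest agent fails."""


@dataclass
class InstanceFault:
    """Hypothetical fault record; 'fatal' separates ERROR-state faults from warnings."""
    instance_id: str
    message: str
    fatal: bool
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def run_agent_steps(instance_id, steps, faults):
    """Run each (name, callable) step; record failures and keep going."""
    for name, step in steps:
        try:
            step()
        except AgentError as exc:
            # Still log for the operator, but also keep a record that could
            # later be exposed to the user without moving the instance to ERROR.
            LOG.warning("agent step %s failed for %s: %s", name, instance_id, exc)
            faults.append(InstanceFault(instance_id, f"{name}: {exc}", fatal=False))
    return faults


def _failing_step():
    raise AgentError("timeout")


# Example: the second step fails, the instance keeps running.
faults = run_agent_steps(
    "inst-1",
    [("set_password", lambda: None), ("inject_files", _failing_step)],
    [],
)
```

The point of the `fatal` flag in this sketch is that a fault can exist without implying the instance is unusable, which is the distinction the current API cannot express.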

Brian Waldon (bcwaldon) wrote :

I definitely agree with this (it's something that has come up in my usage of the API several times) but I don't know how best to fix it. The API defines the behavior you document as a bug (which I would agree with), so maybe we have to add an extension and expose async faults multiple ways?

Johannes Erdfelt (johannes.erdfelt) wrote :

An extension could be a good way to expose these errors in a backwards compatible way.

The problem then is how to determine if the asynchronous faults returned are relevant.

The API currently doesn't return a timestamp for instance faults, but the database does have a created_at timestamp. It could probably also record the request-id to make it easier to correlate a fault to a particular request.

This requires the client to do a little more work recording the request-id to correlate faults to.
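The correlation idea could look something like the sketch below on the client side. This is only an illustration under assumptions: the `Fault` record and the `faults_for_request` helper are hypothetical, the `request_id` field is the proposed addition (it is not something the API returns today), and the sample timestamps and request IDs are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Fault:
    """Hypothetical fault record as a client might see it."""
    message: str
    created_at: datetime
    request_id: Optional[str] = None  # proposed field, not in the current API


def faults_for_request(
    faults: List[Fault], request_id: str, since: Optional[datetime] = None
) -> List[Fault]:
    """Return faults matching request_id, optionally only those at or after 'since'."""
    matches = [
        f for f in faults
        if f.request_id == request_id and (since is None or f.created_at >= since)
    ]
    return sorted(matches, key=lambda f: f.created_at)


# Placeholder data: two faults from one request, one from another.
t0 = datetime(2012, 10, 3, 12, 0, tzinfo=timezone.utc)
faults = [
    Fault("agent timeout while setting password", t0, "req-aaa"),
    Fault("unrelated failure", t0 + timedelta(minutes=1), "req-bbb"),
    Fault("file injection failed", t0 + timedelta(minutes=2), "req-aaa"),
]

# The client remembers the request-id it was given and filters on it.
relevant = faults_for_request(faults, "req-aaa", since=t0 + timedelta(minutes=1))
```

Filtering on the request-id answers "which faults belong to my request", while the timestamp cutoff lets a client ignore faults it has already seen.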

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Jim Jiang (jiangwt100)
Changed in nova:
assignee: nobody → 蒋闻天 (jiangwt100)
Andrew Laski (alaski) wrote :

A method for addressing this has been described at https://blueprints.launchpad.net/nova/+spec/instance-actions

John Garbutt (johngarbutt) wrote :

Now that instance actions has been implemented, let's kill this one; it's more of a feature really?

Changed in nova:
status: Confirmed → Won't Fix