Comment 6 for bug 907808

Gabe Westmaas (westmaas) wrote :

I'd expect the instance to end up in a deleted state, which I believe is what happened, correct? There were of course errors from Xen, but still the end result was what everyone expected.

Yes, I misspoke. The spec was too strict, and that should have been caught sooner; that was the mistake. Following the spec was not the mistake.

I'm fine with doing that, but right now we need to move forward with something. I don't know if this will be universally accepted, but as long as we have the right configuration available (whether or not to do it, which project to move to, what to do when that account has hit its quota, etc.) we can do this.

In the meantime, I'd like to just take out the decorators on delete. I think the one remaining issue, which we should fix (differently), is that deleting while resizing has some race conditions. Most of them will end with a deleted server (as expected), but some may not.

Also, this doesn't fix our inability to rebuild, change password, or do other things while taking a snapshot, along with other such state issues. There are a slew of vm and task states that translate to ACTIVE status for which the user still gets a 409 back. Again, this can be fixed in the short term by updating the decorators.
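To make the decorator discussion concrete, here is a rough sketch of what I mean. It is not the actual code in the tree: check_instance_state, InstanceInvalidState, the dict-based instance and the state names are all made up for illustration. It shows a strict check that rejects a rebuild during a snapshot, a relaxed one that allows it, and a delete with no check at all.

```python
import functools


class InstanceInvalidState(Exception):
    """Hypothetical error; the API layer would translate it to HTTP 409."""


def check_instance_state(vm_states=None, task_states=None):
    """Reject the call unless the instance is in an allowed state.

    Passing None for either argument means "no restriction" on that field.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(instance, *args, **kwargs):
            if vm_states is not None and instance["vm_state"] not in vm_states:
                raise InstanceInvalidState(
                    "vm_state %r not allowed" % instance["vm_state"])
            if (task_states is not None
                    and instance["task_state"] not in task_states):
                raise InstanceInvalidState(
                    "task_state %r not allowed" % instance["task_state"])
            return func(instance, *args, **kwargs)
        return wrapper
    return decorator


# Too strict: only allowed when no task is in progress.
@check_instance_state(vm_states={"active"}, task_states={None})
def rebuild_strict(instance):
    return "rebuilding"


# Updated decorator: also allowed while a snapshot is being taken.
@check_instance_state(vm_states={"active"},
                      task_states={None, "image_snapshot"})
def rebuild_relaxed(instance):
    return "rebuilding"


# Decorator removed: delete is accepted in any state.
def delete(instance):
    return "deleting"
```

With an instance whose vm_state is "active" and task_state is "image_snapshot", rebuild_strict raises (the 409 case), while rebuild_relaxed and delete both go through.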

What I propose is:
1) Remove decorators on delete
2) Update other decorators
3) Add in the troubleshooting abilities Johannes mentions above

I think longer term we should look at removing more and more of those restrictions at the API layer and adding more serialization lower down in the stack to resolve these race conditions.