Comment 4 for bug 1688508

Christoph Wolff (yonk) wrote :

Hello Christian, thank you very much for your detailed response.

>The default value for this is PARALLEL_SHUTDOWN=10 so everybody would run into this issue.
>I assume that there needs to be more to this than just "broken in general", so let us try to find what it is that makes this fail for you.

These were exactly my thoughts when I encountered the bug.

>While certainly broken and needing a fix this should at least still time out for you after the default of 2 minutes right?
>You could lessen the timeout as the most convenient until a proper fix is there then.

Actually no, that was my first guess too, so I turned down the time-out. But what was actually happening was that since it failed to shut down the VMs, check_guests_shutdown() got called repeatedly, each time adding more error output to the list of VMs to shut down. So it actually never timed out, because the list of VMs only grew longer.
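To make the runaway behaviour concrete, here is a minimal sketch (hypothetical helper names and logic, NOT the real libvirt-guests.sh code): when the state lookup for a guest fails, its error output leaks back into the working list, so the list grows every round instead of shrinking, and the timeout countdown never gets anywhere.

```shell
#!/bin/sh
# Illustrative sketch only, not the actual script: a failing state
# lookup appends error text into the list alongside the guest entry.

guest_is_on() {
    # Mock: the state lookup always fails, as for a transient guest
    return 1
}

check_guests_shutdown() {
    # Buggy version: on a failed lookup, error text ends up in the
    # returned list next to the guest entry itself.
    new_list=""
    for g in $1; do
        if ! guest_is_on "$g"; then
            new_list="$new_list $g error:failed-to-get-domain"
        fi
    done
    echo "$new_list"
}

guests="one-44 one-38"
for round in 1 2 3; do
    guests=$(check_guests_shutdown "$guests")
    set -- $guests
    # The list doubles every round: 4, 8, 16 entries instead of 2, 1, 0
    echo "round $round: $# entries in the shutdown list"
done
```

With a list that only grows, lowering SHUTDOWN_TIMEOUT cannot help, which matches what I saw.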

>I wondered that for me "check_guests_shutdown" is on a different line (353) then.
>That might just be a type or such, but to be sure could you check with verify if the package thinks the file is non default (after you remove your modification of course):

I'm pretty sure the file is default; it's probably an empty line somewhere from when I started debugging, but I will check on that right away. I have also already tried downloading the newest version from upstream, but you are right in that it remained pretty much the same (and that script also did not work).

>Also the issue only occurs if function guest_is_on fails (so neither detected run, nor not running, but really failing). Eventually that executes:
>$ virsh domname <uuid>
>That should also fail in your case to trigger the issue - is there any obvious reason you'd know why that fails for you? The output of this should also be mixed into the result in your case, so maybe you find it there.

Hmm, initially I thought that was just a very bad way of checking whether the VM was still running, but upon closer inspection you are right. However, when I manually run something like "virsh domname $uuid" it gives me the domname as output, so it seems to work fine. What might cause trouble here is that these VMs are 'transient', i.e. they do not keep their UUID after shutdown. That would explain why it can't check whether or not the VM has been shut down. I only know this because when I choose "suspend" as the value for ON_SHUTDOWN, it tells me that 'transient VMs can't be suspended'.
Maybe I should mention that I run libvirt with OpenNebula, which basically puts a nice management interface on top of KVM, so the transient thing comes from there.
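If the transient hypothesis is right, it should show up in the "Persistent:" field that `virsh dominfo` reports. A quick check could look like this (virsh is mocked here just to keep the snippet self-contained; on a real host, drop the mock function and run it against libvirt directly):

```shell
#!/bin/sh
# Check whether a domain is transient via `virsh dominfo`.

virsh() {
    # Mock of `virsh dominfo one-44` output for a transient guest;
    # remove this function to query a real libvirt host instead.
    cat <<'EOF'
Id:             44
Name:           one-44
Persistent:     no
State:          running
EOF
}

is_transient() {
    virsh dominfo "$1" | grep -q '^Persistent: *no'
}

if is_transient one-44; then
    echo "one-44 is transient: libvirt forgets it after shutdown"
fi
```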

>Would you mind as being the one who found it to report the issue there and linking the bug or mailing list entry here to help tracking the discussion there?

No problem, I will do that.

>It would be great if you could try this diff on your file and see if it resolves your issues as well.

Sadly the patch did not help, although it changed the faulty behaviour, yay! Now I get repeated output looking like this:

sudo ./libvirt-guests.sh stop

Running guests on default URI: one-44, one-38
Shutting down guests on default URI...
Starting shutdown on guest: one-44
Starting shutdown on guest: one-38
Waiting for 2 guests to shut down, 120 seconds left
Starting shutdown on guest: one-44
Starting shutdown on guest: one-38
Starting shutdown on guest: one-44
Starting shutdown on guest: one-38
Starting shutdown on guest: one-44
Starting shutdown on guest: one-38
Failed to determine state of guest: 6cffc2fe-c2b3-4f54-b8e0-054f70453294. Not tracking it anymore.
Failed to determine state of guest: 6cffc2fe-c2b3-4f54-b8e0-054f70453294. Not tracking it anymore.
Failed to determine state of guest: 6cffc2fe-c2b3-4f54-b8e0-054f70453294. Not tracking it anymore.
Failed to determine state of guest: 6cffc2fe-c2b3-4f54-b8e0-054f70453294. Not tracking it anymore.
Shutdown of guest complete.
Shutdown of guest complete.
Shutdown of guest complete.
Shutdown of guest complete.
Starting shutdown on guest:
error: failed to get domain '6cffc2fe-c2b3-4f54-b8e0-054f70453294'
error: Domain not found: no domain with matching name '6cffc2fe-c2b3-4f54-b8e0-054f70453294'
Starting shutdown on guest: one-38
Starting shutdown on guest:
error: failed to get domain '6cffc2fe-c2b3-4f54-b8e0-054f70453294'
error: Domain not found: no domain with matching name '6cffc2fe-c2b3-4f54-b8e0-054f70453294'
Starting shutdown on guest: one-38
Starting shutdown on guest:
error: failed to get domain '6cffc2fe-c2b3-4f54-b8e0-054f70453294'
error: Domain not found: no domain with matching name '6cffc2fe-c2b3-4f54-b8e0-054f70453294'
Starting shutdown on guest: one-38
Failed to determine state of guest: 6cffc2fe-c2b3-4f54-b8e0-054f70453294. Not tracking it anymore.
Failed to determine state of guest: 6cffc2fe-c2b3-4f54-b8e0-054f70453294. Not tracking it anymore.
Failed to determine state of guest: 6cffc2fe-c2b3-4f54-b8e0-054f70453294. Not tracking it anymore.
Shutdown of guest complete.
Shutdown of guest complete.
Shutdown of guest complete.

And so on and so on, with no sign of stopping. Upon inspection with "set -x" enabled, it seems that the 2 guests are added to the list of guests to shut down again, so there are sometimes 3 or 4 entries (basically duplicated UUIDs) in the list.
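One way the duplicates could be avoided, just as a sketch and not as the proper upstream fix, would be to deduplicate the whitespace-separated guest list each round before acting on it, so a guest re-added by a failed state check is only handled once:

```shell
#!/bin/sh
# Sketch of a possible mitigation (illustrative, not the real fix):
# deduplicate the guest list before each shutdown round.

dedup_list() {
    # Word-split $1 into one entry per line, sort unique, rejoin
    printf '%s\n' $1 | sort -u | tr '\n' ' '
}

guests="one-44 one-38 one-44 one-44 one-38"
guests=$(dedup_list "$guests")
echo "deduplicated list: $guests"
```

This would only paper over the symptom, though; the real problem is still that the state of the transient guests cannot be determined after they disappear.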