Nova service restart disconnects Quobyte volumes on systemd systems

Bug #1530860 reported by Silvan Kaiser
This bug affects 4 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Silvan Kaiser

Bug Description

When an instance runs from an image on a Cinder Quobyte volume, problems arise if the corresponding Nova service (openstack-nova-compute) is restarted or stopped while the instance is active: systemd SIGTERMs the whole cgroup, including the Quobyte client(s) handling the instance's mount point(s), which effectively removes the image from under the running VM(s).

Possible immediate mitigation steps:
- Do _NOT_ restart/stop a Nova service that has running instances using images on Cinder Quobyte volumes.
- Reconfigure systemd.kill to use KillMode=process or KillMode=none instead of KillMode=control-group (the default).
- Migrate instances off the host prior to restarting/stopping the Nova service.
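The KillMode mitigation above can be applied without editing the packaged unit file by using a systemd drop-in. A minimal sketch, assuming the service unit is named openstack-nova-compute.service (the drop-in file name is arbitrary):

```ini
# /etc/systemd/system/openstack-nova-compute.service.d/killmode.conf
# Stop systemd from SIGTERMing the whole control group (and with it the
# Quobyte fuse clients) when the service is stopped or restarted; only
# the main nova-compute process receives the signal.
[Service]
KillMode=process
```

After creating the drop-in, run `systemctl daemon-reload` so systemd picks up the override. Note that KillMode=process leaves any other child processes of the service running on stop, which is exactly the intent here but may not be desirable for every deployment.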

Silvan Kaiser (2-silvan)
Changed in nova:
assignee: nobody → Silvan Kaiser (2-silvan)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/264752

Changed in nova:
status: New → In Progress
Revision history for this message
Hendrik Frenzel (hfrenzel) wrote :

It looks like not just Quobyte Cinder volumes are affected.
After restarting nova-compute on our compute nodes, all VMs using Cinder volumes (GlusterFS) got read-only filesystems.

Revision history for this message
Silvan Kaiser (2-silvan) wrote :

Interesting. We need more information on the nature of the issues GlusterFS hits at that point.
From that we should be able to decide whether this is a more general systemd/cgroup/filesystem issue requiring a general approach, or whether each driver should tackle it individually.

Matt Riedemann (mriedem)
tags: added: libvirt volumes
Revision history for this message
Toni Ylenius (toni-ylenius) wrote :

With GlusterFS volumes the behavior is similar: when nova-compute is restarted, the fuse mounts are killed. However, it's a good question whether we should tackle this individually in each driver or find a more general solution.

With GlusterFS one can also use gfapi to attach volumes, in which case this issue doesn't apply, but we have had other issues with gfapi.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/432344

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Silvan Kaiser (<email address hidden>) on branch: master
Review: https://review.openstack.org/264752
Reason: This patch is superseded by a systemd-run based patch at https://review.openstack.org/#/c/432344/ as proposed.

Revision history for this message
Silvan Kaiser (2-silvan) wrote :

I abandoned the old patch set and added a new one (https://review.openstack.org/432344) that utilizes systemd-run instead of relying on an external mount.
This solution fixes the issue only for the Quobyte driver, as the GlusterFS situation seems to be different.
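The idea behind the systemd-run approach can be sketched as follows: wrapping the mount command in `systemd-run --scope` places the fuse client in a transient scope unit of its own, outside nova-compute's control group, so a service restart no longer SIGTERMs it. The registry and mount point below are hypothetical placeholders, not values from the actual patch:

```shell
# Build the mount command wrapped in a transient systemd scope.
# "myregistry/myvolume" and "/mnt/vol" are illustrative placeholders.
CMD="systemd-run --scope mount.quobyte myregistry/myvolume /mnt/vol"

# On a live systemd host this would be executed (requires privileges):
#   $CMD
# The fuse process then lives in its own run-*.scope unit and survives
# `systemctl restart openstack-nova-compute`.
echo "$CMD"
```

The same wrapping technique is generic: any fuse-based volume driver that spawns its mount helper from within the service's cgroup could use it, which is part of the per-driver vs. general-solution discussion above.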

Silvan Kaiser (2-silvan)
description: updated
Revision history for this message
Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing the status back to the previous state and unassigning. If there are active reviews related to this bug, please include links in comments.

Changed in nova:
status: In Progress → New
assignee: Silvan Kaiser (2-silvan) → nobody
Revision history for this message
Silvan Kaiser (2-silvan) wrote :

This bug was fixed by the change in https://review.openstack.org/#/c/432344/ . However, that fix's commit message contained the wrong bug id (a typo) and thus did not post an update here.
Can the status of this ticket be set/fixed manually?

Changed in nova:
status: New → Fix Committed
assignee: nobody → Silvan Kaiser (2-silvan)
Silvan Kaiser (2-silvan)
Changed in nova:
status: Fix Committed → Fix Released