Galera lxc shutdown results in corrupted db

Bug #1806696 reported by Justin Alford
This bug affects 3 people
Affects            Status        Importance  Assigned to    Milestone
OpenStack-Ansible  Fix Released  Undecided   Justin Alford

Bug Description

After a recent set of controller reboots, our Galera cluster did not come back up and had to be rebuilt. The cause appears to be that the lxc service forcibly kills any container that has not stopped within 5 seconds of the shutdown request, because the SHUTDOWNDELAY env var is set to 5. That is apparently too short for the Galera container to shut down cleanly.

Justin Alford (jlalford)
summary: - Galera lxc shutdown
+ Galera lxc shutdown results in corrupted db
Justin Alford (jlalford) wrote :

root@controller-dc1r02n03:~# systemctl show lxc.service |grep ExecStop
ExecStop={ path=/usr/lib/x86_64-linux-gnu/lxc/lxc-containers ; argv[]=/usr/lib/x86_64-linux-gnu/lxc/lxc-containers stop ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }

/usr/lib/x86_64-linux-gnu/lxc/lxc-containers:

SHUTDOWNDELAY=5
STOPOPTS="-a -A -s"
stop)
  if [ -n "$SHUTDOWNDELAY" ]; then
    SHUTDOWNDELAY="-t $SHUTDOWNDELAY"
  fi
  "$bindir"/lxc-autostart $STOPOPTS $SHUTDOWNDELAY
;;
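
To make the effect of the excerpt above concrete, here is a minimal sketch of how the stop) branch assembles its command line. build_stop_cmd is a hypothetical helper written for this illustration, not part of lxc-containers, and it only prints the command instead of running lxc-autostart. With SHUTDOWNDELAY=5 the containers are asked to shut down cleanly (-s) and, per the lxc-autostart man page, are killed if they have not stopped when the -t timeout expires.

```shell
#!/bin/sh
# Sketch only: mirrors the stop) branch quoted above.
# build_stop_cmd is a hypothetical helper; it prints the command, it does not run it.
build_stop_cmd() {
  delay="$1"            # seconds to wait before hard-stopping; may be empty
  stopopts="-a -A -s"   # all containers, ignore the autostart flag, clean shutdown
  if [ -n "$delay" ]; then
    delay="-t $delay"
  fi
  # Left unquoted on purpose so the options split into separate words,
  # the same way the original script passes them.
  echo lxc-autostart $stopopts $delay
}

build_stop_cmd 5    # delay found on the affected controllers
build_stop_cmd 60   # delay proposed as the new default
```

The first call shows the command that was in effect during the reboots; the second shows what the same branch would produce with a 60 second delay.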

Mohammed Naser (mnaser)
Changed in openstack-ansible:
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-lxc_hosts (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/625998

Justin Alford (jlalford)
Changed in openstack-ansible:
assignee: nobody → Justin Alford (jlalford)
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/626716

OpenStack Infra (hudson-openstack) wrote : Change abandoned on openstack-ansible-lxc_hosts (master)

Change abandoned by Justin Alford (<email address hidden>) on branch: master
Review: https://review.openstack.org/626716

OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-lxc_hosts (master)

Reviewed: https://review.openstack.org/625998
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-lxc_hosts/commit/?id=d0d9384aee36277e899aa664bc9f64835c33d6c4
Submitter: Zuul
Branch: master

commit d0d9384aee36277e899aa664bc9f64835c33d6c4
Author: Justin Alford <email address hidden>
Date: Tue Dec 18 13:04:57 2018 -0700

    Increase LXC container default shutdown delay

    Increase container shutdown delay before force-killing to avoid db
    corruption after controller reboots
    Parameterize SHUTDOWNDELAY envvar as lxc_container_shutdown_delay
    with default value 60 seconds
    Rename lxc.default.j2 template to lxc-net.default.j2 to align with
    destination config file name lxc-net
    Add new lxc.default.j2 template to use the lxc_container_shutdown_delay
    variable and allow user-defined value

    Related-Bug: 1806696

    Change-Id: I1d3b7990e462140fdb402883f8d25422eafca66b
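
After deploying this change, one way to confirm which delay a host will actually use is to read the defaults file the same way the stop script would. The sketch below is hedged: effective_shutdown_delay is a hypothetical helper, and the path /etc/default/lxc is an assumption about where the rendered lxc.default.j2 template lands and about which file lxc-containers sources. The fallback of 5 is the value quoted earlier in this report.

```shell
#!/bin/sh
# Sketch only: report the SHUTDOWNDELAY a defaults file would hand to the stop script.
# effective_shutdown_delay is a hypothetical helper.
# The default path /etc/default/lxc is an assumption, not confirmed by this report.
effective_shutdown_delay() {
  defaults_file="${1:-/etc/default/lxc}"
  (
    # Run in a subshell so sourcing the file cannot leak variables to the caller.
    SHUTDOWNDELAY=5   # fallback: value quoted from lxc-containers in this report
    if [ -r "$defaults_file" ]; then
      . "$defaults_file"
    fi
    echo "$SHUTDOWNDELAY"
  )
}

# Print the effective delay for the given file, or for the assumed default path.
effective_shutdown_delay "$@"
```

If the new template has been applied with the default lxc_container_shutdown_delay, this should print 60 on a host where the assumed path is correct; a result of 5 suggests the old value is still in effect.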

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-lxc_hosts (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.openstack.org/633767

OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-lxc_hosts (stable/rocky)

Reviewed: https://review.openstack.org/633767
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-lxc_hosts/commit/?id=3138838ff8f169ae0cf413c459785d5f20306f33
Submitter: Zuul
Branch: stable/rocky

commit 3138838ff8f169ae0cf413c459785d5f20306f33
Author: Justin Alford <email address hidden>
Date: Tue Dec 18 13:04:57 2018 -0700

    Increase LXC container default shutdown delay

    Increase container shutdown delay before force-killing to avoid db
    corruption after controller reboots
    Parameterize SHUTDOWNDELAY envvar as lxc_container_shutdown_delay
    with default value 60 seconds
    Rename lxc.default.j2 template to lxc-net.default.j2 to align with
    destination config file name lxc-net
    Add new lxc.default.j2 template to use the lxc_container_shutdown_delay
    variable and allow user-defined value

    Related-Bug: 1806696

    Change-Id: I1d3b7990e462140fdb402883f8d25422eafca66b
    (cherry picked from commit d0d9384aee36277e899aa664bc9f64835c33d6c4)

tags: added: in-stable-rocky
Changed in openstack-ansible:
status: Confirmed → Fix Released