destroy-environment fails to clear lxc containers

Bug #1307215 reported by Evan
This bug affects 3 people
Affects        Status         Importance   Assigned to   Milestone
juju-core      Invalid        High         Unassigned
lxc (Ubuntu)   Fix Released   Medium       Unassigned

Bug Description

Running destroy-environment with --force cleans up evan-local-machine-1, but fails to remove evan-local-machine-2 and evan-local-machine-3. I think this is what led to bug 1303778 for me.

  % juju destroy-environment local --force
WARNING! this command will destroy the "local" environment (type: local)
This includes all machines, services, data and other resources.

Continue [y/N]? y
[sudo] password for evan:
ERROR failed to destroy lxc container: error executing "lxc-destroy": lxc_container: Error destroying rootfs for evan-local-machine-2; Destroying evan-local-machine-2 failed
ERROR error executing "lxc-destroy": lxc_container: Error destroying rootfs for evan-local-machine-2; Destroying evan-local-machine-2 failed
ERROR exit status 1

 % sudo lxc-ls -f
NAME STATE IPV4 IPV6 AUTOSTART
-----------------------------------------------------
evan-local-machine-2 STOPPED - - YES
evan-local-machine-3 STOPPED - - YES
juju-precise-template STOPPED - - NO

  % juju destroy-environment local --force
WARNING! this command will destroy the "local" environment (type: local)
This includes all machines, services, data and other resources.

Continue [y/N]? y
ERROR failed to destroy lxc container: error executing "lxc-destroy": lxc_container: Error destroying rootfs for evan-local-machine-3; Destroying evan-local-machine-3 failed
ERROR error executing "lxc-destroy": lxc_container: Error destroying rootfs for evan-local-machine-3; Destroying evan-local-machine-3 failed
ERROR exit status 1

 % sudo lxc-ls -f
NAME STATE IPV4 IPV6 AUTOSTART
-----------------------------------------------------
evan-local-machine-2 STOPPED - - YES
evan-local-machine-3 STOPPED - - YES
juju-precise-template STOPPED - - NO

  % juju destroy-environment local --force
WARNING! this command will destroy the "local" environment (type: local)
This includes all machines, services, data and other resources.

Continue [y/N]? y

 % sudo lxc-ls -f
NAME STATE IPV4 IPV6 AUTOSTART
-----------------------------------------------------
evan-local-machine-2 STOPPED - - YES
evan-local-machine-3 STOPPED - - YES
juju-precise-template STOPPED - - NO

Revision history for this message
Tim Penhey (thumper) wrote :

Hi Evan, can you please run the following and pastebin the output?

juju destroy-environment local --logging-config='golxc=TRACE;juju=DEBUG' --show-log

This will give us the output of the lxc command and why it is failing.

Curtis Hovey (sinzui)
tags: added: destroy-environment local-provider lxc
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.20.0
Evan (ev) wrote :

I've been unable to reproduce this thus far, but I'll keep at it.

John A Meinel (jameinel) wrote :

Until we can reproduce this, I don't think we can address it.

Changed in juju-core:
status: Triaged → Incomplete
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.20.0 → none
Curtis Hovey (sinzui)
Changed in juju-core:
importance: High → Medium
Launchpad Janitor (janitor) wrote :

[Expired for juju-core because there has been no activity for 60 days.]

Changed in juju-core:
status: Incomplete → Expired
Evan (ev) wrote :

I've had this happen again. It looks like lxc cannot remove the rootfs subvolume because it still contains other subvolumes:
http://lxr.free-electrons.com/source/fs/btrfs/ioctl.c#L1894

(.venv-ubuntu)vagrant@vagrant-ubuntu-trusty-64:/ev/bzr/uci-engine/ceph$ sudo strace -e file lxc-destroy --force --logpriority=DEBUG --name vagrant-local-machine-1
execve("/usr/bin/lxc-destroy", ["lxc-destroy", "--force", "--logpriority=DEBUG", "--name", "vagrant-local-machine-1"], [/* 16 vars */]) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/liblxc.so.1", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libcap.so.2", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libapparmor.so.1", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libseccomp.so.2", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libcgmanager.so.0", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libnih.so.1", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libnih-dbus.so.1", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libdbus-1.so.3", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libutil.so.1", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpcre.so.3", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
statfs("/sys/fs/selinux", 0x7fffe99765f0) = -1 ENOENT (No such file or directory)
statfs("/selinux", 0x7fffe99765f0) = -1 ENOENT (No such file or directory)
open("/proc/filesystems", O_RDONLY) = 3
open("/proc/cgroups", O_RDONLY|O_CLOEXEC) = 3
stat("/sys/kernel/se...

Changed in juju-core:
status: Expired → New
Evan (ev) wrote :

Yup, that's definitely it. Deleting the subvolumes under /var/lib/lxc/vagrant-local-machine-*/rootfs/srv/disk/{current,snap_} freed up /var/lib/lxc/vagrant-local-machine-* so that lxc-destroy worked.

Evan (ev) wrote :

To further clarify, this isn't a juju bug. lxc should be smart enough to delete dependent subvolumes before it deletes the rootfs subvolume. I took a stab at this over the weekend, but ran out of time. btrfs_destroy(struct bdev *orig) is what you're after.
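
The ordering described above (dependent subvolumes first, rootfs last) can be sketched as follows. This is only an illustration of the ordering, not lxc's actual fix: `deletion_order` is a hypothetical helper, the paths are assumed to be relative to /var/lib/lxc as `btrfs subvolume list` prints them, and the script only prints the `btrfs subvolume delete` commands rather than running them.

```python
from pathlib import PurePosixPath


def deletion_order(rootfs, subvolume_paths):
    """Return subvolumes to delete: nested ones deepest first, rootfs last.

    Hypothetical helper for illustration; it does not touch the filesystem.
    """
    root = PurePosixPath(rootfs)
    # Keep only subvolumes that live somewhere below the container rootfs.
    nested = [
        PurePosixPath(p)
        for p in subvolume_paths
        if root in PurePosixPath(p).parents
    ]
    # Deeper paths have more components; delete them before their parents.
    nested.sort(key=lambda p: (-len(p.parts), str(p)))
    return [str(p) for p in nested] + [str(root)]


if __name__ == "__main__":
    paths = [
        "vagrant-local-machine-25/rootfs",
        "vagrant-local-machine-25/rootfs/srv/ceph/current",
        "vagrant-local-machine-25/rootfs/srv/ceph/snap_4767",
        "vagrant-local-machine-26/rootfs/srv/ceph/current",
    ]
    # Print the commands instead of executing them (assumed mount point).
    for path in deletion_order("vagrant-local-machine-25/rootfs", paths):
        print("btrfs subvolume delete /var/lib/lxc/" + path)
```

Sorting by path depth is enough here because a subvolume can only be removed once nothing is nested inside it, so the rootfs always has to come last.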

Serge Hallyn (serge-hallyn) wrote :

To be sure I understand, the issue is that inside your container you created btrfs subvolumes (by using btrfs containers or manually using btrfs subvolume create)?

Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Medium → High
importance: High → Medium
Evan (ev) wrote :

Correct. Attached is a juju-deployer config that reproduces the issue when used with the juju local provider and with /var/lib/lxc on the host backed by btrfs. If you deploy it and run juju destroy-environment --force -y local, it should fail.

This is because Ceph sees that /srv/ceph is on btrfs and makes use of it.

ubuntu@vagrant-local-machine-1:~$ mount
/dev/sda2 on / type btrfs (rw)

So you'll end up with lots of snapshots created by Ceph inside LXC. These will be visible by running `btrfs subvolume list /var/lib/lxc` in the host:

(.venv-ubuntu)vagrant@vagrant-ubuntu-trusty-64:~$ sudo btrfs subvolume list /var/lib/lxc
ID 256 gen 9268 top level 5 path @lxc
ID 257 gen 8126 top level 256 path juju-precise-template/rootfs
ID 285 gen 8006 top level 256 path juju-trusty-template/rootfs
ID 762 gen 9208 top level 256 path vagrant-local-machine-25/rootfs
ID 763 gen 9208 top level 256 path vagrant-local-machine-26/rootfs
ID 764 gen 9208 top level 256 path vagrant-local-machine-27/rootfs
ID 925 gen 9208 top level 256 path vagrant-local-machine-27/rootfs/srv/ceph/current
ID 934 gen 9208 top level 256 path vagrant-local-machine-26/rootfs/srv/ceph/current
ID 944 gen 9208 top level 256 path vagrant-local-machine-25/rootfs/srv/ceph/current
ID 951 gen 9208 top level 256 path vagrant-local-machine-26/rootfs/srv/ceph/snap_4426
ID 954 gen 9208 top level 256 path vagrant-local-machine-26/rootfs/srv/ceph/snap_4483
ID 957 gen 9208 top level 256 path vagrant-local-machine-25/rootfs/srv/ceph/snap_4767
ID 958 gen 9208 top level 256 path vagrant-local-machine-27/rootfs/srv/ceph/snap_5753
ID 959 gen 9208 top level 256 path vagrant-local-machine-25/rootfs/srv/ceph/snap_4845
ID 960 gen 9208 top level 256 path vagrant-local-machine-27/rootfs/srv/ceph/snap_5754
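
To see at a glance which containers carry nested subvolumes, a listing like the one above can be grouped by container. This is a rough sketch: `nested_by_container` is a hypothetical helper, it assumes the `<container>/rootfs/...` path layout seen in this listing, and the sample lines are copied from the output above.

```python
import re
from collections import defaultdict

# Matches lines such as: ID 925 gen 9208 top level 256 path some/path
LINE = re.compile(r"^ID (\d+) gen (\d+) top level (\d+) path (.+)$")


def nested_by_container(listing):
    """Map container name -> subvolume paths nested below its rootfs."""
    nested = defaultdict(list)
    for line in listing.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a subvolume line
        path = match.group(4)
        # Assumed layout: <container>/rootfs/<nested subvolume path>
        container, sep, rest = path.partition("/rootfs/")
        if sep and rest:
            nested[container].append(rest)
    return dict(nested)


SAMPLE = """\
ID 764 gen 9208 top level 256 path vagrant-local-machine-27/rootfs
ID 925 gen 9208 top level 256 path vagrant-local-machine-27/rootfs/srv/ceph/current
ID 958 gen 9208 top level 256 path vagrant-local-machine-27/rootfs/srv/ceph/snap_5753
ID 257 gen 8126 top level 256 path juju-precise-template/rootfs
"""

if __name__ == "__main__":
    for container, paths in nested_by_container(SAMPLE).items():
        print(container, paths)
```

Containers that show up in the result are the ones whose rootfs holds further subvolumes; the template containers in the listing have none and do not appear.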

Changed in lxc (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Curtis Hovey (sinzui)
tags: added: ubuntu-engineering
Changed in juju-core:
importance: Medium → High
milestone: none → next-stable
Serge Hallyn (serge-hallyn) wrote :

The fix for this is applied in lxc's git HEAD.

Changed in lxc (Ubuntu):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lxc - 1.1.0~alpha2-0ubuntu2

---------------
lxc (1.1.0~alpha2-0ubuntu2) utopic; urgency=medium

  * Cherry-pick upstream bugfix for lxc-usernic test.
 -- Stephane Graber <email address hidden> Thu, 02 Oct 2014 15:01:56 -0400

Changed in lxc (Ubuntu):
status: Fix Committed → Fix Released
Curtis Hovey (sinzui)
Changed in juju-core:
status: Triaged → Invalid
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: next-stable → none