lxcfs does not properly enforce directory escapes

Bug #1508481 reported by Serge Hallyn
260
This bug affects 1 person
Affects Status Importance Assigned to Milestone
lxcfs (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

lxcfs, like cgmanager, is meant to enforce that a task under cgroup /a/b/c cannot query or update cgroups which are not /a/b/c or its descendents.

Since lxcfs is a filesystem, it makes an exception so that 'ls /a' (really 'ls /var/lib/lxcfs/cgroup/freezer/a') return a single dentry, for 'b'.

This enforcement is not complete. So if you are logged into 5:freezer:/user/serge/1, you can do

0 ✓ serge@sl ~ $ sudo mkdir /var/lib/lxcfs/cgroup/freezer/xx
0 ✓ serge@sl ~ $ ls /sys/fs/cgroup/freezer/xx
cgroup.clone_children cgroup.procs freezer.parent_freezing freezer.self_freezing freezer.state notify_on_release tasks

DAC permission still apply, locking unprivileged containers.

CVE References

Revision history for this message
Marc Deslauriers (mdeslaur) wrote :

This is CVE-2015-1342

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Here is a preliminary debdiff. I'd like to add some testcases, but this seems to do the trick.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

No, that debdiff doesn't quite suffice. A new one, with testcase ( which caught the remaining bug ) coming later.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

New debdiff, with testcase.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :
Revision history for this message
Seth Arnold (seth-arnold) wrote :

I'm having trouble reviewing the patch: I'm not spotting the pattern of
which checks should be used when. I'll summarize what I'm seeing here, as
much for me in the future as anything else:

cg_getattr
- is_child_cgroup
- caller_is_in_ancestor
- fc_may_access
- caller_is_in_ancestor
- fc_may_access

cg_opendir
- caller_may_see_dir
- fc_may_access

cg_readdir
- caller_is_in_ancestor

cg_open
- caller_may_see_dir
- fc_may_access

cg_read
- fc_may_access

cg_write
- fc_may_access

cg_chown
- is_child_cgroup
- is_privileged_over

cg_chmod
- is_child_cgroup
- is_privileged_over

cg_mkdir
- caller_is_in_ancestor
- fc_may_access
- caller_is_in_ancestor

Incidentally, what prevents the controller or cgroup here from containing
shell metacharacters?

cg_rmdir
- caller_is_in_ancestor
- fc_may_access
- caller_is_in_ancestor

Does this summary look correct? Did I summarize it correctly? Does it
accurately express what needs to be done?

Thanks

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Let me start by giving the overall goal:

To a task in freezer cgroup /a/b/c/d, it should appear that there are no cgroups other than its descendents. Since this is a filesystem, we must have the parent directories, but each parent cgroup should only contain the child which the task can see.

So, when this task looks at /a/b, it should see only directory 'c' and no files. Attempt to create /a/b/x should result in -EPERM, whether /a/b/x already exists or not. Attempts to query /a/b/x should result in -ENOENT whether /a/b/x exists or not. Opening /a/b/tasks should result in -ENOENT.

The caller_may_see_dir checks specifically whether a task may see a cgroup directory - i.e. /a/b/x if opening /a/b/x/tasks, and /a/b/c/d if doing opendir('/a/b/c/d').

caller_is_in_ancestor() will return true if the caller in /a/b/c/d looks at /a/b/c/d/e. If the caller is in a child cgroup of the queried one - i.e. if the task in /a/b/c/d queries /a/b, then *nextcg will container the next (the only) directory which he can see in the path - 'c'.

Beyond this, regular DAC permissions should apply, with the root-in-user-namespace privilege over its mapped uids being respected. The fc_may_access check does this check for both directories and files.

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Thanks for the explanation; it's still quite difficult to tell that there's no way to leak information, but it does make sense.

Thanks

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

By 'leak information' do you mean allowing to see directories which are not descendents?

(I can try to write a more convincing proof)

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Perhaps it would make the code a lot clearer to separate out the caller_is_in_ancestor from the "get_next_visible_parent_cgroup" function.

To rephrase dome of comment #9,

Assume again task is in /a/b. If it does:

opendir(/a) it should see only b
opendir(/c) it should get ENOENT
opendir(/a/b/c) it should see full contents
stat/open/chown/chmod(/a/tasks) should get ENOENT
stat/open/chown/chmod(/a/b/tasks) should act on the tasks file

mkdir(/c) should get EPERM
rmdir(/c) should get ENOENT
mkdir(/a) should get EEXIST
rmdir(/a) should get EBUSY

Revision history for this message
Seth Arnold (seth-arnold) wrote :

It's just that there are dozens of calls to work with:

open
stat
access
bind
truncate
chdir
rename
mkdir
rmdir
creat
link
unlink
symlink
readlink
chmod
chown
lchown
mknod
chroot
setxattr
lsetxattr
getxattr
lgetxattr
listxattr
llistxattr
removexattr
lremovexattr
utimes

Then there's the consequences of an fd to one of these cgroup files or directories being passed to another process and then the f* variants of the above functions getting used. Or the *at variants.

That's what I'm worried about; the patches you've attached here look good, and close a hole you identified, but there's a lot of syscalls that include a lot of error returns.

Thanks

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1508481] Re: lxcfs does not properly enforce directory escapes

Quoting Seth Arnold (<email address hidden>):
> It's just that there are dozens of calls to work with:

Good point.

Many of those are not implemented. Anything not implemented in lxcfs.c
should return "Function not implemented". For instance truncate
does so, and access returns -ENOENT.

I'll work on a testcase for the below list.

> open
> stat
> access
> bind
> truncate
> chdir
> rename
> mkdir
> rmdir
> creat
> link
> unlink
> symlink
> readlink
> chmod
> chown
> lchown
> mknod
> chroot
> setxattr
> lsetxattr
> getxattr
> lgetxattr
> listxattr
> llistxattr
> removexattr
> lremovexattr
> utimes
>
> Then there's the consequences of an fd to one of these cgroup files or
> directories being passed to another process and then the f* variants of
> the above functions getting used. Or the *at variants.
>
> That's what I'm worried about; the patches you've attached here look
> good, and close a hole you identified, but there's a lot of syscalls
> that include a lot of error returns.
>
> Thanks
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1508481
>
> Title:
> lxcfs does not properly enforce directory escapes
>
> Status in lxcfs package in Ubuntu:
> New
>
> Bug description:
> lxcfs, like cgmanager, is meant to enforce that a task under cgroup
> /a/b/c cannot query or update cgroups which are not /a/b/c or its
> descendents.
>
> Since lxcfs is a filesystem, it makes an exception so that 'ls /a'
> (really 'ls /var/lib/lxcfs/cgroup/freezer/a') return a single dentry,
> for 'b'.
>
> This enforcement is not complete. So if you are logged into
> 5:freezer:/user/serge/1, you can do
>
> 0 ✓ serge@sl ~ $ sudo mkdir /var/lib/lxcfs/cgroup/freezer/xx
> 0 ✓ serge@sl ~ $ ls /sys/fs/cgroup/freezer/xx
> cgroup.clone_children cgroup.procs freezer.parent_freezing freezer.self_freezing freezer.state notify_on_release tasks
>
> DAC permission still apply, locking unprivileged containers.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/lxcfs/+bug/1508481/+subscriptions

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

By bind, do you mean bind mount?

Revision history for this message
Seth Arnold (seth-arnold) wrote :

I had been thinking of bind(2) with a unix domain socket, and the name bound into the filesystem.

Trying to use the directories or files with mounts is a great idea and also worth testing.

Thanks

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

New patch (not debdiff) for w which adds a testcase for most of the listed syscalls and fixes one bug which was found.

(mkdir of a top-level cgroup directory returned -EEXIST instead of -ENOENT)

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

New w debdiff. I will post for other releases once this one is deemed ok.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :
Revision history for this message
Seth Arnold (seth-arnold) wrote :

Those are awesome tests, good work. I think link and symlink ought to test the passed pathnames in both source and destination parameters, but it's otherwise awesome. :)

Thanks

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :
Revision history for this message
Seth Arnold (seth-arnold) wrote :

Great! Thanks Serge. Looks beautiful.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

These next 3 deddiffs have been testedwith the new test_confinement testcase as well as all lxc tests shipped with the lxc-tests package.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :
Revision history for this message
Marc Deslauriers (mdeslaur) wrote :

CRD for this issue will be 2015-11-17 18:00:00 UTC

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lxcfs - 0.10-0ubuntu2.1

---------------
lxcfs (0.10-0ubuntu2.1) wily-security; urgency=medium

  * SECURITY UPDATE: does not properly enforce directory escapes
    (LP: #1508481)
    - debian/patches/0002-fix-checking-of-parent-dirs.patch: Ensure that a
      task under cgroup /a/b cannot mkdir, rmdir, or modify files under,
      directories not under /a/b. Add a testcase for this.
    - CVE-2015-1342
  * SECURITY UPDATE: lack of privilege checking in do_write_pids
    (LP: #1512854)
    - debian/patches/0002-Fix-movepid-cve.patch: Fix missing privilege
      check when moving pids to a new cgroup.
    - CVE-2015-1344

 -- Marc Deslauriers <email address hidden> Wed, 11 Nov 2015 07:19:02 -0500

Changed in lxcfs (Ubuntu):
status: New → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lxcfs - 0.7-0ubuntu4.1

---------------
lxcfs (0.7-0ubuntu4.1) vivid-security; urgency=medium

  * SECURITY UPDATE: does not properly enforce directory escapes
    (LP: #1508481)
    - debian/patches/0005-fix-checking-of-parent-dirs.patch: Ensure that a
      task under cgroup /a/b cannot mkdir, rmdir, or modify files under,
      directories not under /a/b. Add a testcase for this.
    - CVE-2015-1342
  * SECURITY UPDATE: lack of privilege checking in do_write_pids
    (LP: #1512854)
    - debian/patches/0005-Fix-movepid-cve.patch: Fix missing privilege
      check when moving pids to a new cgroup.
    - CVE-2015-1344

 -- Marc Deslauriers <email address hidden> Wed, 11 Nov 2015 07:19:02 -0500

Changed in lxcfs (Ubuntu):
status: New → Fix Released
information type: Private Security → Public Security
To post a comment you must log in.
This report contains Public Security information  
Everyone can see this security related information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.