Comment 19 for bug 1168526

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

There is at least one fundamental bug in start.c's signal_handler, as should
be fixed by the below. However, this alone did not fix it for me, so more is wrong. There is a minimal testcase at http://people.canonical.com/~serge/signalfd.c which originally reproduced this bug, then was fixed by using the waitpid loop as below.

From 7333b77759794f5420ea6898494073a28cac445f Mon Sep 17 00:00:00 2001
From: Serge Hallyn <email address hidden>
Date: Wed, 3 Jul 2013 14:13:08 -0500
Subject: [PATCH 1/1] start.c:signal_handler: fix wrong assumption about
 sigchld

The monitor process adds a signalfd to the set of fds it watches
with epoll. It calls signal_handler() when a signal is found in
the sigfd. That returns 1 if the container init was found to
be the exiting process.

The flaw in reasoning (pointed out by Andy Whitcroft - thanks!)
here is that if two children have exited, we assume we will get
two sigchilds - in fact we may only get one. So when we get one,
we need to reap all the children we have and check if any is the
container init.

Signed-off-by: Serge Hallyn <email address hidden>
---
 src/lxc/start.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/lxc/start.c b/src/lxc/start.c
index 8c8af9c..3fa50ad 100644
--- a/src/lxc/start.c
+++ b/src/lxc/start.c
@@ -162,7 +162,7 @@ static int signal_handler(int fd, void *data,
       struct lxc_epoll_descr *descr)
 {
  struct signalfd_siginfo siginfo;
- int ret;
+ int ret, status, retval = 0;
  pid_t *pid = data;

  ret = read(fd, &siginfo, sizeof(siginfo));
@@ -188,16 +188,16 @@ static int signal_handler(int fd, void *data,
   return 0;
  }

- /* more robustness, protect ourself from a SIGCHLD sent
- * by a process different from the container init
+ /*
+ * wait for any and all children which are ours which are reapable,
+ * since if >1 children have exited, we'll only get one sigchld.
   */
- if (siginfo.ssi_pid != *pid) {
- WARN("invalid pid for SIGCHLD");
- return 0;
- }
+ while ((ret = waitpid(-1, &status, WNOHANG)) > 0)
+ if (ret == *pid)
+ retval = 1; // we reaped the container init

  DEBUG("container init process exited");
- return 1;
+ return retval;
 }

 int lxc_set_state(const char *name, struct lxc_handler *handler, lxc_state_t state)
--
1.8.1.2