Turns out you can prevent both dbus-daemon and xinetd from actually forking. dbus-daemon needs a configuration change (/etc/dbus-1/system.conf) and xinetd needs -dontfork added to its command line.
This doesn't fix upstart's apparent inability to properly trace a child process through a fork.
I added some debug statements to the code to help solve this problem. It appears there is some wrong information out there about how the waitid() syscall works. In many places I see CLD_TRAPPED used, but the wait(2) man page says nothing about it. I do see it mentioned in siginfo.h (kernel headers).
So I added a simple nih_warn() at the top of the nih_child_poll() function. This is what I see for my xinetd example:
2008-10-09T21:36:02.187937+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-09T21:36:02.188896+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-09T21:36:02.188931+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-09T21:36:02.188931+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-09T21:36:02.188931+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-09T21:36:02.188931+00:00 fs03 init: xinetd goal changed from stop to start
2008-10-09T21:36:02.188931+00:00 fs03 init: xinetd state changed from waiting to starting
2008-10-09T21:36:02.188931+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-09T21:36:02.188931+00:00 fs03 init: Handling starting event
2008-10-09T21:36:02.189743+00:00 fs03 init: xinetd state changed from starting to pre-start
2008-10-09T21:36:02.189778+00:00 fs03 init: xinetd state changed from pre-start to spawned
2008-10-09T21:36:02.189837+00:00 fs03 init: xinetd main process (19246)
2008-10-09T21:36:02.189837+00:00 fs03 init: waitid - pid=19246, signo=17, code=2, status=5
2008-10-09T21:36:02.189837+00:00 fs03 init: xinetd main process (19246) killed by TRAP signal
2008-10-09T21:36:02.189837+00:00 fs03 init: xinetd main process ended, respawning
2008-10-09T21:36:02.189837+00:00 fs03 init: xinetd state changed from spawned to post-start
2008-10-09T21:36:02.189837+00:00 fs03 init: xinetd state changed from post-start to running
2008-10-09T21:36:02.190748+00:00 fs03 init: waitid - pid=19245, signo=17, code=1, status=0
2008-10-09T21:36:02.190748+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-09T21:36:02.190748+00:00 fs03 init: Handling started event
2008-10-09T21:36:02.191710+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
This indicates that the code is CLD_KILLED (2) and the status is SIGTRAP (5). There is no CLD_TRAPPED here. The question I have, then: is this due to the kernel I'm using (2.6.24)? Does a newer or older kernel behave differently?
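For anyone who wants to decode the same fields, here is a minimal sketch of how the code/status pairs in the log can be read. It is not the upstart or libnih code; cld_code_name() and describe_child() are hypothetical helper names made up for this example, and the "no child" branch assumes the Linux behaviour where waitid() with WNOHANG leaves si_pid at 0 when nothing is waitable (which is what the pid=0 lines look like).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for CLD_* constants and strsignal() */
#endif
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper: map a SIGCHLD si_code value to its symbolic name. */
const char *cld_code_name(int code)
{
    switch (code) {
    case CLD_EXITED:    return "CLD_EXITED";
    case CLD_KILLED:    return "CLD_KILLED";
    case CLD_DUMPED:    return "CLD_DUMPED";
    case CLD_TRAPPED:   return "CLD_TRAPPED";
    case CLD_STOPPED:   return "CLD_STOPPED";
    case CLD_CONTINUED: return "CLD_CONTINUED";
    default:            return "unknown";
    }
}

/* Hypothetical helper: write a one-line description of a siginfo_t that
 * was filled in by waitid(). Returns whatever snprintf() returns. */
int describe_child(const siginfo_t *info, char *buf, size_t len)
{
    assert(info != NULL && buf != NULL);

    /* Assumption: si_pid == 0 means WNOHANG found no waitable child. */
    if (info->si_pid == 0)
        return snprintf(buf, len, "no child changed state");

    /* For CLD_EXITED, si_status is the exit status. */
    if (info->si_code == CLD_EXITED)
        return snprintf(buf, len, "pid %d: %s, exit status %d",
                        (int)info->si_pid, cld_code_name(info->si_code),
                        info->si_status);

    /* For every other code, si_status is a signal number. */
    return snprintf(buf, len, "pid %d: %s, signal %d (%s)",
                    (int)info->si_pid, cld_code_name(info->si_code),
                    info->si_status, strsignal(info->si_status));
}
```

Feeding it the values from the log (pid=19246, code=2, status=5) gives CLD_KILLED with signal 5, and pid=19245 with code=1, status=0 gives CLD_EXITED with exit status 0. A trace stop would show up as code CLD_TRAPPED instead, which is what I was expecting to see for 19246.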