strace leaves process SIGSTOPped after detaching

Bug #103133 reported by Andrew Bennetts on 2007-04-05

Affects          Status        Importance  Assigned to      Milestone
strace (Debian)  New           Unknown     debbugs #424706
strace (Ubuntu)  Fix Released  Medium      Unassigned

Bug Description

Binary package hint: strace

The attached python script starts a thread, attaches strace -f to itself, and sends SIGINT to strace to detach it. The process is then left stopped:

$ python sigstop-demo.py

[1]+ Stopped python sigstop-demo.py
$ fg
python sigstop-demo.py
done
$

If the thread is not started, the process isn't stopped by detaching strace. Similarly, if "-f" is not passed to strace, this does not occur.
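The attachment itself is not reproduced on this page. The following is only a rough sketch of a script of the kind described above, not the original sigstop-demo.py. The helper names (strace_command, run_demo), the one-second settle delay, and the sleeping worker thread are assumptions made for illustration; on systems that restrict ptrace attach, strace may simply fail to attach.

```python
import os
import signal
import subprocess
import threading
import time


def strace_command(pid, follow_forks=True):
    """Build the argv for attaching strace to an existing process."""
    cmd = ["strace"]
    if follow_forks:
        cmd.append("-f")  # per the report, the stop only occurs with -f
    cmd += ["-p", str(pid)]
    return cmd


def run_demo(use_thread=True, sig=signal.SIGINT):
    """Attach strace to this process, then signal strace so it detaches."""
    if use_thread:
        # An idle second thread; per the report, without it the stop does not occur.
        worker = threading.Thread(target=time.sleep, args=(5,), daemon=True)
        worker.start()
    proc = subprocess.Popen(
        strace_command(os.getpid()),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    time.sleep(1)           # crude assumption: long enough for strace to attach
    os.kill(proc.pid, sig)  # ask strace to detach
    proc.communicate()      # wait for strace itself to exit
    print("done")


if __name__ == "__main__":
    run_demo()
```

Calling run_demo(use_thread=False), or building the command with follow_forks=False, corresponds to the two variations mentioned above in which the process is reportedly not left stopped.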

This was discovered by a debug hook in bzr. There are reports this problem doesn't happen in dapper.

Andrew Bennetts (spiv) wrote on 2007-04-05:

sigstop-demo.py (693 bytes, text/plain)

Micah Cowan (micahcowan) wrote on 2007-04-09:

This doesn't seem to be a bug. From strace(1), -f option:

                   If the parent process decides to wait(2) for a child
                   that is currently being traced, it is suspended
                   until an appropriate child process either terminates
                   or incurs a signal that would cause it to terminate
                   (as determined from the child’s current signal
                   disposition).

Of course, if that child happens to be strace itself, the "until" bit probably ceases to apply.

I'm guessing the call to proc.communicate() involves wait()ing.

(Closing bug for now; if further information arises that suggests that this may still be a bug, please feel free to reopen.)

Changed in strace:
status:	Unconfirmed → Rejected

Andrew Bennetts (spiv) wrote on 2007-04-15:

You are correct that proc.communicate() invokes wait(), or rather waitpid(), but "proc" here is the strace process itself, which is not being traced, so should be irrelevant.

Perhaps I'm missing something, but I don't think that part of the man page applies to this case: there's no *child* being traced that is being wait()ed for. All there is is a thread, and the parent never wait()s or joins it while strace is running, so the part of the man page you quote doesn't seem to be relevant.

Regardless, the current behaviour seems to make it impossible for a program to reliably call strace -f on itself, which I think is a desirable thing to support. For debugging purposes, being able to wrap a function call with something that starts & stops strace would be quite handy.
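As an illustration of the kind of wrapper meant here, the sketch below uses a context manager that starts strace on the current process and signals it to detach afterwards. The names traced and default_command, the settle delay, and the pluggable command argument are all hypothetical. Given the behaviour reported in this bug, a threaded program using it with strace -f may be left stopped on exit from the block, so this shows the desired usage rather than a workaround.

```python
import contextlib
import os
import signal
import subprocess
import time


def default_command(pid):
    """argv used to attach strace -f to the given pid."""
    return ["strace", "-f", "-p", str(pid)]


@contextlib.contextmanager
def traced(command=default_command, settle=1.0, sig=signal.SIGINT):
    """Run a tracer against this process for the duration of the block."""
    proc = subprocess.Popen(command(os.getpid()))
    time.sleep(settle)  # crude assumption: long enough for the tracer to attach
    try:
        yield proc
    finally:
        if proc.poll() is None:
            proc.send_signal(sig)  # ask the tracer to detach
        proc.wait()                # reap the tracer process
```

Usage would look like "with traced(): some_function()", where some_function stands for whatever call is being debugged.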

I hope you don't mind me putting the status back to Unconfirmed for the moment.

Changed in strace:
status:	Rejected → Unconfirmed

Micah Cowan (micahcowan) wrote on 2007-04-16:

wait() doesn't have meaning for separate threads; only for processes. Additionally, it is not possible to run strace, except as a separate (child) process. And, while "proc" does /represent/ an interface to the child, it's still the parent that is doing the wait()ing, upon its child (which is the strace process). It seems to me that the man page is highly relevant; but I'm happy to wait for a second opinion.

As to reliably tracing oneself, you might try a method that would involve killing the strace process (SIGTERM, probably), instead of waiting for it to complete (which will cause strace to SIGSTOP the parent).

Andrew Bennetts (spiv) wrote on 2007-04-16:

The man page talks about wait()ing for a *traced* child. That's not the case here. The parent is wait()ing for strace itself, which is definitely not a traced child.

In fact, if I comment out the "proc.communicate()" line, so no wait() happens, the bug still occurs. So wait() is definitely a red herring. (Sorry for not realising this sooner so that I could have given you a more minimal example!)

If you change the demo script to send SIGTERM to strace, rather than SIGINT, then the problem still exists.
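When comparing variations like these, it can help to check the process state from a second process rather than relying on the shell's job-control message. The sketch below assumes a Linux /proc filesystem; the function names parse_state and is_stopped are made up for illustration. It reads the state field of /proc/<pid>/stat, where "T" means stopped and, on newer kernels, "t" means tracing stop.

```python
def parse_state(stat_text):
    """Extract the one-letter state field from the contents of /proc/<pid>/stat.

    The command name is parenthesised and may itself contain spaces or
    parentheses, so the state is taken as the first field after the last ')'.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]


def is_stopped(pid):
    """Return True if the process is in a stopped or tracing-stop state."""
    with open("/proc/%d/stat" % pid) as f:
        return parse_state(f.read()) in ("T", "t")
```

Running is_stopped on the pid of the demo process from another terminal, after strace has detached, would show whether the process has been left stopped.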

I used SIGINT because in the man page it says of the "-p" option:

                   begin tracing. The trace may be terminated at any
                   time by a keyboard interrupt signal (CTRL-C).
                   strace will respond by detaching itself from the
                   traced process(es) leaving it (them) to continue
                   running.

I expect that SIGTERM is treated identically, but that isn't documented, whereas an interrupt signal is. All I really want is the behaviour that is promised by that part of the man page: "strace will respond by detaching itself from the traced process(es) leaving it (them) to continue running."

Micah Cowan (micahcowan) on 2007-05-18