strace leaves process SIGSTOPped after detaching

Bug #103133 reported by Andrew Bennetts
Affects          Status        Importance  Assigned to  Milestone
strace (Debian)  New           Unknown
strace (Ubuntu)  Fix Released  Medium      Unassigned

Bug Description

Binary package hint: strace

The attached Python script starts a thread, attaches strace -f to itself, and sends SIGINT to strace to detach it. The process is then left stopped:

$ python sigstop-demo.py

[1]+ Stopped python sigstop-demo.py
$ fg
python sigstop-demo.py
done
$

If the thread is not started, the process isn't stopped by detaching strace. Similarly, if "-f" is not passed to strace, this does not occur.
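
The attached sigstop-demo.py is not reproduced on this page. A hypothetical reconstruction from the description above might look like the following sketch. It assumes strace is on PATH and that the kernel lets a child trace its own parent (with Yama's ptrace_scope set to 1, strace may be refused permission to attach). The names `strace_command` and `reproduce` are invented here, and the parameters only mirror the variations discussed in this report: no thread, no -f, and SIGTERM instead of SIGINT.

```python
import os
import signal
import subprocess
import threading
import time


def strace_command(pid, follow_threads=True):
    """Build an strace command line that attaches to an existing process."""
    cmd = ["strace"]
    if follow_threads:
        cmd.append("-f")  # also trace the target's threads
    cmd += ["-p", str(pid)]
    return cmd


def reproduce(start_thread=True, follow_threads=True,
              detach_signal=signal.SIGINT, settle=1.0):
    """Hypothetical reconstruction of the demo, not the original attachment."""
    release = threading.Event()
    worker = None
    if start_thread:
        # An idle thread that simply blocks until it is released.
        worker = threading.Thread(target=release.wait)
        worker.start()
    try:
        # strace runs as a child of this process and attaches to its parent.
        proc = subprocess.Popen(strace_command(os.getpid(), follow_threads))
        time.sleep(settle)               # give strace time to attach
        proc.send_signal(detach_signal)  # ask strace to detach and exit
        proc.communicate()               # wait for strace; strace itself is not traced
    finally:
        release.set()
        if worker is not None:
            worker.join()
    print("done")
```

On an affected strace, calling `reproduce()` from an interactive shell would be expected to leave the job stopped as in the transcript above; per the report, `start_thread=False` or `follow_threads=False` should not.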

This was discovered via a debug hook in bzr. There are reports that this problem doesn't happen in Dapper.

Andrew Bennetts (spiv) wrote :
Micah Cowan (micahcowan) wrote :

This doesn't seem to be a bug. From strace(1), -f option:

                   If the parent process decides to wait(2) for a child
                   that is currently being traced, it is suspended
                   until an appropriate child process either terminates
                   or incurs a signal that would cause it to terminate
                   (as determined from the child’s current signal
                   disposition).

Of course, if that child happens to be strace itself, the "until" bit probably ceases to apply.

I'm guessing the call to proc.communicate() involves wait()ing.

(Closing bug for now; if further information arises that suggests that this may still be a bug, please feel free to reopen.)

Changed in strace:
status: Unconfirmed → Rejected
Andrew Bennetts (spiv) wrote :

You are correct that proc.communicate() invokes wait(), or rather waitpid(), but "proc" here is the strace process itself, which is not being traced, so that should be irrelevant.

Perhaps I'm missing something, but I don't think that part of the man page applies to this case: there's no *child* being traced that is being wait()ed for. All there is is a thread, and the parent never wait()s or joins it while strace is running, so the part of the man page you quote doesn't seem to be relevant.

Regardless, the current behaviour seems to make it impossible for a program to reliably call strace -f on itself, which I think is a desirable thing to support. For debugging purposes, being able to wrap a function call with something that starts & stops strace would be quite handy.
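
A sketch of the wrapper described above, assuming strace is on PATH and is allowed to attach to its parent. `traced` and `strace_argv` are hypothetical names, and the fixed `attach_delay` is a crude stand-in for knowing when strace has actually attached. On a strace affected by this bug, the process could still end up stopped after the with-block exits.

```python
import contextlib
import os
import signal
import subprocess
import time


def strace_argv(pid):
    # Default tracer command (assumes strace is on PATH); -f follows threads.
    return ["strace", "-f", "-p", str(pid)]


@contextlib.contextmanager
def traced(command_for=strace_argv, attach_delay=1.0,
           detach_signal=signal.SIGINT, timeout=5.0):
    # Hypothetical helper: trace the current process for the body of a with-block.
    proc = subprocess.Popen(command_for(os.getpid()))
    try:
        # Wait a fixed time so the tracer can attach before the body runs.
        time.sleep(attach_delay)
        yield proc
    finally:
        if proc.poll() is None:
            # SIGINT is the signal strace(1) documents for detaching.
            proc.send_signal(detach_signal)
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # Last resort if the tracer does not exit on its own.
            proc.kill()
            proc.wait()
```

Usage would be `with traced(): call_under_test()`, where `call_under_test` stands for whatever function is being debugged. The `command_for` parameter exists only so the start/stop logic can be exercised without strace.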

I hope you don't mind me putting the status back to Unconfirmed for the moment.

Changed in strace:
status: Rejected → Unconfirmed
Micah Cowan (micahcowan) wrote :

wait() has no meaning for separate threads, only for processes. Additionally, it is not possible to run strace except as a separate (child) process. And, while "proc" does /represent/ an interface to the child, it's still the parent that is wait()ing on its child (which is the strace process). The man page seems highly relevant to me, but I'm happy to wait for a second opinion.

As to reliably tracing oneself, you might try a method that would involve killing the strace process (SIGTERM, probably), instead of waiting for it to complete (which will cause strace to SIGSTOP the parent).
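
Micah's suggestion, sketched with the subprocess module: signal strace directly and then reap it, escalating to SIGKILL if it does not exit. Reaping is still needed so strace does not linger as a zombie. `stop_tracer` is a hypothetical name, and as the next comment reports, sending SIGTERM did not avoid the stop on the affected version, so this illustrates the suggestion rather than a confirmed fix.

```python
import signal
import subprocess


def stop_tracer(proc, sig=signal.SIGTERM, timeout=5.0):
    """Signal a tracer started with subprocess.Popen and reap it.

    Returns Popen.returncode; a negative value -N means the tracer was
    killed by signal N.
    """
    proc.send_signal(sig)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Escalate if the tracer ignores or is slow to act on the first signal.
        proc.kill()
        return proc.wait()
```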

Andrew Bennetts (spiv) wrote :

The man page talks about wait()ing for a *traced* child. That's not the case here. The parent is wait()ing for strace itself, which is definitely not a traced child.

In fact, if I comment out the "proc.communicate()" line, so no wait() happens, the bug still occurs. So wait() is definitely a red herring. (Sorry for not realising this sooner so that I could have given you a more minimal example!)

If you change the demo script to send SIGTERM to strace rather than SIGINT, the problem still exists.

I used SIGINT because in the man page it says of the "-p" option:

                   begin tracing. The trace may be terminated at any
                   time by a keyboard interrupt signal (CTRL-C).
                   strace will respond by detaching itself from the
                   traced process(es) leaving it (them) to continue
                   running.

I expect that SIGTERM is treated identically, but that isn't documented, whereas an interrupt signal is. All I really want is the behaviour that is promised by that part of the man page: "strace will respond by detaching itself from the traced process(es) leaving it (them) to continue running."
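
Whether the traced process really was left running can only be checked from a different process, since a stopped process cannot inspect itself. A minimal Linux-only sketch, assuming /proc is mounted: read the state letter from /proc/<pid>/stat ('T' means stopped by a job-control signal) and, if needed, send SIGCONT, which is what the shell's fg did in the transcript above. The helper names are hypothetical.

```python
import os
import signal


def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name (field 2) is in parentheses and may itself contain
    spaces or ')', so split on the last ')' before reading field 3.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def process_state(pid):
    with open(f"/proc/{pid}/stat") as f:
        return parse_state(f.read())


def resume_if_stopped(pid):
    """Send SIGCONT if pid is in a job-control stop ('T').

    A tracing stop ('t') is left alone: that stop belongs to the tracer.
    Returns True if a signal was sent.
    """
    if process_state(pid) == "T":
        os.kill(pid, signal.SIGCONT)
        return True
    return False
```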

Micah Cowan (micahcowan)
Changed in strace:
importance: Undecided → Medium
status: Unconfirmed → Confirmed
Changed in strace:
status: Unknown → Unconfirmed
Andrew Bennetts (spiv) wrote :

This appears to be fixed in Ubuntu 10.10 (and perhaps earlier). The script completes normally and immediately for me now:

$ python sigstop-demo.py
done

Changed in strace (Ubuntu):
status: Confirmed → Fix Released