Bug #57731 “Futex hang when exiting using the window close butto...” : Bugs : xchat-gnome package : Ubuntu

Guillaume Desmottes (cassidy) wrote on 2006-08-29:

#1

I suppose you don't know how to reproduce this bug?

David, any idea about this issue?

David Trowbridge (trowbrds) wrote on 2006-08-29:

#2

The futex thing is definitely a side effect of something, and not the cause. Other than that, I don't know.

Matt Zimmerman (mdz) wrote on 2006-09-06: Re: [Bug 57731] Re: Futex hang when exiting using the window close button

#3

On Tue, Aug 29, 2006 at 09:24:59AM -0000, Guillaume Desmottes wrote:
> I suppose you don't know how to reproduce this bug?

I tried closing xchat in the same manner a second time and was unable to
reproduce the crash, unfortunately.

--
- mdz

Rogers Veber (labelsarl) wrote on 2007-03-02:

#4

Attachment: Unfreezer of sys_futex(2) blocked programs (6.2 KiB, text/plain)

Hi,
I am experiencing the same trouble with a self-made program.
I made a tool to unfreeze such a process (and optionally watch it periodically): survfutex.
I am sending you the source as an attachment. You can try it on the frozen process to check that it works (normally it should). It uses the ptrace(2) syscall to attach to the frozen process (pass -p pid as argument), gets the syscall number (and checks that it is sys_futex), gets the address being waited on, and pokes a 0 at it. If you add the -s millisec option, survfutex will check periodically.
That is for the repairing... Now, here is some information I collected that could help with prevention.
In my program (the one that sometimes freezes), the trouble occurs within the signal handler for SIGCLD (death of a child process). In this handler, I call wait(2), log the death using fprintf(3), and then free(3) a struct (an element of a list of running sons).
I have two techniques to prevent the freeze:
First one:
  - block SIGCLD (using sigprocmask(2)),
  - periodically check whether a SIGCLD is pending,
  - if so, manage the death outside a signal handler and remove the pending condition.
Second one:
  - the signal handler calls wait(2), then moves the son's struct from the list of living sons
    to a list of dead sons,
  - outside of the signal handler, manage the death of the son (fprintf and free).
survfutex can be compiled with cc survfutex.c -o survfutex
One must use the -u option to unblock a frozen process.
Please let me know if it works for your trouble, and feel free to pass survfutex.c and this information on to anyone who encounters the same trouble.

Cheers
-Rogers

will_in_wi (will-in-wi) wrote on 2008-04-05:

#5

I know almost nothing about this, but I am seeing the same error in xmoto. I try to quit and it hangs until I kill it with -9. Strace reveals a futex hang. 100% reproducible. Hardy Heron with fglrx.

Changed in xchat-gnome:
status:	New → Confirmed

Rogers Veber (labelsarl) wrote on 2008-04-09: Re: [Bug 57731] Re: Futex hang when exiting using the window close button

#6

Hi,

When you try to quit, do you use an Interrupt key sequence (i.e.
Ctrl-C) ?

-Rogers

Sebastien Bacher (seb128) on 2008-10-29

Changed in xchat-gnome:
assignee:	nobody → desktop-bugs
importance:	Undecided → Low

Wayne Salmiaker (hannessteltzer) wrote on 2010-11-05:

#7

Hi Rogers,
Thanks for your interesting C code!
Occasionally I experience the same problem.
The program freezes in a FUTEX_WAIT call (detected using strace), directly after the arrival of SIGCHLD.
After reading some documents on futexes, I believe the reason for the deadlock is a missing FUTEX_WAKE call by the kernel to wake up the suspended processes/threads again.
In your program you are writing a zero to the futex value.
Why does this work? Did you refer to a specific document?

Cheers,
Wayne

Rogers Veber (labelsarl) wrote on 2010-11-05: Re: [Bug 57731] Re: Futex hang when exiting using the window close button

#8

Hi Wayne,

A - There are some "strange" behaviours with signals on Linux since the
advent of POSIX threads. I mean "strange" for a programmer who is used
to the simple fork()/wait()/exit().
     What I concluded from all of this is:
     if one chooses to use signal handlers, then one has to avoid calling some
library functions within the handlers (such as fprintf, ...), and has to do as
little as possible within those handlers.
     See
http://www.gnu.org/s/libc/manual/html_node/Nonreentrancy.html#Nonreentrancy

B - If you are the maintainer of the program you use, you may have to
manage the death of son processes differently.
     1) If you do not care about the exit status of the sons, nor about
when they die, just use SIG_IGN as the handler.
         There should then be no zombie processes at all, because the kernel
removes the process from its list when the father is not interested in
SIGCLD (SIGCHLD).
     2) If you do care, then it is a bit more complicated. You should
maintain two lists of sons, one for the living ones and a second for
the dead ones.
         When you receive the signal, in the handler, just get the status
with the wait() syscall, find the son in the "Living" list, remove it, and
add it to the "Dead" list.
         Then, outside the handler, you may use any function
(fprintf and other nonreentrant functions). Of course you should periodically
take a look at the "Dead" list.

C - In my program I poke a 0.
     I used no specific document; I was just expecting the futex
to work similarly to a traditional semaphore (see semop(2)).

D - About the FUTEX_WAKE.
     I am not quite sure, but I suppose the FUTEX_WAKE should be done by
the user process, NOT by the kernel.
     And maybe it really has been done, but at some point in the code
that should not have been interrupted by a signal.

Just suppose your program is calling fprintf(stderr, ...) and is
waiting for some event inside this function (buffer, fflush, ...) when a
signal occurs (e.g. SIGCLD).
     The signal handler uses fprintf(stderr, ...) too. It could easily
lead to data corruption.

What was your program doing at the time of the signal delivery?

I hope this will help you.

Cheers.

-Rogers

Wayne Salmiaker (hannessteltzer) wrote on 2010-11-05:

#9

Basically my program looks like this:

static volatile sig_atomic_t child_terminated = 0;

void sigchld_handler(int sig) {
    int copy_errno = errno;
    debug("Received SIGCHLD");
    child_terminated = 1;
    signal(SIGCHLD, sigchld_handler);
    errno = copy_errno;
}

int main() {
    signal(SIGCHLD, sigchld_handler);
    for (;;) {
        /* do some heavy weight stuff */
        /* check for child_terminated and perform waitpid */
    }
}

Maybe the debug call is the reason. It sends the string to the local syslog daemon using sockets, and therefore a bunch of system calls. When I look at the strace output, I see the arrival of SIGCHLD and the futex call directly after it.
When the futex call is performed, the 3rd argument is a "2". I verified it using PTRACE_PEEKDATA on the 1st argument (which is the address of the futex value). It really is a "2". Do you know what this "2" exactly means? Does it mean one process locked a resource and another one is now suspended? What happens if I write a "1" to the futex address?

Thanks a lot for answers ;-)

Wayne Salmiaker (hannessteltzer) wrote on 2010-11-05:

#10

By the way: The last system call before SIGCHLD arrives is "connect". This call is also part of the debug procedure. But strace does not mark this call as "unfinished". Normally "connect" is followed by a "send".

Rogers Veber (labelsarl) wrote on 2010-11-05:

#11

Hi,

> Basically my program looks like this:
>
> static volatile sig_atomic_t child_terminated=0;
>
> void sigchld_handler(int sig) {
> int copy_errno=errno;
> debug("Received SIGCHLD");
> child_terminated=1;
> signal(SIGCHLD,sigchld_handler);
> errno=copy_errno;
> }
>
> int main() {
> signal(SIGCHLD,sigchld_handler);
> for(;;) {
> /* do some heavy weight stuff */
> /* check for child_terminated and perform waitpid */
> }

You may also use sigaction() instead of signal(). It is more precise,
mainly because of the sa_flags and sa_mask that you can provide in the
struct sigaction.
It is not necessary to restore the handler
(signal(SIGCHLD,sigchld_handler); in your handler code) because by default (if
you do not use the SA_RESETHAND or SA_ONESHOT flag) the signal disposition
is left untouched.

Anyway, I do agree with you: it is most likely that the debug()
call is the reason for your trouble, as no other action in your handler
can produce it.

The program I made years ago, which produces the same effect that you are
experiencing, is a massive user of fork() and as such has lots of children
running concurrently. It worked fine for years, then the troubles
arrived ...
The first ones I saw were due to the pthread dynamic library being pulled in
at run time (see ldd(1)). I had to change some code to avoid the use of
some functions that use other functions (that use ...) that finally
needed pthread.
Some time later, I experienced SIGSEGV when using sprintf(3) and
finally had the FUTEX_WAIT syndrome.
As I remember it (beware, this is some years old), it began with a change of the
libc that was provided with the Linux distro. Thus I suppose the
reason can be found mainly in some implementation in user
space.
Anyway, the fact is that removing fprintf()-like calls from the handler
solved all my troubles.

You can change my old survfutex to peek the data and decrement it
before poking. Then you will know whether poking a "1" would have some
effect.
If you do this, please let me know if it worked too.

Regards.

-Rogers

Rogers Veber (labelsarl) wrote on 2010-11-05:

#12

Yes yes! Normally the connect(2) should be followed by a send(2).
It is impossible to use strace and survfutex at the same time on your
process because both of them use ptrace(2), and a process cannot be
ptraced twice at the same time (the ATTACH will not work).
But you can use the tcpdump command to trace the packets and see whether the
three-way handshake of the connect (SYN sent, SYN-ACK received, ACK sent) is
seen on the network.
If so, and your connect does not succeed (or waits), then yes, your connect is
in a strange state.

Wayne Salmiaker (hannessteltzer) wrote on 2010-11-08:

#13

You are right. I don't need to restore the signal handler - one system call less in my signal handler.
I have now saved a snapshot of the unfinished futex call, so I can experiment as often as I want.
Poking a 1 does not work. But poking a 0 works perfectly fine.
And you're also right: I cannot use strace when the process is already attached to my ptrace stuff.
So I will implement some PTRACE_SYSCALL steps myself to see the process continuing after poking a 0.

Do you think that errno could be the problem as well? Maybe a system call is assigning a value to errno. Then the signal arrives and the handler tries to access errno too, which is blocked now?
If the debug() call really is the reason, I wonder why this problem does not happen more often. I do a lot of debug output in my code. And besides the debug calls, about every 2nd or 3rd code line ends up as a system call. If interrupting a system call and executing another system call in the handler caused the problem, I feel like this kind of deadlock should happen more often...

Rogers Veber (labelsarl) wrote on 2010-11-08:

#14

> You are right. I don't need to restore the signal handler - one system call less in my signal handler.
> I have now saved a snapshot of the unfinished futex call, so I can experiment as often as I want.
> Poking a 1 does not work. But poking a 0 works perfectly fine.

Thank you for this information.

> And you're also right: I cannot use strace when the process is already attached to my ptrace stuff.
> So I will implement some PTRACE_SYSCALL steps myself to see the process continuing after poking a 0.

Yes, that is the best way; big work too, cheers.

>
> Do you think that errno could be the problem as well? Maybe a system call is assigning a value to errno. Then the signal arrives and the handler tries to access errno too, which is blocked now?

Since a syscall can be automatically restarted by using the
SA_RESTART flag in the struct sigaction (you may have to set this flag
too if you plan to use sigaction(2) instead of signal(2)), it should
be safe to interrupt a syscall, and errno will be restored safely by
the kernel when restarting the syscall.
So I do not believe it could be the problem on the kernel side.
If you choose not to use SA_RESTART, your interrupted syscall may
return -1 with errno set to EINTR. This could cause trouble in user
land, depending on the code being executed.
As I am no longer up to date with this and the kernel stuff, you should
take my answer with a lot of caution.

> If the debug() call really is the reason, I wonder why this problem does not happen more often. I do a lot of debug output in my code. And besides the debug calls, about every 2nd or 3rd code line ends up as a system call. If interrupting a system call and executing another system call in the handler caused the problem, I feel like this kind of deadlock should happen more often...

I am not surprised that it does not happen systematically, because not all of the debug() code may be critical with respect to re-entrancy,
and the synchronous part of your program must be running debug() too for you to get into re-entrancy trouble.
Just remove the debug() call and test; you will have your answer.

Wayne Salmiaker (hannessteltzer) wrote on 2010-11-08:

#15

Thanks for your quick answers!
After poking a zero, I performed a little loop to get the system call number (orig_eax) and the next two arguments (ebx, ecx) of the next 20 system calls. This is how it looks:

syscall=240 (1st_arg=-1210085564 2nd_arg=0)
syscall=240 (1st_arg=-1210085564 2nd_arg=1)
syscall=240 (1st_arg=-1210085564 2nd_arg=1)
syscall=13 (1st_arg=-1075451152 2nd_arg=135199597)
syscall=13 (1st_arg=-1075451152 2nd_arg=135199597)
syscall=102 (1st_arg=1 2nd_arg=-1075451340)
syscall=102 (1st_arg=1 2nd_arg=-1075451340)
syscall=221 (1st_arg=11 2nd_arg=2)
syscall=221 (1st_arg=11 2nd_arg=2)
syscall=102 (1st_arg=3 2nd_arg=-1075451340)
syscall=102 (1st_arg=3 2nd_arg=-1075451340)
syscall=102 (1st_arg=9 2nd_arg=-1075451304)
syscall=102 (1st_arg=9 2nd_arg=-1075451304)
syscall=6 (1st_arg=11 2nd_arg=1)
syscall=6 (1st_arg=11 2nd_arg=1)
syscall=174 (1st_arg=17 2nd_arg=0)
syscall=174 (1st_arg=17 2nd_arg=0)
syscall=4 (1st_arg=6 2nd_arg=134908385)
syscall=4 (1st_arg=6 2nd_arg=134908385)
syscall=13 (1st_arg=-1075451152 2nd_arg=135199597)
syscall=13 (1st_arg=-1075451152 2nd_arg=135199597)
syscall=102 (1st_arg=1 2nd_arg=-1075451340)
syscall=102 (1st_arg=1 2nd_arg=-1075451340)
syscall=221 (1st_arg=11 2nd_arg=2)
syscall=221 (1st_arg=11 2nd_arg=2)
syscall=102 (1st_arg=3 2nd_arg=-1075451340)
syscall=102 (1st_arg=3 2nd_arg=-1075451340)
syscall=102 (1st_arg=9 2nd_arg=-1075451304)
syscall=102 (1st_arg=9 2nd_arg=-1075451304)
syscall=6 (1st_arg=11 2nd_arg=1)
syscall=6 (1st_arg=11 2nd_arg=1)
syscall=119 (1st_arg=1 2nd_arg=-1210093580)
syscall=-1 (1st_arg=1 2nd_arg=-1075448172)
syscall=221 (1st_arg=9 2nd_arg=2)
syscall=221 (1st_arg=9 2nd_arg=2)
syscall=102 (1st_arg=3 2nd_arg=-1075448172)
syscall=102 (1st_arg=3 2nd_arg=-1075448172)

This looks a little cryptic now, but you just need to have a look into /usr/include/asm-i486/unistd.h where all the numbers for the different system calls are defined. It seems each system call is represented by 2 output lines. 240 stands for futex. A 2nd argument of "0" means FUTEX_WAIT; "1" means FUTEX_WAKE. 119 stands for "sigreturn". This is where the signal handler is left and the process continues with the normal procedure. 221 is fcntl64, which is also part of the debug() call. 102 is socketcall, which multiplexes the socket functions: its 1st argument selects the call (1 is socket, 3 is connect, 9 is send).

Rogers Veber (labelsarl) wrote on 2010-11-08:

#16

Hi,

1) To get a less cryptic output, you may add one function to your
program.
It is composed of an sc.h file, built automatically from a header (on my
system /usr/include/asm/unistd_32.h), and
a C source file that includes it.

a) Build sc.h with:
awk 'NR==1 { printf("#include <%s>\n",FILENAME); } $1 == "#define" && $2
~ /__NR_.*/ { printf("{ %s, \"%s\"},\n",$3,substr($2,6)); } END
{ printf("{0,(char*)0}\n"); }' /usr/include/asm/unistd_32.h > sc.h

b) The sc.c file is :

#include <stdio.h>

/* Table of { number, name } pairs; sc.h is generated by the awk command. */
static struct {
    int syscall_no;
    char *syscall_name;
} scor[] = {
#include "sc.h"
};

/* Return the name of a syscall number, or "?<number>" if it is unknown. */
char *
getcorr(int syscall_no)
{
    static char noname[32];
    int i, maxi = sizeof(scor) / sizeof(scor[0]);

    for (i = 0; i < maxi; ++i) {
        if (syscall_no == scor[i].syscall_no) {
            return scor[i].syscall_name;
        }
    }
    snprintf(noname, sizeof(noname), "?%d", syscall_no);
    return noname;
}

#ifdef Test_MAIN
#include <stdlib.h>

int
main(int argc, char *argv[])
{
    int i, no;

    for (i = 1; i < argc; ++i) {
        no = atoi(argv[i]);    /* Hoping this will be a number! */
        printf("Syscall %d is \"%s\"\n", no, getcorr(no));
    }
    exit(0);
}
#endif

2) Some syscalls take more than 2 arguments.
    It would be nice to display more than 2 of them.

3) I find this futex(addr,FUTEX_WAKE,...) on line 2 strange.
    Could you find out what is at the address -1210085564 (0xb7df8f44) in
your name list and/or maps?
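
The register values in the trace are printed as signed 32-bit integers, so they have to be reinterpreted as unsigned before they can be compared with /proc/<pid>/maps. A minimal sketch of that step, assuming a 32-bit traced process; the function names reg_to_addr and maps_line_contains are made up for illustration and are not part of survfutex:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Reinterpret a register value that was printed as a signed 32-bit
 * integer (e.g. -1210085564) as the unsigned address it really is. */
uint32_t reg_to_addr(long value)
{
    return (uint32_t)value;     /* conversion is modulo 2^32 */
}

/* Return 1 if addr lies inside the range at the start of one line of
 * /proc/<pid>/maps ("start-end perms offset dev inode path"), else 0. */
int maps_line_contains(const char *line, uint32_t addr)
{
    unsigned long start, end;

    assert(line != NULL);
    if (sscanf(line, "%lx-%lx", &start, &end) != 2)
        return 0;               /* not a maps line */
    return addr >= start && addr < end;
}
```

With this, reg_to_addr(-1210085564) gives 0xb7df8f44. Feeding each line of the maps file to maps_line_contains() then shows which mapping (library, heap, anonymous memory) the futex word lives in.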

-Rogers

> Thanks for your quick answers!
> After poking a zero, I performed a little loop to get the system call number (orig_eax) and the next two arguments (ebx, ecx) of the next 20 system calls. This is how it looks:
> 
> syscall=240 (1st_arg=-1210085564 2nd_arg=0)
> syscall=240 (1st_arg=-1210085564 2nd_arg=1)
> syscall=240 (1st_arg=-1210085564 2nd_arg=1)
> syscall=13 (1st_arg=-1075451152 2nd_arg=135199597)
> syscall=13 (1st_arg=-1075451152 2nd_arg=135199597)
> syscall=102 (1st_arg=1 2nd_arg=-1075451340)
> syscall=102 (1st_arg=1 2nd_arg=-1075451340)
> syscall=221 (1st_arg=11 2nd_arg=2)
> syscall=221 (1st_arg=11 2nd_arg=2)
> syscall=102 (1st_arg=3 2nd_arg=-1075451340)
> syscall=102 (1st_arg=3 2nd_arg=-1075451340)
> syscall=102 (1st_arg=9 2nd_arg=-1075451304)
> syscall=102 (1st_arg=9 2nd_arg=-1075451304)
> syscall=6 (1st_arg=11 2nd_arg=1)
> syscall=6 (1st_arg=11 2nd_arg=1)
> syscall=174 (1st_arg=17 2nd_arg=0)
> syscall=174 (1st_arg=17 2nd_arg=0)
> syscall=4 (1st_arg=6 2nd_arg=134908385)
> syscall=4 (1st_arg=6 2nd_arg=134908385)
> syscall=13 (1st_arg=-1075451152 2nd_arg=135199597)
> syscall=13 (1st_arg=-1075451152 2nd_arg=135199597)
> syscall=102 (1st_arg=1 2nd_arg=-1075451340)
> syscall=102 (1st_arg=1 2nd_arg=-1075451340)
> syscall=221 (1st_arg=11 2nd_arg=2)
> syscall=221 (1st_arg=11 2nd_arg=2)
> syscall=102 (1st_arg=3 2nd_arg=-1075451340)
> syscall=102 (1st_arg=3 2nd_arg=-1075451340)
> syscall=102 (1st_arg=9 2nd_arg=-1075451304)
> syscall=102 (1st_arg=9 2nd_arg=-1075451304)
> syscall=6 (1st_arg=11 2nd_arg=1)
> syscall=6 (1st_arg=11 2nd_arg=1)
> syscall=119 (1st_arg=1 2nd_arg=-1210093580)
> syscall=-1 (1st_arg=1 2nd_arg=-1075448172)
> syscall=221 (1st_arg=9 2nd_arg=2)
> syscall=221 (1st_arg=9 2nd_arg=2)
> syscall=102 (1st_arg=3 2nd_arg=-1075448172)
> syscall=102 (1st_arg=3 2nd_arg=-1075448172)
> 
> This looks a little cryptic now, but you just need to have a look into
> /usr/include/asm-i486/unistd.h where all the numbers for the different
> system calls are defined. It seems each system call is represented by 2
> output lines. 240 stands for futex. 2nd argument "0" means FUTEX_WAIT.
> "1" means FUTEX_WAKE. 119 stands for "sigreturn". This is where the
> signal handler is left and the process continues with the normal
> procedure. 221 is fcntl64 which is also part of the debug() call. 102 is
> socketcall which seems to be a synonym for "connect".
>
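
A note on the 102 lines in the trace above: on i386, socketcall multiplexes all socket operations, and its first argument selects which one. The values 1, 3 and 9 seen in the trace are socket, connect and send, so it is more than a synonym for connect. Below is a small sketch of how the first argument of socketcall and the second argument of futex could be decoded. The function names are invented for this example; the numeric values are the ones from <linux/net.h> and <linux/futex.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Sub-operations of socketcall(2), indexed by its first argument
 * (SYS_SOCKET = 1 ... SYS_RECVMSG = 17 in <linux/net.h>). */
static const char *const socketcall_ops[] = {
    NULL, "socket", "bind", "connect", "listen", "accept",
    "getsockname", "getpeername", "socketpair", "send", "recv",
    "sendto", "recvfrom", "shutdown", "setsockopt", "getsockopt",
    "sendmsg", "recvmsg"
};

/* Name of the socketcall sub-operation, or NULL if out of range. */
const char *socketcall_op_name(long call)
{
    size_t n = sizeof(socketcall_ops) / sizeof(socketcall_ops[0]);

    assert(n == 18);            /* unused slot 0 plus 17 operations */
    if (call < 1 || (size_t)call >= n)
        return NULL;
    return socketcall_ops[call];
}

/* Name of a futex(2) operation taken from the second argument.
 * FUTEX_PRIVATE_FLAG (128) is masked off; only the two operations
 * seen in the trace are handled, anything else gives NULL. */
const char *futex_op_name(long op)
{
    switch (op & ~128L) {
    case 0:  return "FUTEX_WAIT";
    case 1:  return "FUTEX_WAKE";
    default: return NULL;
    }
}
```

For the trace above, socketcall_op_name(1), socketcall_op_name(3) and socketcall_op_name(9) return "socket", "connect" and "send", and futex_op_name(0) and futex_op_name(1) return "FUTEX_WAIT" and "FUTEX_WAKE".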

Revision history for this message

Wayne Salmiaker (hannessteltzer) wrote on 2010-11-09:

#17

By interpreting the 1st argument as an address and reading the value on this address, I get a 0 for the first three lines of my output (which are the futex calls).
Very nice how you use awk and the created header file in the middle of your code. I tried it. It works ;-)

A few answers ago, you said restoring the signal handler is not necessary. Is this a general fact or does it only apply to some kernel versions/architectures?

Revision history for this message

Rogers Veber (labelsarl) wrote on 2010-11-09:

#18

Hi,

> By interpreting the 1st argument as an address and reading the value on this address, I get a 0 for the first three lines of my output (which are the futex calls).

No surprise for the first one, because you just poked a 0 there to
unblock your process.
For the latter two ... I just wonder where they come from in the code
(which lib, which function?).
You may try sending a SEGV signal to your process.
If it was started with no limit on core size (ulimit -c unlimited), you
should get a core dump that could help you learn more about this
context with a debugger ...

> Very nice how you use awk and the created header file in the middle of your code. I tried it. It works ;-)

Yes, awk is a very fine and simple program for such a job.
The basic idea was to "extract" the syscall names regardless of your
system (32|64 bits, distro ...) and to make the table not too hard to
regenerate if you change something on your box.
You just have to feed it the correct header file ...

The difficulty now is knowing how many arguments have been passed to
your syscall (and also the nature of those arguments).
That is mainly why strace is such a great tool, and why it must be
difficult to maintain ...

>
> A few answers ago, you said restoring the signal handler is not
> necessary. Is this a general fact or does it only apply to some kernel
> versions/architectures?

Well, historically in the Unix world there were two main branches of
Unices: System V, created by Bell Labs, and BSD, created at the
University of California, Berkeley.
Many years ago, on System V Unices you had to restore the handler from
within the handler, just like you did in your program. On BSD, signal
management was cleaner and much more like what you see on Linux today:
you could specify whether the handler should be reset or not.
The difference is that the default behaviour is inverted: System V reset
the handler, BSD kept it.
On Linux today, the default is to keep the handler, and only an explicit
SA_ONESHOT or SA_RESETHAND resets it.
I suppose POSIX has something to say about it ...
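
The difference between the two behaviours can be checked with sigaction(2). The following is only a sketch: the helper names (install_handler, handler_hits, handler_is_default) are invented for the example. It installs the same handler once with no flags and once with SA_RESETHAND, so the disposition can be inspected after the signal has been delivered.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t hits;   /* how many times the handler ran */

static void on_signal(int signo)
{
    (void)signo;
    hits++;
}

/* Number of handler invocations so far. */
int handler_hits(void)
{
    return (int)hits;
}

/* Install on_signal for signo with the given sa_flags
 * (0 keeps the handler, SA_RESETHAND makes it one-shot).
 * Returns 0 on success, -1 on error, like sigaction(2). */
int install_handler(int signo, int flags)
{
    struct sigaction sa;

    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = flags;
    return sigaction(signo, &sa, NULL);
}

/* Return 1 if signo is back to its default disposition, 0 if a
 * handler is still installed, -1 if the query failed. */
int handler_is_default(int signo)
{
    struct sigaction cur;

    assert(signo > 0);
    if (sigaction(signo, NULL, &cur) != 0)
        return -1;
    return cur.sa_handler == SIG_DFL;
}
```

After install_handler(SIGUSR1, 0), raising SIGUSR1 several times calls the handler each time and handler_is_default(SIGUSR1) stays 0. After install_handler(SIGUSR2, SA_RESETHAND), the first SIGUSR2 runs the handler and handler_is_default(SIGUSR2) then returns 1, so a second SIGUSR2 would take the default action and terminate the process.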
