Comment 3 for bug 1840958

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

inaddy@ocfs2defrag:~$ sudo cat /proc/1452/stack
[<0>] 0x0

inaddy@ocfs2defrag:~$ while true; do sudo cat /proc/1452/stack; done > bleh.txt
inaddy@ocfs2defrag:~$ cat bleh.txt | sort -u
[<0>] 0x0

Stack does not help me at all (not being updated in execution path from the user<-> kernel context switch). Still, process is consuming100% of user time:

%Cpu3 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 1452 root 20 0 1936 428 360 R 100.0 0.0 6:27.67 defragfs.ocfs2

Initial check on time spend scheduling to discard kernel issue like I suspected before:

inaddy@ocfs2defrag:~$ sudo perf sched timehist -p 1452

           time cpu task name wait time sch delay run time
                        [tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
     111.555973 [0003] defragfs.ocfs2[1452] 0.000 0.000 0.000
     112.123934 [0003] defragfs.ocfs2[1452] 0.029 0.000 567.930
     113.571854 [0003] defragfs.ocfs2[1452] 0.018 0.000 1447.901
     115.139790 [0003] defragfs.ocfs2[1452] 0.027 0.000 1567.908
     115.555743 [0003] defragfs.ocfs2[1452] 0.030 0.000 415.921
     116.123707 [0003] defragfs.ocfs2[1452] 0.027 0.000 567.936
     116.867670 [0003] defragfs.ocfs2[1452] 0.019 0.000 743.943
     117.571630 [0003] defragfs.ocfs2[1452] 0.025 0.000 703.934
     118.243597 [0003] defragfs.ocfs2[1452] 0.026 0.000 671.940
     118.443583 [0003] defragfs.ocfs2[1452] 0.028 0.000 199.957
     119.587530 [0003] defragfs.ocfs2[1452] 0.017 0.000 1143.929
     120.123500 [0003] defragfs.ocfs2[1452] 0.023 0.000 535.945
     121.571458 [0003] defragfs.ocfs2[1452] 0.016 0.000 1447.941
     123.587358 [0003] defragfs.ocfs2[1452] 0.027 0.000 2015.871
     124.123365 [0003] defragfs.ocfs2[1452] 0.036 0.000 535.970
     125.123284 [0003] defragfs.ocfs2[1452] 0.020 0.000 999.899
     125.603259 [0003] defragfs.ocfs2[1452] 0.032 0.000 479.942
     126.883212 [0003] defragfs.ocfs2[1452] 0.029 0.000 1279.924
     127.587179 [0003] defragfs.ocfs2[1452] 0.028 0.000 703.938
     128.123153 [0003] defragfs.ocfs2[1452] 0.027 0.000 535.946

It spends almost no time in waiting for CPU AND absolutely no time in scheduling (as its the only task currently really running), so its something in userland indeed.... debugging it:

(gdb)

#0 0x0000ffffbf618b10 in _getopt_internal_r
    (argc=3, argv=0xfffffffff868, optstring=0xaaaaaaaacfc0 "gvclh",
    longopts=0x0, longind=0x0, long_only=0, d=d@entry=0xffffbf6cad88 <getopt_data>,
    posixly_correct=<optimized out>)
    at getopt.c:508

#1 0x0000ffffbf618f0c in _getopt_internal
    (argc=<optimized out>, argv=<optimized out>, optstring=<optimized out>,
    longopts=<optimized out>, longind=<optimized out>, long_only=<optimized out>,
    posixly_correct=<optimized out>)
    at getopt.c:711

#2 0x0000aaaaaaaab43c in parse_opt
    (_mode_flag=<synthetic pointer>, argv=0xfffffffff868, argc=3) at main.c:585

#3 main (argc=3, argv=0xfffffffff868) at main.c:653

after some time:

(gdb) bt

#0 0x0000ffffbf618950 in _getopt_internal_r
    (argc=3, argv=0xfffffffff868, optstring=0xaaaaaaaacfc0 "gvclh",
    longopts=0x0, longind=0x0, long_only=0, d=d@entry=0xffffbf6cad88 <getopt_data>,
    posixly_correct=<optimized out>)
    at getopt.c:500

#1 0x0000ffffbf618f0c in _getopt_internal
    (argc=<optimized out>, argv=<optimized out>, optstring=<optimized out>,
    longopts=<optimized out>, longind=<optimized out>, long_only=<optimized out>,
    posixly_correct=<optimized out>)
    at getopt.c:711

#2 0x0000aaaaaaaab43c in parse_opt
    (_mode_flag=<synthetic pointer>, argv=0xfffffffff868, argc=3) at main.c:585

#3 main (argc=3, argv=0xfffffffff868) at main.c:653

after some more time:

(gdb) bt

#0 0x0000ffffbf618cec in _getopt_internal_r
    (argc=3, argv=0xfffffffff868, optstring=0xaaaaaaaacfc0 "gvclh",
    longopts=0x0, longind=0x0, long_only=0, d=d@entry=0xffffbf6cad88 <getopt_data>,
    posixly_correct=<optimized out>)
    at getopt.c:512

#1 0x0000ffffbf618f0c in _getopt_internal
    (argc=<optimized out>, argv=<optimized out>, optstring=<optimized out>,
    longopts=<optimized out>, longind=<optimized out>, long_only=<optimized out>,
    posixly_correct=<optimized out>)
    at getopt.c:711

#2 0x0000aaaaaaaab43c in parse_opt
    (_mode_flag=<synthetic pointer>, argv=0xfffffffff868, argc=3) at main.c:585

#3 main (argc=3, argv=0xfffffffff868) at main.c:653

Checking glibc getopt.c code (around lines 500 <-> 512) I can see we are stuck in this loop:

  if (d->__nextchar == NULL || *d->__nextchar == '\0')
    {
      /* Advance to the next ARGV-element. */
      ...
    }

And I'll try to discover why tomorrow (and report upstream).