defragfs.ocfs2 hangs (or takes too long) on arm64, ppc64el

Bug #1840958 reported by Andreas Hasenack
This bug affects 2 people
Affects               Status        Importance  Assigned to          Milestone
OCFS2 Tools           Fix Released  Unknown
ocfs2-tools (Ubuntu)  Fix Released  Medium      Rafael David Tinoco

Bug Description

[Impact]

 * ocfs2 defrag tool does not work for ARM64 architecture.

[Test Case]

 * Run the following script:
   https://pastebin.ubuntu.com/p/dYG2xct6dz/

 * When it opens a new shell, run:
   $ defragfs.ocfs2 -v /mnt (as root)

 * Watch defragfs.ocfs2 consume 100% of CPU and produce no output.

[Regression Potential]

 * I'm basically changing a (char) to an (int). Potential for
   regression is almost nonexistent in this case.

[Other Info]

The new defragfs.ocfs2 test added in the 1.8.6-1 version of the package hangs (or takes too long) in our dep8 infrastructure.

I reproduced this on an arm64 VM. The command stays silent while consuming 99% of CPU. There is no I/O being done (checked with iostat and iotop).

strace -f shows it stopping at this write:
2129 write(1, "defragfs.ocfs2 1.8.6\n", 21) = 21

Which is just a version print.

Also tested with kernel 5.2.0-13-generic from eoan-proposed.

Debian's CI only runs this test on amd64, it seems.

On an amd64 VM in the same cloud this test completes in less than 1s.

summary: - defragfs.ocfs2 hangs (or takes too long) on arm64
+ defragfs.ocfs2 hangs (or takes too long) on arm64, ppc64el
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Okay, I'm marking the bug:

https://bugs.launchpad.net/ubuntu/+source/ocfs2-tools/+bug/1837089

as a duplicate of this, since you reproduced. I'm also assigning it to myself.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ocfs2-tools (Ubuntu):
status: New → Confirmed
Changed in ocfs2-tools (Ubuntu):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in ocfs2-tools (Ubuntu):
status: Confirmed → In Progress
importance: Undecided → Medium
Rafael David Tinoco (rafaeldtinoco) wrote :

inaddy@ocfs2defrag:~$ sudo cat /proc/1452/stack
[<0>] 0x0

inaddy@ocfs2defrag:~$ while true; do sudo cat /proc/1452/stack; done > bleh.txt
inaddy@ocfs2defrag:~$ cat bleh.txt | sort -u
[<0>] 0x0

The stack does not help me at all (it is not being updated in the execution path from the user <-> kernel context switch). Still, the process is consuming 100% of user time:

%Cpu3 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 1452 root 20 0 1936 428 360 R 100.0 0.0 6:27.67 defragfs.ocfs2

Initial check on time spent scheduling, to discard a kernel issue like I suspected before:

inaddy@ocfs2defrag:~$ sudo perf sched timehist -p 1452

           time cpu task name wait time sch delay run time
                        [tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
     111.555973 [0003] defragfs.ocfs2[1452] 0.000 0.000 0.000
     112.123934 [0003] defragfs.ocfs2[1452] 0.029 0.000 567.930
     113.571854 [0003] defragfs.ocfs2[1452] 0.018 0.000 1447.901
     115.139790 [0003] defragfs.ocfs2[1452] 0.027 0.000 1567.908
     115.555743 [0003] defragfs.ocfs2[1452] 0.030 0.000 415.921
     116.123707 [0003] defragfs.ocfs2[1452] 0.027 0.000 567.936
     116.867670 [0003] defragfs.ocfs2[1452] 0.019 0.000 743.943
     117.571630 [0003] defragfs.ocfs2[1452] 0.025 0.000 703.934
     118.243597 [0003] defragfs.ocfs2[1452] 0.026 0.000 671.940
     118.443583 [0003] defragfs.ocfs2[1452] 0.028 0.000 199.957
     119.587530 [0003] defragfs.ocfs2[1452] 0.017 0.000 1143.929
     120.123500 [0003] defragfs.ocfs2[1452] 0.023 0.000 535.945
     121.571458 [0003] defragfs.ocfs2[1452] 0.016 0.000 1447.941
     123.587358 [0003] defragfs.ocfs2[1452] 0.027 0.000 2015.871
     124.123365 [0003] defragfs.ocfs2[1452] 0.036 0.000 535.970
     125.123284 [0003] defragfs.ocfs2[1452] 0.020 0.000 999.899
     125.603259 [0003] defragfs.ocfs2[1452] 0.032 0.000 479.942
     126.883212 [0003] defragfs.ocfs2[1452] 0.029 0.000 1279.924
     127.587179 [0003] defragfs.ocfs2[1452] 0.028 0.000 703.938
     128.123153 [0003] defragfs.ocfs2[1452] 0.027 0.000 535.946

It spends almost no time waiting for CPU and absolutely no time in scheduling (as it's the only task currently really running), so it's something in userland indeed... debugging it:

(gdb)

#0 0x0000ffffbf618b10 in _getopt_internal_r
    (argc=3, argv=0xfffffffff868, optstring=0xaaaaaaaacfc0 "gvclh",
    longopts=0x0, longind=0x0, long_only=0, d=d@entry=0xffffbf6cad88 <getopt_data>,
    posixly_correct=<optimized out>)
    a...


Rafael David Tinoco (rafaeldtinoco) wrote :

Alright, this was easier than an issue in glibc... On arm64, plain (char) is unsigned by default, and main() stores the return value of getopt() in a (char). The -1 that getopt() returns at the end of the options is therefore seen as 255 on arm64 instead of -1, so the check that main() uses to end the getopt() loop never matches and the loop runs forever.

I'll suggest a small patch shortly.

Rafael David Tinoco (rafaeldtinoco) wrote :
description: updated
Rafael David Tinoco (rafaeldtinoco) wrote :

The fix was merged, but there is still an s390x regression you can follow here:

https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-eoan/eoan/s390x/o/ocfs2-tools/20190903_131540_75995@/log.gz

=== o2image ===
Segmentation fault (core dumped)
umount: /mnt: not mounted.
autopkgtest [13:15:14]: test pcmk: -----------------------]
pcmk FAIL non-zero exit status 139
autopkgtest [13:15:15]: test pcmk: - - - - - - - - - - results - - - - - - - - - -
autopkgtest [13:15:15]: test pcmk: - - - - - - - - - - stderr - - - - - - - - - -
Segmentation fault (core dumped)
umount: /mnt: not mounted.
autopkgtest [13:15:15]: @@@@@@@@@@@@@@@@@@@@ summary
basic PASS
o2cb FAIL non-zero exit status 139
pcmk FAIL non-zero exit status 139
Exit request sent.

Changed in ocfs2-tools (Ubuntu):
status: In Progress → Fix Committed
Rafael David Tinoco (rafaeldtinoco) wrote :

The s390x regression is being addressed here:

https://bugs.launchpad.net/ubuntu/+source/ocfs2-tools/+bug/1745155

And I'll try to mitigate it in that bug (possibly doing another merge/review).

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ocfs2-tools - 1.8.6-1ubuntu1

---------------
ocfs2-tools (1.8.6-1ubuntu1) eoan; urgency=medium

  * d/p/defrag.ocfs2-make-getopt-portable.patch:
    make defragfs.ocfs2 portable to ARM64 (LP: #1840958)

 -- Rafael David Tinoco <email address hidden> Mon, 02 Sep 2019 21:21:13 +0000

Changed in ocfs2-tools (Ubuntu):
status: Fix Committed → Fix Released
Changed in ocfs2-tools:
status: Unknown → Fix Released