autopkgtests sometimes hang on armhf

Bug #1853571 reported by Olivier Tilloy
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
firefox (Ubuntu)
Triaged
Medium
Unassigned

Bug Description

This has been most recently observed with firefox 70.0.1+build1-0ubuntu2 on focal, but I believe this is not new, it has been observed in the past with older versions of firefox.

I cannot reproduce the problem consistently, but I've observed it at least once in a test environment (focal/armhf lxd container running on a bionic/arm64 cloud instance), and it has happened quite a few times lately on the ubuntu autopkgtest infrastructure.

The html5test hangs indefinitely, and only times out after more than 5 hours (=20000 seconds, which is the test timeout hardcoded in the infrastructure).

When I reproduced the hang, this was the list of (relevant) running processes:

root 5554 5545 0 05:55 ? 00:00:02 /usr/bin/python3 /tmp/autopkgtest.xdq8Z8/build.Zik/real-tree/debian/tests/html5test
root 5561 5554 0 05:55 ? 00:00:00 geckodriver --port 42963
root 5569 5561 0 05:55 ? 00:00:17 /usr/lib/firefox/firefox -marionette -headless -foreground -no-remote -profile /tmp/rust_mozprofileA720rs
root 5611 5569 0 05:55 ? 00:00:05 /usr/lib/firefox/firefox -contentproc -childID 1 -isForBrowser -prefsLen 1 -prefMapSize 198008 …
root 5632 5569 0 05:55 ? 00:00:14 /usr/lib/firefox/firefox -contentproc -childID 2 -isForBrowser -prefsLen 62 -prefMapSize 198008 …
root 5676 5569 0 05:55 ? 00:00:00 /usr/lib/firefox/firefox -contentproc -childID 3 -isForBrowser -prefsLen 8264 -prefMapSize 198008 …

I straced them all and the geckodriver process is the one that appeared blocked:

    root@focal-armhf:~# strace -p 5561
    strace: Process 5561 attached
    futex(0xf7a79288, FUTEX_WAIT, 5562, NULL

Here is the test output when it hangs:

autopkgtest [07:14:56]: test firefox-version: - - - - - - - - - - results - - - - - - - - - -
firefox-version PASS
autopkgtest [07:14:59]: test html5test: preparing testbed
Reading package lists...
Building dependency tree...
Reading state information...
Starting pkgProblemResolver with broken count: 0
Starting 2 pkgProblemResolver with broken count: 0
Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
1 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Setting up autopkgtest-satdep (0) ...
(Reading database ... 59615 files and directories currently installed.)
Removing autopkgtest-satdep (0) ...
autopkgtest [07:15:29]: test html5test: [-----------------------
127.0.0.1 - - [22/Nov/2019 07:15:45] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [22/Nov/2019 07:15:47] "GET /css/main.css HTTP/1.1" 200 -
127.0.0.1 - - [22/Nov/2019 07:15:47] "GET /scripts/base.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Nov/2019 07:15:47] "GET /scripts/8/engine.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Nov/2019 07:15:47] "GET /scripts/8/data.js HTTP/1.1" 200 -
127.0.0.1 - - [22/Nov/2019 07:15:52] "GET /images/html5.png HTTP/1.1" 200 -
127.0.0.1 - - [22/Nov/2019 07:15:53] "GET /images/sponsors/cloudvps.png HTTP/1.1" 200 -
127.0.0.1 - - [22/Nov/2019 07:15:53] "GET /fonts/html5test.woff HTTP/1.1" 200 -
127.0.0.1 - - [22/Nov/2019 07:15:53] "GET /fonts/leaguegothic-regular-webfont.woff HTTP/1.1" 200 -
127.0.0.1 - - [22/Nov/2019 07:15:53] "GET /assets/detect.html?twt5r HTTP/1.1" 200 -
127.0.0.1 - - [22/Nov/2019 07:15:53] "GET /assets/detect.html?k44ge HTTP/1.1" 200 -
127.0.0.1 - - [22/Nov/2019 07:15:53] "GET /assets/detect.html?flslr HTTP/1.1" 200 -
127.0.0.1 - - [22/Nov/2019 07:15:53] "GET /assets/detect.html?2l0zy HTTP/1.1" 200 -
127.0.0.1 - - [22/Nov/2019 07:15:53] "GET /images/icons/favicon-196x196.png HTTP/1.1" 200 -
127.0.0.1 - - [22/Nov/2019 07:15:53] "GET /images/icons/favicon-16x16.png HTTP/1.1" 200 -

Revision history for this message
Olivier Tilloy (osomon) wrote :

If the cause of the hang cannot easily be determined, we should at least make sure that the test doesn't hang for 5+ hours, until it times out (i.e. we need a lower timeout, 5 minutes is probably fine).

Changed in firefox (Ubuntu):
assignee: nobody → Olivier Tilloy (osomon)
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Olivier Tilloy (osomon) wrote :

As a mitigation, I committed https://bazaar.launchpad.net/~mozillateam/firefox/firefox.focal/revision/1330.

This ensures that the tests time out after 5 minutes, instead of keeping a test slave hung for 5+ hours.

Revision history for this message
Olivier Tilloy (osomon) wrote :

Attaching geckodriver.log for the hung test run.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package firefox - 71.0+build2-0ubuntu2

---------------
firefox (71.0+build2-0ubuntu2) focal; urgency=medium

  * Increase the autopkgtests timeout to 10 minutes
    - debian/tests/control
  * Update test expectation
    - debian/tests/html5test

 -- Olivier Tilloy <email address hidden> Wed, 27 Nov 2019 15:50:37 +0100

Changed in firefox (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Olivier Tilloy (osomon) wrote :

This is not actually fixed, merely mitigated by the reduced timeout.

Changed in firefox (Ubuntu):
status: Fix Released → Triaged
Olivier Tilloy (osomon)
Changed in firefox (Ubuntu):
assignee: Olivier Tilloy (osomon) → nobody
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.