upstart,libnih ftbfs on s390x with linux 4.4.0-21.37

Bug #1576914 reported by Steve Langasek
Affects: upstart (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Dimitri John Ledkov

Bug Description

The latest uploads of upstart are failing to build on s390x, with lost fds causing the testsuite to fail to return (which should eventually lead to a timeout, but hasn't as of this writing):

  https://launchpad.net/ubuntu/+source/upstart/1.13.2-0ubuntu22/+build/9658082
  https://launchpad.net/ubuntu/+source/upstart/1.13.2-0ubuntu23/+build/9668581

upstart also fails to build on devac02, with the following test failures:

not ok 12 - ensure log written when directory created accessible with uid 0
        wrong value for output, got unexpected (nil)
        at tests/test_log.c:827 (test_log_new).
/bin/bash: line 5: 31104 Aborted (core dumped) ${dir}$tst
FAIL: test_log

[...]
not ok 19 - with deletion of top-level directory
        wrong value for source->watch, expected (nil) got 0x2aa43d27cb0
        at tests/test_conf.c:1006 (test_source_reload_job_dir).
Aborted (core dumped)
FAIL: test_conf_preload.sh

The second of these failures definitely looks like a lost inotify event.

So I looked at libnih, and libnih's testsuite hangs indefinitely on devac02, after:
  PASS: test_file 78 - with simple directory loop

This is the last test in test_file; the next test program, test_watch, is the one that hangs. Running test_watch directly shows that it hangs after:
  ok 17 - nih_watch_reader

strace shows test_watch blocked indefinitely in select(), waiting for the inotify fd to become readable.
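
To illustrate the kind of wait seen in strace, here is a minimal C sketch (not libnih's actual code) of a select() on an inotify fd that is bounded by a timeout, so that a lost event shows up as a timeout instead of an indefinite hang. The helper names wait_readable and probe_delete_self are hypothetical, and the probe assumes Linux with inotify available.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ret = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ret < 0)
        return -1;
    return ret > 0 ? 1 : 0;
}

/* Hypothetical probe: create a scratch directory under `parent`, watch it
 * for IN_DELETE_SELF, remove it, and report whether any inotify event
 * arrived within timeout_ms.
 * Returns 1 if an event arrived, 0 if none did, -1 on setup error. */
int probe_delete_self(const char *parent, int timeout_ms)
{
    char path[PATH_MAX];
    int n = snprintf(path, sizeof path, "%s/inotify-probe-XXXXXX", parent);
    if (n < 0 || (size_t) n >= sizeof path)
        return -1;
    if (mkdtemp(path) == NULL)
        return -1;

    int ifd = inotify_init1(IN_CLOEXEC);
    if (ifd < 0) {
        rmdir(path);
        return -1;
    }
    if (inotify_add_watch(ifd, path, IN_DELETE_SELF) < 0 || rmdir(path) < 0) {
        close(ifd);
        rmdir(path); /* best-effort cleanup; harmless if already removed */
        return -1;
    }

    int seen = wait_readable(ifd, timeout_ms);
    close(ifd);
    return seen;
}
```

Running something like probe_delete_self("/tmp", 1000) in the failing environment, and again against a directory on a different filesystem, would be one way to check whether the deletion event is being lost on a particular mount.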

All of this is on systems running linux 4.4.0-21.

The upstart testsuite was passing as recently as 4.4.0-18.

https://launchpad.net/ubuntu/+source/upstart/1.13.2-0ubuntu21/+build/9278268

I don't know what in the kernel could have changed to break this, but that seems the most likely explanation. There was also a new glibc version since the last successful upstart build, 2.23-0ubuntu3, which includes an s390x-specific change, but that seems quite unlikely to impact inotify.

Steve Langasek (vorlon)
Changed in upstart (Ubuntu):
assignee: nobody → Dimitri John Ledkov (xnox)
Steve Langasek (vorlon) wrote :

Just to be sure, I tried downgrading libc6 in the build environment from 2.23-0ubuntu3 to 2.23-0ubuntu2. The problem is still reproducible.

And the build filesystem is ext4, which should be fairly inotify-safe.

Steve Langasek (vorlon) wrote :

Notwithstanding the above, I was surprised to find that test_watch passes on the devac02 host system but fails under a yakkety schroot. It turns out the libnih test suite ignores the value of TMPDIR and always writes to /tmp, and /tmp is an overlayfs mount, which means a lost inotify event for a deleted parent directory is more or less expected behavior.
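
As a rough illustration of the two points above, the following C sketch shows how a test suite could honor TMPDIR when choosing its scratch directory, and how it could detect that a path lives on overlayfs before relying on inotify there. This is not libnih code: scratch_base, is_overlayfs and SKETCH_OVERLAYFS_MAGIC are hypothetical names, and the magic number is assumed to match OVERLAYFS_SUPER_MAGIC from <linux/magic.h>.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/vfs.h>

/* Assumed to equal OVERLAYFS_SUPER_MAGIC in <linux/magic.h>; repeated here
 * so the sketch also builds against headers that lack the definition. */
#define SKETCH_OVERLAYFS_MAGIC 0x794c7630UL

/* Hypothetical helper: return $TMPDIR if it is set and non-empty,
 * otherwise fall back to "/tmp". */
const char *scratch_base(void)
{
    const char *tmpdir = getenv("TMPDIR");
    return (tmpdir != NULL && tmpdir[0] != '\0') ? tmpdir : "/tmp";
}

/* Hypothetical helper: return 1 if `path` is on an overlayfs mount,
 * 0 if it is on some other filesystem, -1 if statfs() fails. */
int is_overlayfs(const char *path)
{
    struct statfs sfs;

    assert(path != NULL);
    if (statfs(path, &sfs) != 0)
        return -1;
    return (unsigned long) sfs.f_type == SKETCH_OVERLAYFS_MAGIC ? 1 : 0;
}
```

A test harness could call is_overlayfs(scratch_base()) at startup and skip or flag the inotify-dependent tests when it returns 1, rather than hanging on an event that may never arrive.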

Until we have a full build log from launchpad, it's unknown whether the failure in launchpad is related or not.

Steve Langasek (vorlon) wrote :

Build https://launchpad.net/ubuntu/+source/upstart/1.13.2-0ubuntu23/+build/9668581 has been cancelled, and the failure shown in the log there is:

not ok 13 - ensure remainder of log written when file deleted with uid 0
 wrong value for output, got unexpected (nil)
 at tests/test_log.c:868 (test_log_new).
/bin/bash: line 5: 40392 Aborted ${dir}$tst
FAIL: test_log

This was not the same failure that I saw, but possibly related.

After adjusting settings in the chroot, libnih still fails its testsuite with wholly uninteresting 'string' problems related to terminal sizes that are almost certainly unrelated and specific to the test environment.

And upstart now fails locally with a variety of apparently flaky tests. E.g.:

test: Failed to spawn test main process: unable to execute: No such file or directory
ok 67 - with no such file, no shell and console log
ok 68 - with debug enabled
ok 69 - ensure sane fds with no console
ok 70 - ensure sane fds with console log
not ok 71 - ensure multi process output logged
        wrong value for stat (filename, &statbuf), expected 0 got -1
        at tests/test_job_process.c:4803 (test_spawn).
/bin/bash: line 5: 51948 Aborted (core dumped) ${dir}$tst
FAIL: test_job_process

or, in another run:

not ok 147 - ensure re-exec does not disrupt umask
 wrong value for ok, expected 1 got 0
 at tests/test_initctl.c:11385 (test_reexec).
/bin/bash: line 5: 31440 Aborted (core dumped) ${dir}$tst
FAIL: test_initctl

or:

not ok 136 - with child exit notification before child setup success notification
        wrong value for timed_waitpid (pid, 5), expected 63974 got 0
        at tests/test_job_process.c:8957 (test_handler).
/bin/bash: line 5: 63192 Aborted (core dumped) ${dir}$tst
FAIL: test_job_process

Some of the same tests fail in more than one run, and a completely clean run is elusive.

And I've just noticed the last good build of upstart on s390x, <https://launchpad.net/ubuntu/+source/upstart/1.13.2-0ubuntu21/+build/9278268>, was more than a month after the package was uploaded. So the successful build is the result of a retry of a failed build, which means there's nothing to say that a kernel change between 4.4.0-18 and 4.4.0-21 has anything to do with this.

Dimitri John Ledkov (xnox) wrote :

irrelevant on s390x, as we don't use upstart as pid 1, nor support desktop sessions? =)
