systemd unit test regression in autopkgtest (oomd-utils)

Bug #1942113 reported by Simon Chopin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
systemd
Unknown
Unknown
glibc (Ubuntu)
Invalid
Undecided
Unassigned
systemd (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

The latest systemd upload fails its tests against glibc 2.34. The failing test is test-oomd-util within the unit-tests, with the following logs:

Last pgscan 33 greater than current pgscan 1 for /herp.slice/derp.scope. Using last pgscan of zero.
Last pgscan 33 greater than current pgscan 1 for /herp.slice/derp.scope. Using last pgscan of zero.
Last pgscan 33 greater than current pgscan 2 for /zupa.slice. Using last pgscan of zero.
Last pgscan 33 greater than current pgscan 2 for /zupa.slice. Using last pgscan of zero.
Last pgscan 33 greater than current pgscan 1 for /herp.slice/derp.scope. Using last pgscan of zero.
Last pgscan 33 greater than current pgscan 2 for /zupa.slice. Using last pgscan of zero.
Last pgscan 33 greater than current pgscan 1 for /herp.slice/derp.scope. Using last pgscan of zero.
Bus n/a: changing state UNSET → OPENING
sd-bus: starting bus by connecting to /run/dbus/system_bus_socket...
Bus n/a: changing state OPENING → AUTHENTICATING
Bus n/a: changing state AUTHENTICATING → HELLO
Sent message type=method_call sender=n/a destination=org.freedesktop.DBus path=/org/freedesktop/DBus interface=org.freedesktop.DBus member=Hello cookie=1 reply_cookie=0 signature=n/a error-name=n/a error-message=n/a
Sent message type=method_call sender=n/a destination=org.freedesktop.DBus path=/org/freedesktop/DBus interface=org.freedesktop.DBus member=AddMatch cookie=2 reply_cookie=0 signature=s error-name=n/a error-message=n/a
Got message type=method_return sender=org.freedesktop.DBus destination=:1.23 path=n/a interface=n/a member=n/a cookie=1 reply_cookie=1 signature=s error-name=n/a error-message=n/a
Bus n/a: changing state HELLO → RUNNING
Sent message type=method_call sender=n/a destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=StartTransientUnit cookie=3 reply_cookie=0 signature=ssa(sv)a(sa(sv)) error-name=n/a error-message=n/a
Got message type=method_return sender=:1.0 destination=:1.23 path=n/a interface=n/a member=n/a cookie=299 reply_cookie=3 signature=o error-name=n/a error-message=n/a
Got message type=signal sender=org.freedesktop.DBus destination=:1.23 path=/org/freedesktop/DBus interface=org.freedesktop.DBus member=NameAcquired cookie=2 reply_cookie=0 signature=s error-name=n/a error-message=n/a
Got message type=method_return sender=org.freedesktop.DBus destination=:1.23 path=n/a interface=n/a member=n/a cookie=3 reply_cookie=2 signature=n/a error-name=n/a error-message=n/a
Match type='signal',sender='org.freedesktop.systemd1',path='/org/freedesktop/systemd1',interface='org.freedesktop.systemd1.Manager',member='JobRemoved' successfully installed.
Got message type=signal sender=:1.0 destination=n/a path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=304 reply_cookie=0 signature=uoss error-name=n/a error-message=n/a
Got result done/Success for job test-oomd-util-9beb716b2ffb3d99.scope
Bus n/a: changing state RUNNING → CLOSED
Found cgroup2 on /sys/fs/cgroup/, full unified hierarchy
test-oomd-util: Cannot set user xattrs, skipping tests.
Error getting memory.current from /system.slice/test-oomd-util-9beb716b2ffb3d99.scope: No data available
Assertion 'oomd_cgroup_context_acquire(cgroup, &ctx) == 0' failed at src/oom/test-oomd-util.c:117, function test_oomd_cgroup_context_acquire_and_insert(). Aborting.

I wasn't able to reproduce the failure, either using autopkgtest-virt-lxd or manually in an armhf VM. You'll find below the test logs of the VM run for this particular test, as a comparison point.

ubuntu@autopkgtest:~$ sudo /usr/lib/systemd/tests/test-oomd-util
Last pgscan 33 greater than current pgscan 2 for /zupa.slice. Using last pgscan of zero.
Last pgscan 33 greater than current pgscan 1 for /herp.slice/derp.scope. Using last pgscan of zero.
Last pgscan 33 greater than current pgscan 2 for /zupa.slice. Using last pgscan of zero.
Last pgscan 33 greater than current pgscan 1 for /herp.slice/derp.scope. Using last pgscan of zero.
Last pgscan 33 greater than current pgscan 2 for /zupa.slice. Using last pgscan of zero.
Last pgscan 33 greater than current pgscan 2 for /zupa.slice. Using last pgscan of zero.
Last pgscan 33 greater than current pgscan 1 for /herp.slice/derp.scope. Using last pgscan of zero.
Bus n/a: changing state UNSET → OPENING
sd-bus: starting bus by connecting to /run/dbus/system_bus_socket...
Bus n/a: changing state OPENING → AUTHENTICATING
Bus n/a: changing state AUTHENTICATING → HELLO
Sent message type=method_call sender=n/a destination=org.freedesktop.DBus path=/org/freedesktop/DBus interface=org.freedesktop.DBus member=Hello cookie=1 reply_cookie=0 signature=n/a error-name=n/a error-message
=n/a
Sent message type=method_call sender=n/a destination=org.freedesktop.DBus path=/org/freedesktop/DBus interface=org.freedesktop.DBus member=AddMatch cookie=2 reply_cookie=0 signature=s error-name=n/a error-messag
e=n/a
Got message type=method_return sender=org.freedesktop.DBus destination=:1.13 path=n/a interface=n/a member=n/a cookie=1 reply_cookie=1 signature=s error-name=n/a error-message=n/a
Bus n/a: changing state HELLO → RUNNING
Sent message type=method_call sender=n/a destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=StartTransientUnit cookie=3 reply_cookie=0 signature
=ssa(sv)a(sa(sv)) error-name=n/a error-message=n/a
Got message type=method_return sender=:1.2 destination=:1.13 path=n/a interface=n/a member=n/a cookie=228 reply_cookie=3 signature=o error-name=n/a error-message=n/a
Got message type=signal sender=org.freedesktop.DBus destination=:1.13 path=/org/freedesktop/DBus interface=org.freedesktop.DBus member=NameAcquired cookie=2 reply_cookie=0 signature=s error-name=n/a error-messag
e=n/a
Got message type=method_return sender=org.freedesktop.DBus destination=:1.13 path=n/a interface=n/a member=n/a cookie=3 reply_cookie=2 signature=n/a error-name=n/a error-message=n/a
Match type='signal',sender='org.freedesktop.systemd1',path='/org/freedesktop/systemd1',interface='org.freedesktop.systemd1.Manager',member='JobRemoved' successfully installed.
Got message type=signal sender=:1.2 destination=n/a path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=233 reply_cookie=0 signature=uoss error-name=n/a error-messa
ge=n/a
Got result done/Success for job test-oomd-util-72cb961924930039.scope
Bus n/a: changing state RUNNING → CLOSED
Found cgroup2 on /sys/fs/cgroup/, full unified hierarchy
oomd attempting to kill 893 with KILL
oomd attempting to kill 894 with KILL
oomd attempting to kill 895 with KILL
oomd attempting to kill 896 with KILL
Last pgscan 18446744073709551615 greater than current pgscan 0 for /system.slice/test-oomd-util-72cb961924930039.scope. Using last pgscan of zero.

Revision history for this message
Simon Chopin (schopin) wrote :

Funny thing, though : https://autopkgtest.ubuntu.com/results/autopkgtest-impish/impish/arm64/s/systemd/20210830_074606_ed6ee@/log.gz -> tests triggered by the systemd upload. The mentioned test passed. Even though the libc version *is the same* (2.34-0ubuntu1 has been pulled to satisfy dependencies).

tags: added: update-excuse
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

The difference I can see between the passing and failing run is that in the failing run, systemd and glibc are upgraded before the "rebooting testbed after setup commands that affected boot" and in the passing run they are upgraded after that step. That doesn't make a lot of sense but maybe the test depends on some boot time actions that the newer libc/systemd are not doing correctly?

The fact that this doesn't reproduce outside the autopkgtest infrastructure makes me less worried about the consequences of hinting past this but also having systemd hinted is not a great place to be.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

So yes, that is what matters. When a system is booted with systemd 248.3-1ubuntu3, it sets up the cgroups in "hybrid" mode. systemd 248.3-1ubuntu5 defaults to "unified" mode but simply upgrading the package does not redo this setup -- the system remains in hybrid mode. The test cases that are failing on the autopkgtest runners only run if the system is is in unified mode, so the only case where they run is if systemd was installed before the reboot.

OK, so that's settled and I think we can conclude that this is not a regression. So the next question is why do they fail? Well I assume they are not compatible in some way with the focal GA kernel but I haven't proven this yet...

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Hmm something is off here: I'm pretty sure containers inherit their cgroups mode from the host, so the default hierarchy changing in the systemd in the archive shouldn't affect anything. I can reproduce the failure of this report by adding systemd.unified_cgroup_hierarchy=1 to the kernel command line of the container host, but then the test also fails with the systemd from the release pocket (and surely we aren't running our armhf autopkgtest container hosts with unified cgroups?).

So I'm still pretty confused but think I support hinting past this failure...

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

So it turns out that if you have security.nesting=true set on the container, then the systemd in proposed manages to set up a unified hierarchy and the test fails. This is the difference between the production test runs and mine.

I filed a bug upstream https://github.com/systemd/systemd/issues/20593

I'm not sure I want to argue for a hint for this, because the systemd tests are failing for other reasons when triggered by other packages. It's probably better to upload a systemd that skips the failing test case.

Revision history for this message
Dan Streetman (ddstreet) wrote :

Just a suggestion, before opening an upstream bug, y'all might want to test with the latest upstream code, as I think this is fixed already?

You can use this ppa to test with the latest upstream (daily) code if it makes things easier for you:
https://launchpad.net/~ubuntu-support-team/+archive/ubuntu/systemd

If I've misunderstood the bug and this isn't fixed upstream yet, I apologize and please ignore the noise :)

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote : Re: [Bug 1942113] Re: systemd unit test regression in autopkgtest (oomd-utils)

Sigh yes it does look like you found amd fixed the same bug. Oh well at
least I learnt some things yesterday.

Revision history for this message
Lukas Märdian (slyon) wrote :

Unfortunately, it doesn't seem to be the same bug, but might be similar. I can still reporduce the issue with the upstream daily-build.

I can reproduce it on an Ubuntu Impish host (kernel 5.11.0-31-generic, systemd 248.3-1ubuntu3), running an Ubuntu Impish container (kernel 5.11.0-31-generic, systemd 249.743.g8fd4d27f3c+21.10.20210909132725 – daily upstream build from [ppa:ubuntu-support-team/systemd](https://launchpad.net/~ubuntu-support-team/+archive/ubuntu/systemd)) – or basically any container that uses the unified cgroups hierarchy (like Fedora 34, as mentioned before).

Interestingly, in other places that try to read `memory.current` (cgtop.c/cgroup.c) there is a fallback to `memory.usage_in_bytes` if the system does not detect an all unified hierarchy. But those seem to be mutually exclusive:

On a host using the **hybrid** hierarchy I can only see `memory.usage_in_bytes` (same goes for hybrid container on hybrid host):
```
$ find /sys/fs/cgroup/ -wholename *system.slice/memory.current
$ find /sys/fs/cgroup/ -wholename *system.slice/memory.usage_in_bytes
/sys/fs/cgroup/memory/system.slice/memory.usage_in_bytes
```
=> Tests are skipped ("test-oomd-util: cgroups are not running in unified mode, skipping tests.")

On a host using the **unified** hierarchy I can only see `memory.current` (same goes for unified container on unified host and hybrid container on unified host):
```
$ find /sys/fs/cgroup/ -wholename *system.slice/memory.current
/sys/fs/cgroup/system.slice/memory.current
$ find /sys/fs/cgroup/ -wholename *system.slice/memory.usage_in_bytes
```
=> Tests pass as expected.

BUT: Inside a **unified** container on a **hybrid** host, I can see neither:
```
$ find /sys/fs/cgroup/ -wholename *system.slice/memory.current
$ find /sys/fs/cgroup/ -wholename *system.slice/memory.usage_in_bytes
```
=> This is a problem. The test fails, because it cannot access `memory.current` but is not skiped either.

Lukas Märdian (slyon)
affects: glibc → systemd
Revision history for this message
Lukas Märdian (slyon) wrote :

We do now have an upstream patch/PR: https://github.com/systemd/systemd/pull/20705

It's not much different from what was already landed in the systemd package (skipping the test in this scenario). So we should probably only stage it for the next systemd upload, after it was merged upstream.

Revision history for this message
Lukas Märdian (slyon) wrote :
Changed in systemd (Ubuntu):
status: New → Triaged
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 248.3-1ubuntu7

---------------
systemd (248.3-1ubuntu7) impish; urgency=medium

  * d/tests/tests-in-lxd: suppress the cgroups v2 warning on stderr from
    lxd/lxc even more comprehensively until the snapd change required to
    do it nicely gets into a release.

 -- Michael Hudson-Doyle <email address hidden> Wed, 01 Sep 2021 19:21:58 +1200

Changed in systemd (Ubuntu):
status: Triaged → Fix Released
Changed in glibc (Ubuntu):
status: New → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.