ubuntu-fan autopkgtests are broken against systemd-resolved

Bug #1718548 reported by Steve Langasek
This bug affects 1 person
Affects              Status         Importance  Assigned to   Milestone
ubuntu-fan (Ubuntu)  Triaged        High        Unassigned
Xenial               Fix Committed  Medium      Stefan Bader
Zesty                Fix Committed  Medium      Stefan Bader

Bug Description

Now that 17.10 has migrated to netplan+systemd-networkd+systemd-resolved by default, ubuntu-fan autopkgtests are failing; at least one reason for this failure is wrong detection of the default network interface because autopkgtest environments appear for some reason to be getting multiple default routes:

autopkgtest [20:14:59]: test lxd: [-----------------------
Error: either "dev" is duplicate, or "ens2" is a garbage.
II: Auto-init LXD...
LXD has been successfully configured.
II: Creating Fan Bridge...
/usr/sbin/fanatic: .0.0/16: unknown underlay network format
FAIL: Error on enable-fan
autopkgtest [20:15:10]: test lxd: -----------------------]

(https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-artful/artful/amd64/u/ubuntu-fan/20170916_203221_4f037@/log.gz)
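The `ip` error above is what one would expect if a script captured more than one interface name from the routing table and passed the result on as a single `dev` argument. A minimal sketch of a more defensive lookup, assuming the interface is taken from `ip route show` output, is to stop at the first default route. The function names `first_default_dev` and `underlay16` are hypothetical and are not taken from the fan scripts; the second one only illustrates how an empty address could turn into a string like `.0.0/16` if it were not validated.

```shell
#!/bin/sh
# Hypothetical helper: print the device of the FIRST default route only.
# Reads `ip route show` style output on stdin, so it can be fed canned data.
first_default_dev() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Hypothetical helper: derive a /16 underlay from an IPv4 address.
# The validation is deliberately loose; its point is that an empty or
# malformed address fails loudly instead of producing ".0.0/16".
underlay16() {
    case "$1" in
        ''|*[!0-9.]*|*.*.*.*.*)
            echo "underlay16: not an IPv4 address: '$1'" >&2
            return 1 ;;
        [0-9]*.[0-9]*.[0-9]*.[0-9]*)
            echo "${1%.*.*}.0.0/16" ;;
        *)
            echo "underlay16: not an IPv4 address: '$1'" >&2
            return 1 ;;
    esac
}

# Usage on a live system (assumption: iproute2 is installed):
#   dev=$(ip route show | first_default_dev)
```

Because awk exits after the first match, duplicate default routes for the same interface yield a single name rather than two lines.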

The other part of the failure appears to be a race with systemd-resolved in setting up the network inside docker containers:

autopkgtest [17:25:39]: test docker: [-----------------------
Running in the Canonical CI environment
II: Auto-create Fan Bridge...
configuring fan underlay:10.220.0.0/16 overlay:250.0.0.0/8
II: Create docker Fan Network...
configuring docker for underlay:10.220.0.0/16 overlay:250.0.0.0/8 (fan-250 250.46.84.0/24)
1c8a03aa50ab88b997c3b3aee88bfa6933b14368f1594db4a800c2ae8d62074f
II: Test docker...
local docker test: pulling container images ...
Using default tag: latest
latest: Pulling from library/ubuntu
d5c6f90da05d: Pulling fs layer
[...]
2b8db33536d4: Pull complete
Digest: sha256:2b9285d3e340ae9d4297f83fed6a9563493945935fc787e98cc32a69f5687641
Status: Downloaded newer image for ubuntu:latest
local docker test: creating test container ...
5cbdd232b71bc0738f27f1fa3eae4360716bfe3f8cea976874bf6309b1843169
 slave: installing ping ...
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial/InRelease Temporary failure resolving 'archive.ubuntu.com'
[...]
E: Unable to locate package iputils-ping
 slave: installing nc ...
E: Unable to locate package netcat-openbsd
test master: ping test (250.46.84.2) ...
test slave: ping test (250.46.84.1) ...
test slave: ping test ... FAIL
--- transcript start ---
/bin/sh: 108: ping: not found
--- transcript end ---
test slave: short data test (250.46.84.2 -> 250.46.84.1) ...
test master: ping test ... PASS
test master: short data test (250.46.84.1 -> 250.46.84.2) ...
autopkgtest [20:12:19]: ERROR: timed out on command [...]

I've reproduced this problem locally and prepared a partial patch based on a similar fix to the docker.io package's autopkgtests - but so far this only fixes races in the lxd test, not in the docker test, so it needs further investigation.
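A race of this kind is commonly handled by polling until name resolution works before the package installation step runs. The following is a minimal sketch under that assumption, not the attached patch: `retry` is a hypothetical helper, and the use of `getent hosts` inside the container is likewise an assumption about what the image provides.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds, at most $1 times,
# sleeping $2 seconds between attempts. Returns 0 on success, 1 otherwise.
# POSIX sh has no `local`, so the variables below are global.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
        # Only sleep if another attempt will follow.
        if [ "$n" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage inside the container before apt runs (assumption: getent exists):
#   retry 30 1 getent hosts archive.ubuntu.com || echo "DNS never came up"
```

With 30 attempts and a one-second delay, this would wait roughly half a minute for the resolver before giving up, which keeps a slow resolver from turning into a missing `ping` binary later in the test.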

Steve Langasek (vorlon) wrote :

Attached is the partial patch, which should also provide insight into how to fix the docker test.

Stefan Bader (smb) wrote :

OK, I get the drift, but frankly, if systemd-resolved/systemd-networkd change the startup timing that drastically, I guess failing fan tests might be the least of our problems...

Stefan Bader (smb)
Changed in ubuntu-fan (Ubuntu):
assignee: nobody → Stefan Bader (smb)
importance: Undecided → High
status: New → In Progress
Stefan Bader (smb) wrote :

I could not reproduce the docker failure when running a local ADT test, so I added checks with some additional output to see what is going on. The latest test failure (at least it is now detected quickly) at http://autopkgtest.ubuntu.com/packages/u/ubuntu-fan/artful/amd64 shows that the host side is OK (running systemd-resolve --status for the first default route found returns a sensible forwarder). The fact that the image can be downloaded also suggests that the host side is good. And the LXD test uses the same setup (including the same underlay of 10.220.0.0/16) and works.
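Detecting a failed preparation step quickly amounts to checking each step's exit status and stopping there, rather than carrying on until the test times out. A minimal sketch of that idea follows; `prep_step` is a hypothetical wrapper, and its output format only loosely imitates the test log quoted in the bug description. It is not the code that was added to fanatic.

```shell
#!/bin/sh
# Hypothetical wrapper: announce a preparation step, run it, and on failure
# print the captured output and return non-zero so the caller can abort.
prep_step() {
    desc=$1
    shift
    printf '%s ...\n' "$desc"
    if out=$("$@" 2>&1); then
        return 0
    fi
    printf '%s ... FAIL\n' "$desc"
    printf '%s\n' '--- transcript start ---' "$out" '--- transcript end ---'
    return 1
}

# Usage (assumption: run inside the test container):
#   prep_step "installing ping" apt-get install -y iputils-ping || exit 1
```

The caller decides what to do with the failure; here the point is only that the error and its transcript appear at the step that caused them.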

The difference is that LXD uses the dnsmasq-based resolver which the fan tools set up, while docker uses its own (127.0.0.11) service. I am not sure what the exact environment setup in ADT testing is, but could docker be affected by the same multiple default route issue that we had with the LXD tests? I don't think fanatic can do much, as DNS does not seem to be something one can influence with network create.
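The resolvers involved can be told apart by the loopback address that shows up in `resolv.conf`: systemd-resolved's stub listener uses 127.0.0.53, while docker's embedded resolver is the 127.0.0.11 service mentioned above. A loopback nameserver is only reachable inside the network namespace that owns it, so a host entry of that kind cannot simply be reused by a container. A minimal sketch of a check for this, with `loopback_nameservers` as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print every nameserver from resolv.conf-style input
# (stdin) that is a loopback address (127.0.0.0/8 or ::1).
loopback_nameservers() {
    awk '$1 == "nameserver" && ($2 ~ /^127\./ || $2 == "::1") { print $2 }'
}

# Usage on the host or inside a container:
#   loopback_nameservers < /etc/resolv.conf
```

On a host running systemd-resolved, the upstream servers are listed in /run/systemd/resolve/resolv.conf, so comparing that file with /etc/resolv.conf on both the host and the container side is one way to see which resolver each side is really using.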

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ubuntu-fan - 0.12.6

---------------
ubuntu-fan (0.12.6) artful; urgency=medium

  * fanatic: Add short success delay to nc_send (LP: #1721352)
  * fanatic: Catch test preparation steps failing (LP: #1718548)
  * fanatic: Add DNS checks to local-test preparation (LP: #1718548)

ubuntu-fan (0.12.5) artful; urgency=medium

  * DEP8: Fix LXD default interface detection (LP: #1718548)
  * fanctl: return error on fail_up (LP: #1719644)

 -- Stefan Bader <email address hidden> Fri, 06 Oct 2017 12:15:38 +0200

Changed in ubuntu-fan (Ubuntu):
status: In Progress → Fix Released
Stefan Bader (smb)
Changed in ubuntu-fan (Ubuntu Xenial):
assignee: nobody → Stefan Bader (smb)
importance: Undecided → Medium
status: New → In Progress
Changed in ubuntu-fan (Ubuntu Zesty):
assignee: nobody → Stefan Bader (smb)
importance: Undecided → Medium
status: New → In Progress
Changed in ubuntu-fan (Ubuntu):
assignee: Stefan Bader (smb) → nobody
Changed in ubuntu-fan (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in ubuntu-fan (Ubuntu Zesty):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-zesty
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Steve, or anyone else affected,

Accepted ubuntu-fan into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ubuntu-fan/0.12.7~17.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-zesty to verification-done-zesty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-zesty. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Łukasz Zemczak (sil2100) wrote :

Hello Steve, or anyone else affected,

Accepted ubuntu-fan into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ubuntu-fan/0.12.7~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed-xenial
Steve Langasek (vorlon) wrote :

I don't understand why this was SRUed to xenial, where we do not use systemd-networkd / systemd-resolved; and there is no SRU test case listed in the bug. Łukasz, why was this accepted and what do you expect for testing?

tags: added: verification-failed-xenial
removed: verification-needed-xenial
Steve Langasek (vorlon) wrote :

The britney hint was bumped for ubuntu-fan to override the failing autopkgtests, which means this bug was automatically marked as closed by the migration of a version of the package that still had failing autopkgtests. Reopening.

Changed in ubuntu-fan (Ubuntu):
status: Fix Released → Triaged
Stefan Bader (smb) wrote :

@Steve, we try to get the code base of Fan synced across releases (one part of the reasons is that docker and lxd may backport things at any time, the other part is to avoid having to maintain subtly different code bases). The changes I attributed to this bug are an additional exit in an awk script which is verified by the tests still working. The other part is adding tests and print statements for DNS configuration (host and container side) which can be verified by looking at the test logs.

As for the docker tests failing, we probably should move that to its own bug. Right now this only happens in the CI environment and I cannot see much that the Fan testing could do:
- the hosting VM (artful/bionic) is using netplan and, judging from the print
  statements, has systemd-resolved ready before launching the container
- the container (docker) uses its own DNS resolver; we pull the latest LTS
  images, so those are xenial-based at the moment.
- the image is pulled by dockerd, so that has at least DNS resolution for the
  proxy
- however inside the container (and unfortunately that is rather limited in the
  supplied commands) DNS lookups via the docker resolver fail.

I can pull the current cloud-images used by adt locally and get all testing passed without the slightest issue.

Steve Langasek (vorlon) wrote : Re: [Bug 1718548] Re: ubuntu-fan autopkgtests are broken against systemd-resolved

On Wed, Nov 15, 2017 at 09:32:18AM -0000, Stefan Bader wrote:
> @Steve, we try to get the code base of Fan synced across releases (one
> part of the reasons is that docker and lxd may backport things at any
> time, the other part is to avoid having to maintain subtly different
> code bases). The changes I attributed to this bug are an additional exit
> in an awk script which is verified by the tests still working. The other
> part is adding tests and print statements for DNS configuration (host
> and container side) which can be verified by looking at the test logs.

> As for the docker tests failing, we probably should move that to its own bug. Right now this only happens in the CI environment and I cannot see much that the Fan testing could do:

The bottom line here is that an SRU with failing autopkgtests is not
releasable. The CI infrastructure is the *primary target* for autopkgtests.
If you are not going to support these autopkgtests in the CI infrastructure,
they should be disabled, not left failing.

From my perspective the failing docker.io tests aren't a different bug, they
started failing at the same time according to
<http://autopkgtest.ubuntu.com/packages/u/ubuntu-fan/artful/amd64>.

Stefan Bader (smb) wrote :

Even though both test failures were triggered by the change from ifupdown to systemd-networkd/systemd-resolved, the causes of the problems are completely different. For lxd the problem is in the dep8 test script. For docker it turns out to be a problem with docker itself automatically guessing which nameserver to use.

For this reason I created three individual bug reports:
- bug #1732739 -> lxd breakage
- bug #1732747 -> additional DNS checks and logging (pre-req
                  for docker fix)
- bug #1732717 -> docker breakage

I will mark this bug as a duplicate of the first new one, because the bug description was mostly about that issue, and then use the new bug reports to drive the delivery of fixes.

Stefan Bader (smb)
tags: removed: verification-failed-xenial verification-needed verification-needed-zesty