Linux kernel autopkgtests often fail due to version mismatch between running kernel and tested kernel.

Bug #1668353 reported by Robert Bruce Park
This bug affects 1 person
Affects               Status  Importance  Assigned to  Milestone
Auto Package Testing  New     Undecided   Unassigned
britney               New     Undecided   Unassigned

Bug Description

Quite often, kernel autopkgtests will fail because the kernel version running on the autopkgtest system is older than the one being tested by the autopkgtest.

We need some way to reboot the autopkgtest VM into the newer kernel in these cases.

Changed in britney:
assignee: nobody → Robert Bruce Park (robru)
Revision history for this message
Iain Lane (laney) wrote :

Can you please share a log file that exhibits this problem?

Brad Figg (brad-figg) wrote :

I have run into this problem in the past when doing automated bare-metal testing. My "solutions" are:

1. We test when kernels hit -proposed. We wait approx. 2 hrs. after LP says the packages are in proposed to let publishing "settle" before kicking off testing.
2. I do my deployments using MAAS. After deploying a system, one of the first things I do is remove /etc/apt/apt.conf.d/90curtin-aptproxy which has caused problems in the past with me being able to get to a -proposed kernel.
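
The two workarounds above could be scripted roughly as follows. This is only a sketch: the function names and the way the publication time is obtained are assumptions, and only the two-hour delay and the path of the proxy snippet come from the comment itself.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Delay and path taken from the comment above; everything else is illustrative.
SETTLE_DELAY = timedelta(hours=2)
APT_PROXY_CONF = Path("/etc/apt/apt.conf.d/90curtin-aptproxy")


def publishing_settled(published_at, now=None, delay=SETTLE_DELAY):
    """Return True once `delay` has passed since the package hit -proposed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - published_at >= delay


def drop_apt_proxy(conf=APT_PROXY_CONF):
    """Remove the curtin apt proxy snippet; return True if a file was removed."""
    if conf.exists():
        conf.unlink()
        return True
    return False
```

A test runner would call `publishing_settled()` before kicking off a job and `drop_apt_proxy()` right after deployment (as root), before the first `apt-get update`.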

Robert Bruce Park (robru) wrote :

I don't have a log, sorry; slangasek asked me to file this on his behalf.

Martin Pitt (pitti) wrote :

Please do share a log file. Our production cloud/LXC runners all use ftpmaster.internal just like britney, so there should not be any mirror lag between those.

Steve Langasek (vorlon) wrote : Re: [Bug 1668353] Re: Linux kernel autopkgtests often fail due to version mismatch between running kernel and tested kernel.

On Mon, Feb 27, 2017 at 08:12:14PM -0000, Martin Pitt wrote:
> Please do share a log file. Our production cloud/LXC runners all use
> ftpmaster.internal just like britney, there should not be any mirror lag
> between those.

The point is not mirror lag, but that AIUI if there is a newer kernel in
-proposed than the one we've booted on the instance, the tests fail due to
the mismatch.

http://autopkgtest.ubuntu.com/packages/l/linux/zesty/amd64 actually shows
no successful test runs except the first.

Iain Lane (laney) wrote :

On Mon, Feb 27, 2017 at 09:38:19PM -0000, Steve Langasek wrote:
> On Mon, Feb 27, 2017 at 08:12:14PM -0000, Martin Pitt wrote:
> > Please do share a log file. Our production cloud/LXC runners all use
> > ftpmaster.internal just like britney, there should not be any mirror lag
> > between those.
>
> The point is not mirror lag, but that AIUI if there is a newer kernel in
> -proposed than the one we've booted on the instance, the tests fail due to
> the mismatch.
>
> http://autopkgtest.ubuntu.com/packages/l/linux/zesty/amd64 actually shows
> no successful test runs except the first.

I was under the impression that these are 'real' failures that need to
be investigated. Let's take the latest one at the time of writing, which
I suggest you download rather than opening in your browser:

  https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-zesty/zesty/amd64/l/linux/20170227_214035_3911d@/log.gz

...this reports that the 'ubuntu-regression-suite' test is failing.

If you go to line 74580 you can see that the current kernel in
zesty-proposed gets installed. The machine then reboots (74830), and the
tests report that the new one is in use and is the same version as the
source package (75248).

I feel like I'm probably misunderstanding the issue - so if you could
point to lines in a log file that show the issue you're talking about as
a problem in the infrastructure, we could certainly take a look at the
problem.

--
Iain Lane [ <email address hidden> ]
Debian Developer [ <email address hidden> ]
Ubuntu Developer [ <email address hidden> ]

Steve Langasek (vorlon) wrote :

On Mon, Feb 27, 2017 at 10:30:39PM -0000, Iain Lane wrote:
> I was under the impression that these are 'real' failures that need to
> be investigated. Let's take the latest one at the time of writing, which
> I suggest you download rather than opening in your browser

> https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-zesty/zesty/amd64/l/linux/20170227_214035_3911d@/log.gz

> ...this reports that the 'ubuntu-regression-suite' test is failing.

> If you go to line 74580 you can see that the current kernel in
> zesty-proposed gets installed. The machine then reboots (74830), and the
> tests report that the new one is in use and is the same version as the
> source package (75248).

> I feel like I'm probably misunderstanding the issue - so if you could
> point to lines in a log file that show the issue you're talking about as
> a problem in the infrastructure, we could certainly take a look at the
> problem.

It's possible that when I talked to Robert about the problem, I was
operating on stale data. If those are all genuine failing tests, that's
also bad.

But when I look at that last log, which shows in the table as: Version:
4.10.0-9.11, Triggers: gcc-6/6.3.0-8ubuntu1, this looks to me like the
*wrong* version of Linux is being tested. Why are we testing a
proposed-only version of the kernel to validate the new upload of gcc-6 in
proposed? We should be testing the released version of linux.

That looks to me like the autopkgtest behavior has changed, as a workaround
for the previous failures, but that it's still not testing the pairs that we
would want tested for CI.

Martin Pitt (pitti) wrote :

> The point is not mirror lag, but that AIUI if there is a newer kernel in -proposed than the one we've booted on the instance, the tests fail due to the mismatch.

This is not generally true. Whenever the setup commands (like "upgrade testbed to -proposed") install a kernel or anything else from -proposed that affects booting, the testbed gets rebooted before running the test. E.g. in this one:

https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-zesty/zesty/amd64/l/linux/20170111_222734_8664a@/log.gz

you see:

    autopkgtest [16:43:42]: testbed running kernel: Linux 4.9.0-12-generic #13-Ubuntu SMP Mon Jan 9 20:06:25 UTC 2017

and that's indeed the one that was previously installed from -proposed.
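
To illustrate the idea of "reboot if setup installed a newer kernel", here is a minimal sketch. It is not how autopkgtest itself decides to reboot; the helper names are made up, and it assumes kernels are installed as `/boot/vmlinuz-<release>` and that release strings look like `4.9.0-12-generic`. Flavour differences are ignored.

```python
import re
from pathlib import Path


def kernel_key(release):
    """Turn a release string like '4.9.0-12-generic' into (4, 9, 0, 12)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def installed_kernels(boot_dir="/boot"):
    """List release strings of installed kernels (assumes vmlinuz-<release>)."""
    prefix = "vmlinuz-"
    return [p.name[len(prefix):] for p in Path(boot_dir).glob(prefix + "*")]


def needs_reboot(running, installed):
    """Return True if any installed kernel is newer than the running one."""
    current = kernel_key(running)
    return any(kernel_key(release) > current for release in installed)
```

On a testbed, `needs_reboot(platform.release(), installed_kernels())` would be evaluated after the setup commands have run and before the first test starts.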

> http://autopkgtest.ubuntu.com/packages/l/linux/zesty/amd64 actually shows no successful test runs except the first.

If you look at the first one more closely, it's 4.4 from April 2016. It's not a "real" zesty test, it's just the last successful result that we imported into zesty (and yakkety); we do that for all packages at the beginning of a release cycle to ensure that our "always failed vs. regression" logic works. Our kernel tests actually have been broken since then.

> But when I look at that last log, which shows in the table as: Version: 4.10.0-9.11, Triggers: gcc-6/6.3.0-8ubuntu1, this looks to me like the *wrong* version of Linux is being tested. Why are we testing a proposed-only version of the kernel to validate the new upload of gcc-6 in proposed?

Wrong way around. This tests current gcc-6 against a -proposed linux. We do that to ensure gcc gets along with the new linux-libc-dev. In particular, gcc, binutils, and linux all cross-test each other on every change.

Martin Pitt (pitti) wrote :

> > this looks to me like the *wrong* version of Linux is being tested. Why are we testing a proposed-only version of the kernel to validate the new upload of gcc-6 in proposed?

> Wrong way around.

Sorry, right way around, I was confused. The real answer is that we disable apt pinning for linux, as that gets confused too much with linux-meta: https://git.launchpad.net/~ubuntu-release/+git/autopkgtest-cloud/tree/worker/worker#n341
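
For readers unfamiliar with the pinning being discussed, the sketch below shows what "pin only the triggering packages to -proposed, except for linux" might look like. It is not the worker's actual code: the function names, the rule for recognising linux sources, and the priority values 995 and 100 are all assumptions chosen for illustration. Only the apt preferences stanza format itself is standard.

```python
def packages_to_pin(trigger_source, binaries):
    """Hypothetical rule: linux triggers get no pinning (all of -proposed is used)."""
    if trigger_source == "linux" or trigger_source.startswith("linux-"):
        return []
    return list(binaries)


def proposed_preferences(release, pinned_packages):
    """Build apt preferences text that prefers only `pinned_packages` from -proposed.

    Returns an empty string when nothing is pinned, i.e. pinning is disabled.
    """
    if not pinned_packages:
        return ""
    pocket = f"{release}-proposed"
    names = " ".join(sorted(pinned_packages))
    stanzas = [
        # Triggering packages: take them from -proposed (illustrative priority).
        f"Package: {names}\nPin: release a={pocket}\nPin-Priority: 995\n",
        # Everything else in -proposed: keep below the release pocket.
        f"Package: *\nPin: release a={pocket}\nPin-Priority: 100\n",
    ]
    return "\n".join(stanzas)
```

With pinning disabled, a gcc-6-triggered run of the linux tests picks up whatever linux is in -proposed, which matches the 4.10.0-9.11 result discussed above.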

Steve Langasek (vorlon) wrote :

This issue also affects autopkgtests of all kernel sources other than the generic flavor, from what I can see. E.g.:

http://autopkgtest.ubuntu.com/packages/l/linux-hwe/xenial/i386
autopkgtest [19:14:57]: test ubuntu-regression-suite: [-----------------------
Source Package Version: 4.8.0-44.47~16.04.1
Running Kernel Version: 4.4.0-70.91
ERROR: running version does not match source package
autopkgtest [19:15:01]: test ubuntu-regression-suite: -----------------------]

Or http://autopkgtest.ubuntu.com/packages/l/linux-hwe-edge/xenial/amd64
Or http://autopkgtest.ubuntu.com/packages/l/linux-gke/xenial/amd64

It may not be appropriate in all cases to run non-generic kernels in the autopkgtest infrastructure, but at least the tests should not fail because the autopkgtests are incapable of running the right kernel altogether.
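
The check that produces the error above can be approximated as follows. This is a sketch, not the ubuntu-regression-suite code: it assumes the running version is read from `/proc/version_signature` in the form `Ubuntu 4.4.0-70.91-generic 4.4.49`, and the function names are invented.

```python
import re


def running_version(version_signature):
    """Extract e.g. '4.4.0-70.91' from 'Ubuntu 4.4.0-70.91-generic 4.4.49'."""
    fields = version_signature.split()
    if len(fields) < 2:
        raise ValueError(f"unexpected signature: {version_signature!r}")
    # Keep the package version and drop the trailing '-<flavour>' part.
    match = re.match(r"\d+\.\d+\.\d+-\d+\.\d+[^-\s]*", fields[1])
    if match is None:
        raise ValueError(f"unexpected version field: {fields[1]!r}")
    return match.group(0)


def versions_match(source_version, version_signature):
    """Report both versions and return True only if they are identical."""
    running = running_version(version_signature)
    print(f"Source Package Version: {source_version}")
    print(f"Running Kernel Version: {running}")
    if source_version != running:
        print("ERROR: running version does not match source package")
        return False
    return True
```

For the linux-hwe case quoted above, `versions_match("4.8.0-44.47~16.04.1", "Ubuntu 4.4.0-70.91-generic 4.4.49")` returns `False`: the testbed is still on the generic 4.4 kernel, so the test fails before exercising the kernel it was meant to test.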

Changed in britney:
assignee: Robert Bruce Park (robru) → nobody