edk2 autopkgtest spotted a real issue, still we need to mitigate it for now

Bug #2008865 reported by Christian Ehrhardt 
This bug affects 1 person
Affects        Status        Importance  Assigned to   Milestone
edk2 (Ubuntu)  Fix Released  Undecided   dann frazier
qemu (Ubuntu)  Won't Fix     Undecided   Unassigned

Bug Description

Hi,
as Dannf and I have already discussed on IRC, this is a bit convoluted, so I need to outline what it is about.

Qemu has landed a patch [1] upstream that fixes serious issues in emulation and dword access.

Because the fix is important, Debian has pulled it into qemu 7.2 (upstream hasn't released a version with it applied yet; it is only in the main branch).

Sadly, it causes a slowdown in TCG emulation on s390x (needs to be big endian) when running with more virtual cpus (two) than real cpus (one).
That issue is reported upstream [2], but since it is an edge case it might not get a fast resolution (if any at all). In general, anything using more vcpus than real cpus is more or less unsupported, so I'm not sure we can expect much.

Sadly, the edk2 tests meet all of the conditions at once:
- #1 s390x - all tests except those on s390x are fine; I tested arm and power, and they do not expose the slowdown caused by [1][2].
- #2 host cpus - The autopkgtest of edk2 runs with the default of 1 vcpu in the Ubuntu autopkgtest infrastructure.
- #3 guest vcpus - edk2 tests use Qemu.QemuCommand from debian/python/UEFI/Qemu.py, which sets a hardcoded -smp 2
- #4 timing - the test usually runs in ~5-6 seconds, but with the problem exposed it takes ~55-65 seconds, which hits the timeout in the tests of edk2 (set to 60s)
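
Conditions #2 and #3 together amount to the guest asking for more vcpus than the host has. A minimal sketch of how such a mismatch could be detected up front, assuming two hypothetical helpers (`smp_vcpus` and `is_oversubscribed`) that are not part of debian/python/UEFI/Qemu.py and that only understand the leading count or `cpus=N` form of a `-smp` value:

```python
import os
from typing import Optional


def smp_vcpus(smp_arg: str) -> int:
    """Return the vcpu count from a QEMU -smp value such as '2' or '1,sockets=1'.

    Hypothetical helper: it only handles a leading count or an explicit
    'cpus=N' key, not topology-only forms like 'sockets=2,cores=2'.
    """
    first = smp_arg.split(",")[0]
    if first.startswith("cpus="):
        first = first[len("cpus="):]
    return int(first)


def is_oversubscribed(smp_arg: str, host_cpus: Optional[int] = None) -> bool:
    """True when the guest would get more vcpus than the host has CPUs."""
    if host_cpus is None:
        # os.cpu_count() may return None; fall back to a single CPU.
        host_cpus = os.cpu_count() or 1
    return smp_vcpus(smp_arg) > host_cpus
```

With the values from this report, `is_oversubscribed("2", host_cpus=1)` is true, which is the configuration that exposes the slowdown.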

We have many options now (listed from worst to best):
- #1 we are just barely on the 60s timeout, we could hit retry rather often to pass by chance at some point
- #2 wait until qemu has a patch and apply it, but it is uncertain whether that will happen in time
- #3 we could modify src:edk2 to bump the test timeout 60 -> 120, that will mask the issue
- #4 if we need -smp 2 for any of the edk2 tests, then we should add it to big_packages [3] to ensure we never hit vcpu > host cpu (which could also cause other pain elsewhere)
- #5 we could modify src:edk2 to use '-smp 1,sockets=1,cores=1,threads=1' instead if that has no other negative implications
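
Option #5 could look roughly like the following. This is only a sketch under assumptions: `build_qemu_command` and `SINGLE_CPU_SMP` are hypothetical names, and the real Qemu.QemuCommand class in debian/python/UEFI/Qemu.py is not reproduced here. The point is just that the `-smp` value becomes a parameter defaulting to a single CPU instead of a hardcoded 2:

```python
from typing import List

# The single-CPU topology suggested in option #5.
SINGLE_CPU_SMP = "1,sockets=1,cores=1,threads=1"


def build_qemu_command(
    binary: str = "qemu-system-x86_64",
    smp: str = SINGLE_CPU_SMP,
) -> List[str]:
    """Build a minimal QEMU argv with a configurable -smp value.

    Hypothetical illustration; a real test harness would add firmware,
    disk and console options on top of this.
    """
    return [binary, "-accel", "tcg", "-smp", smp]
```

A test that really needs two vcpus could still pass `smp="2"` explicitly, which would keep such cases visible instead of applying them to every test.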

@Dannf - I think I'd want your input to pick what we should do, so WDYT?

[1]: https://gitlab.com/qemu-project/qemu/-/commit/dab30fbef3896bb652a09d46c37d3f55657cbcbb
[2]: https://gitlab.com/qemu-project/qemu/-/issues/1520
[3]: https://git.launchpad.net/~ubuntu-release/autopkgtest-cloud/+git/autopkgtest-package-configs

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Added a qemu task and update-excuse so people will find it more easily.

Changed in qemu (Ubuntu):
status: New → Confirmed
tags: added: update-excuse
Changed in edk2 (Ubuntu):
assignee: nobody → dann frazier (dannf)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Assigning to Dannf to pop up in his inbox so he can let us know the preferred way out of the current situation.

summary: - edk2 autopkgtest spotted a real issue, still we need to mitigate for now
+ edk2 autopkgtest spotted a real issue, still we need to mitigate it for
+ now
Revision history for this message
dann frazier (dannf) wrote :

Thanks for figuring out what was going on! I'm unaware of any negative implications of going 1 cpu - the 2 cpu config was cargo-culted from ovmf-vars-generator. Fix uploaded to Debian.

Changed in edk2 (Ubuntu):
status: New → In Progress
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package edk2 - 2022.11-5

---------------
edk2 (2022.11-5) unstable; urgency=medium

  * autopkgtest: Use 1 CPU QEMU models instead of 2. This avoids a
    performance issue on s390x instances with 1 host CPU that can
    result in timeouts. LP: #2008865.

 -- dann frazier <email address hidden> Wed, 01 Mar 2023 21:55:27 -0700

Changed in edk2 (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

This has been fixed, time to stop showing up as update-excuse

tags: removed: update-excuse
Changed in qemu (Ubuntu):
status: Confirmed → Invalid
status: Invalid → Won't Fix