Performance workaround for Dell 7390 2-in-1 Ice Lake

Bug #1874933 reported by Srinivas Pandruvada
This bug affects 3 people
Affects            Status        Importance  Assigned to     Milestone
thermald (Ubuntu)  Fix Released  High        Colin Ian King
Focal              Fix Released  Undecided   Unassigned

Bug Description

== SRU justification focal ==

As reported here:
https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/intel-linux/1174225-dell-xps-7390-intel-ice-lake-performance-hit-hard-by-a-linux-kernel-regression?view=stream

This primarily impacts Ubuntu 20.04 LTS (Focal Fossa), as it switched to the 5.4 kernel.
The 5.4 kernel added support for the "Processor thermal device" on Ice Lake, which exposes the power tables (via PPCC).

The system's default "max RAPL long term power limit" is 15W, but the power table specifies 9W, so thermald limits power to 9W.

If dptfxtract is executed, the power limit will be higher than the power up value, but most users run the out-of-the-box setup, so this needs a workaround.

This workaround will ignore any power limit less than the power up power limit.
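
The rule above can be sketched in a few lines of shell. This is only an illustration and not thermald code: the function name effective_max_uw is hypothetical, and falling back to the power up limit when the PPCC value is ignored is an assumption based on the 15W result expected in the test case.

```shell
#!/bin/sh
# Hypothetical helper, not taken from thermald: pick the long term power
# limit (in microwatts) that the workaround would allow.
#   $1 = PPCC max power limit (uW)
#   $2 = power up RAPL long term power limit (uW)
effective_max_uw() {
    ppcc_max_uw=$1
    power_up_uw=$2
    if [ "$ppcc_max_uw" -lt "$power_up_uw" ]; then
        # PPCC value is below the power up limit: ignore it (assumed fallback).
        echo "$power_up_uw"
    else
        echo "$ppcc_max_uw"
    fi
}

effective_max_uw 9000000 15000000    # prints 15000000
effective_max_uw 25000000 15000000   # prints 25000000
```

With the values from this platform (9W in the PPCC table, 15W at power up) the 9W entry is ignored; a PPCC limit at or above the power up limit is still honoured.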

This is addressed in thermald 2.1 with two commits:
https://github.com/intel/thermal_daemon/commit/f7db434293387c965e8d9141608f855893740e3a
https://github.com/intel/thermal_daemon/commit/c3461690eafb7304bf59a39fb02955a5154b3861

I know 20.04 LTS uses 1.9.1. I can assist with the backport if required.

== Fix ==

Two upstream commits to ease backporting:
   - eeadf7d2efe Restore to min state on deactivation without
     depending on hardware state
   - 9a6dc27879a Clean up the code and documentation

Two upstream commits for the fix:
   - f7db4342933 Avoid polling power in non PPCC case
   - c3461690eaf Ignore invalid PPCC max power limit

== Test case ==

Open two terminals:
-In the first terminal run the following command:
   "sudo turbostat --show PkgWatt"
-In the second terminal run a workload that keeps all CPUs busy, like stress-ng or mprime

After a few seconds turbostat will show that power is capped around 9W.

Install the updated thermald, and repeat.

Now with this fix the power should be capped around 15W.
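
As a rough aid for the test case above, the helper below reports the highest package power sample from turbostat output. It is a sketch, not part of the SRU: the name max_pkgwatt is hypothetical, and it assumes turbostat prints a "PkgWatt" header followed by one numeric value per line on stdout.

```shell
#!/bin/sh
# Hypothetical helper: print the highest numeric sample seen on stdin.
# Non-numeric lines (such as the "PkgWatt" header) are skipped.
max_pkgwatt() {
    awk '$1 ~ /^[0-9]+(\.[0-9]+)?$/ { if ($1 + 0 > max) max = $1 + 0 }
         END { printf "%.2f\n", max }'
}

# Example with canned samples instead of a live turbostat run:
printf 'PkgWatt\n8.91\n9.02\n8.97\n' | max_pkgwatt   # prints 9.02

# On a real system, something like the following should work while the
# workload runs (check turbostat(8) for the options your version supports):
#   sudo turbostat --quiet --show PkgWatt --interval 1 --num_iterations 30 | max_pkgwatt
```

A maximum near 9 before the update and near 15 after it would match the expected results above.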

== Regression Potential ==

This fix changes the power limits logic, so it may change the throttling
behaviour of other systems with poorly defined PPCC power tables, because
it now ignores power limits less than the power up limits. Users will see
their machines run faster and hence active cooling (e.g. fans) may crank up,
but I think the speed improvement outweighs the noise factor.

Note that these changes are already in thermald 2.1 that is now in Ubuntu Groovy 20.10.

---------------------------

description: updated
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1874933

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Francois Thirioux (fthx) wrote :

Does 2.1 also address the performance bug affecting ThinkPads?

Srinivas Pandruvada (srinivas-pandruvada) wrote :

I am not sure what the ThinkPad issue is. Is it something new, or something old that should have been fixed with dptfxtract and thermald?

Srinivas Pandruvada (srinivas-pandruvada) wrote :

Power limits from this platform:
abuser@labuser-XPS-13-7390-2-in-1:/$ grep -r . sys/bus/pci/devices/0000\:00\:04.0/power_limits/*
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_max_uw:9000000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_min_uw:2500000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_step_uw:100000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_tmax_us:28000000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_tmin_us:24000000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_max_uw:15000000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_min_uw:6000000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_step_uw:100000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_tmax_us:28000000
sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_tmin_us:24000000

You can see that power_limit_0_max_uw is 9000000 (9W).
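
For readers who want the same values in watts, here is a minimal sketch. The function name show_max_limits_w is hypothetical; it assumes the directory holds power_limit_N_max_uw files containing a single value in microwatts, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: print every power_limit_*_max_uw file in a
# directory, converted from microwatts to watts.
show_max_limits_w() {
    dir=$1
    for f in "$dir"/power_limit_*_max_uw; do
        # Skip quietly if the glob matched nothing or the file is unreadable.
        [ -r "$f" ] || continue
        awk -v name="$(basename "$f")" \
            '{ printf "%s: %.1f W\n", name, $1 / 1000000 }' "$f"
    done
}

# Path taken from the listing above; it may differ on other machines.
show_max_limits_w /sys/bus/pci/devices/0000:00:04.0/power_limits
```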

Srinivas Pandruvada (srinivas-pandruvada) wrote :

Please change this to "Confirmed".
As the power limits show, performance is limited compared with what you can get at 15W.

Srinivas Pandruvada (srinivas-pandruvada) wrote :

Is anything more required for this to be applied?

affects: linux (Ubuntu) → thermald (Ubuntu)
Mitchell Lomme (mlomme) wrote :

Same issue for me on Dell XPS 9300. CPU is i7-1065G7.

Using 20.04 LTS + Kernel 5.6.11 and thermald 1.9.1-1build1.

root@laptop:/home/root# grep -r . /sys/bus/pci/devices/0000\:00\:04.0/power_limits/*
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_max_uw:9000000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_min_uw:2500000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_step_uw:100000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_tmax_us:28000000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_0_tmin_us:24000000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_max_uw:15000000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_min_uw:6000000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_step_uw:100000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_tmax_us:28000000
/sys/bus/pci/devices/0000:00:04.0/power_limits/power_limit_1_tmin_us:24000000

Colin Ian King (colin-king) wrote :

Ubuntu 20.10 Groovy will have the latest 2.1 thermald hopefully in the next few hours. I'll backport the fixes and SRU this for focal.

Changed in thermald (Ubuntu):
status: Incomplete → In Progress
importance: Undecided → Medium
importance: Medium → High
assignee: nobody → Colin Ian King (colin-king)
Colin Ian King (colin-king) wrote :

@Srinivas, commit f7db434293387c965e8d9141608f855893740e3a does not apply cleanly; I guess there are some RAPL related patches that are prerequisites. Would you mind assisting with a backport here, as I don't want to miss any important commits that are also required?

description: updated
description: updated
Changed in thermald (Ubuntu):
status: In Progress → Fix Committed
Steve Langasek (vorlon)
Changed in thermald (Ubuntu):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote :

For such a big code change I would like to see a clear test case before approving this SRU. Currently the test case is "As reported here: <URL>", which is not very easy to follow. Actually, even on the phoronix post, without proper context, I can't easily find any reproduction steps. How would one formally check that the performance workaround really works? Can you outline the steps in the bug description?

I don't want to reject this upload from the queue as it is good in principle, but for such a big diff I'd like to have a decent test-case, if possible.

Changed in thermald (Ubuntu Focal):
status: New → Incomplete
Jin-Dong Kim (jindong-kim) wrote :

Is this fix going to be released, or has it been abandoned? I have an XPS-13-7390-2-in-1 and have been waiting for the release of this fix. If necessary, I can help with testing.

Robie Basak (racb) wrote :

This is blocked on someone writing a test case as requested by Łukasz in comment 10.

Srinivas Pandruvada (srinivas-pandruvada) wrote :

To reproduce this:

Boot Ubuntu 20.04 LTS (Focal Fossa) with the 5.4 kernel.

Open two terminals:
-In the first terminal run the following command: "turbostat --show PkgWatt"
-In the second terminal run a workload that keeps all CPUs busy, like stress-ng or mprime

After a few seconds turbostat will show that power is capped around 9W.
Now with this fix the power will be capped around 15W.

So you gain performance worth 6W.

Srinivas Pandruvada (srinivas-pandruvada) wrote :

What else is needed here?

Chris Halse Rogers (raof) wrote :

I've added the testcase to the bug description. That seems like a sensible enough reproducer.

description: updated
Changed in thermald (Ubuntu Focal):
status: Incomplete → Fix Committed
tags: added: verification-needed verification-needed-focal
Chris Halse Rogers (raof) wrote : Please test proposed package

Hello Srinivas, or anyone else affected,

Accepted thermald into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/thermald/1.9.1-1ubuntu0.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Srinivas Pandruvada (srinivas-pandruvada) wrote :

The attached file contains two screenshots:
- power_limit_before.png (old version thermald/now 1.9.1-1ubuntu0.1 amd64)
- power_limit_after.png (new version thermald/now 1.9.1-1ubuntu0.2 amd64)

Under the "stress" workload, the max power consumed with the old version is capped below 9W. With the new version it is maintained at up to 15W. So the proposed version ignores the PPCC power limit of 9W.

Srinivas Pandruvada (srinivas-pandruvada) wrote :

Used version
#apt list | grep thermald

thermald/now 1.9.1-1ubuntu0.2 amd64 [installed,local]

tags: added: verification-done-focal
removed: verification-needed verification-needed-focal
Srinivas Pandruvada (srinivas-pandruvada) wrote :

# dpkg -l thermald | cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============-================-============-=========================================
ii thermald 1.9.1-1ubuntu0.2 amd64 Thermal monitoring and controlling daemon

hardware
Handle 0x0100, DMI type 1, 27 bytes
System Information
 Manufacturer: Dell Inc.
 Product Name: XPS 13 7390 2-in-1
 Version: Not Specified

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package thermald - 1.9.1-1ubuntu0.2

---------------
thermald (1.9.1-1ubuntu0.2) focal; urgency=medium

  * Performance workaround for Dell 7390 2-in-1 Ice Lake (LP: #1874933)
   - 5.4 kernel added support for "Processor thermal device" for Ice Lake
     via the PPCC power tables. The power table specified for Dell 7390
     2-in-1 specifies this as 9W so thermald will limit it to this.
     This is a workaround that will ignore power limits less than the
     power up power limit to workaround this throttling. Requires a
     couple of prerequisite patches to apply and final 2 patches for
     the fix.
   - eeadf7d2efe Restore to min state on deactivation without
     depending on hardware state
   - 9a6dc27879a Clean up the code and documentation
   - f7db4342933 Avoid polling power in non PPCC case
   - c3461690eaf Ignore invalid PPCC max power limit

 -- Colin King <email address hidden> Mon, 18 May 2020 09:26:23 +0100

Changed in thermald (Ubuntu Focal):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for thermald has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
