On demand cpufreq governor causes large amounts of jitter

Bug #1483586 reported by bugproxy
This bug affects 1 person
Affects            Status        Importance  Assigned to  Milestone
sysvinit (Ubuntu)  Fix Released  High        Adam Conrad
  Trusty           Fix Released  High        Adam Conrad
  Vivid            Won't Fix     Undecided   Unassigned

Bug Description

== Comment: #0 - Anton Blanchard <email address hidden> - 2015-07-16 22:22:09 ==
We are seeing large amounts of jitter caused by od_dbs_timer(). We should slow down the rate of updates and/or turn this into a timer. Having a workqueue execute so often is very noticeable.

# echo 1 > /sys/kernel/debug/tracing/events/workqueue/workqueue_execute_start/enable

(wait a while)

# cat /sys/kernel/debug/trace

           <...>-67605 [040] .... 849622.393576: workqueue_execute_start: work struct c0000007fba1ba20: function od_dbs_timer
           <...>-67605 [040] .... 849622.403574: workqueue_execute_start: work struct c0000007fba1ba20: function od_dbs_timer
           <...>-116685 [048] .... 849622.403575: workqueue_execute_start: work struct c0000007fbc1ba20: function od_dbs_timer
           <...>-116685 [048] .... 849622.413574: workqueue_execute_start: work struct c0000007fbc1ba20: function od_dbs_timer
           <...>-67605 [040] .... 849622.413575: workqueue_execute_start: work struct c0000007fba1ba20: function od_dbs_timer
           <...>-67605 [040] .... 849622.423575: workqueue_execute_start: work struct c0000007fba1ba20: function od_dbs_timer
           <...>-116685 [048] .... 849622.433574: workqueue_execute_start: work struct c0000007fbc1ba20: function od_dbs_timer
           <...>-67605 [040] .... 849622.433574: workqueue_execute_start: work struct c0000007fba1ba20: function od_dbs_timer
           <...>-116685 [048] .... 849622.443573: workqueue_execute_start: work struct c0000007fbc1ba20: function od_dbs_timer
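
For reference, the sampling interval can be read off a trace like the one above by subtracting consecutive timestamps per work struct. The following is a minimal sketch, not part of the original report: the helper name `dbs_intervals` is hypothetical, and it assumes the field layout shown in this trace (timestamp in the fourth whitespace-separated field, work struct address in the eighth).

```shell
#!/bin/sh
# dbs_intervals: read ftrace output on stdin and print the gap, in
# milliseconds, between successive od_dbs_timer runs of the same work struct.
# Assumes the column layout of the trace shown above.
dbs_intervals() {
    awk '
        /workqueue_execute_start/ && $NF == "od_dbs_timer" {
            ts = $4; sub(/:$/, "", ts)   # timestamp in seconds
            ws = $8; sub(/:$/, "", ws)   # work struct address
            if (ws in last)
                printf "%s %.0f ms\n", ws, (ts - last[ws]) * 1000
            last[ws] = ts
        }
    '
}
```

It could be used as `dbs_intervals < /sys/kernel/debug/trace`. For the lines above, every gap comes out at about 10 ms per work struct.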

== Comment: #1 - Shilpasri G. Bhat <email address hidden> - 2015-07-22 19:42:38 ==
Hi Anton,

We can set the governor's tunable 'sampling_down_factor' to decrease the rate of updates. When this tunable is set to a value greater than 1, the governor's sampling period during peak load is stretched to the normal sampling period multiplied by sampling_down_factor. This will reduce the jitter caused by od_dbs_timer() when the CPU is busy.
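
As a rough illustration of that multiplication, the sketch below reads the two tunables and prints the resulting busy-state period. The helper name `busy_period_us` and the `ONDEMAND_DIR` override are assumptions made for this example. The sysfs directory is the one used elsewhere in this report, and `sampling_rate` is expressed in microseconds.

```shell
#!/bin/sh
# Directory holding the ondemand tunables; override ONDEMAND_DIR for testing.
ONDEMAND_DIR=${ONDEMAND_DIR:-/sys/devices/system/cpu/cpufreq/ondemand}

# busy_period_us: print sampling_rate * sampling_down_factor in microseconds,
# i.e. the sampling period the governor uses while the CPU is busy.
busy_period_us() {
    rate=$(cat "$ONDEMAND_DIR/sampling_rate") || return 1
    factor=$(cat "$ONDEMAND_DIR/sampling_down_factor") || return 1
    echo $(( rate * factor ))
}
```

With a sampling_rate of 10000 (10 ms) and a sampling_down_factor of 100, this prints 1000000, i.e. one second.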

I am currently running benchmarks to find out the optimal value for this tunable and will post them soon.

Thanks and Regards,
Shilpa

== Comment: #2 - Anton Blanchard <email address hidden> - 2015-07-31 03:44:49 ==
FYI We are also seeing high levels of CPU consumed by this on a LAMP workload:

     2.54%  kworker/0:0  [kernel.kallsyms]  [k] osq_lock
            |
            ---osq_lock
               |
               |--99.83%-- mutex_optimistic_spin
               |          __mutex_lock_slowpath
               |          mutex_lock
               |          |
               |          |--80.08%-- od_dbs_timer

2.5% of total CPU time spent in the od_dbs_timer mutex.
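
To see how much of that figure is attributable to od_dbs_timer specifically, the nested percentages can be multiplied together. This is only a sketch: it assumes the call-graph percentages are relative to the parent entry, which the report does not state. The helper name `share_of_total` is hypothetical.

```shell
#!/bin/sh
# share_of_total P1 P2 ...: multiply a chain of percentages and print the
# result as a percentage of the total, rounded to two decimals.
share_of_total() {
    awk 'BEGIN {
        p = 1
        for (i = 1; i < ARGC; i++)
            p *= ARGV[i] / 100
        printf "%.2f\n", p * 100
    }' "$@"
}
```

Under that assumption, `share_of_total 2.54 99.83 80.08` prints 2.03, so roughly 2% of total CPU time would be lock spinning that originates in od_dbs_timer.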

== Comment: #3 - Anton Blanchard <email address hidden> - 2015-07-31 06:00:45 ==
Hitting this on a customer setup, raising priority

== Comment: #4 - Shilpasri G. Bhat <email address hidden> - 2015-08-03 06:47:40 ==
I used `perf top` and `perf record` to observe the overhead caused by 'osq_lock'.
With both ebizzy and SPECPower's ssjb workload, I see an overhead of 0.03% caused by 'osq_lock' with the default governor settings.
With sampling_down_factor=100 (a one-second sampling period), I see 0.00% overhead from 'osq_lock'.

So this might not be a good data point to showcase, but by reducing the od_dbs_timer interrupts we are guaranteed to decrease the overhead caused by 'osq_lock'.

== Comment: #5 - VAIDYANATHAN SRINIVASAN <email address hidden> - 2015-08-03 09:09:09 ==
Hi Anton,

Thanks for opening the bz to track and fix this issue. Shilpa is trying different workarounds. Here is our plan:

(1) Use sampling_down_factor and other tunables in current Ubuntu releases to work around the issue or minimise the impact.

(2) Redesign the cpufreq subsystem on powerpc along the lines of the intel_pstate driver so that we can program timers and cancel them dynamically based on different utilization points. Target Ubuntu 16.04, then backport to 14.04.x and other distros.

(3) Enhance the design for (2) by estimating core-level utilization without running timers in each thread, and then decide the target PState.

(4) Explore hardware assist so that we can avoid per-core estimation in software but still be able to set per-core PState. We need to take an interrupt or work-queue only to change PState and not really for estimation of load. Hence steady state load will experience zero jitter from cpufreq.

--Vaidy

== Comment: #7 - Shilpasri G. Bhat <email address hidden> - 2015-08-04 07:49:23 ==
Workaround using ondemand tunable 'sampling_down_factor':

File: /etc/init.d/ondemand (shell script which sets the governor after boot)
        if [ "$GOVERNOR" = "ondemand" ]; then
                echo 100 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
        fi

Setting sampling_down_factor to 100 increases the sampling period of the ondemand governor to one second when the CPU is busy: the trace in comment #0 shows the timer firing roughly every 10 ms, and 10 ms x 100 = 1 s.
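
A slightly more defensive variant of the workaround might look like the sketch below. It is not the script that was shipped. The function name `set_sampling_down_factor` and the `ONDEMAND_DIR` override are assumptions for illustration, and the write is skipped when the tunable is missing or not writable (for example, when the ondemand governor is not loaded).

```shell
#!/bin/sh
# Directory holding the ondemand tunables; override ONDEMAND_DIR for testing.
ONDEMAND_DIR=${ONDEMAND_DIR:-/sys/devices/system/cpu/cpufreq/ondemand}

# set_sampling_down_factor GOVERNOR VALUE
# Write VALUE to the tunable only if GOVERNOR is "ondemand" and the file is
# writable; otherwise do nothing and report success.
set_sampling_down_factor() {
    governor=$1
    value=$2
    tunable="$ONDEMAND_DIR/sampling_down_factor"
    [ "$governor" = "ondemand" ] || return 0
    [ -w "$tunable" ] || return 0
    echo "$value" > "$tunable"
}
```

In an init script this would be called as `set_sampling_down_factor "$GOVERNOR" 100` after the governor has been selected.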

bugproxy (bugproxy) wrote : /etc/init.d/ondemand

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-127761 severity-high targetmilestone-inin1504
Ubuntu Foundations Team Bug Bot (crichton) wrote : Re: On demand cpufreq govneror causes large amounts of jitter

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1483586/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Breno Leitão (breno-leitao) wrote :

This bug is against 14.04 and Standard Release.

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
Steve Langasek (vorlon) wrote :

Assigning to Foundations for the init script workaround.

affects: ubuntu → sysvinit (Ubuntu)
Changed in sysvinit (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Adam Conrad (adconrad)
importance: Undecided → High
status: New → Triaged
bugproxy (bugproxy)
tags: added: targetmilestone-inin1510
removed: targetmilestone-inin1504
Breno Leitão (breno-leitao) wrote :

Hi Adam,

I understand that we will not be able to have this fixed by the 15.10 release, right?

Adam Conrad (adconrad) wrote :

Sure, we can get it fixed for 15.10 (and SRUed back).

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sysvinit - 2.88dsf-59.2ubuntu2

---------------
sysvinit (2.88dsf-59.2ubuntu2) wily; urgency=medium

  * Adjust sampling_down_factor to 100 on ppc64 kernels (LP: #1483586)

 -- Adam Conrad <email address hidden> Thu, 15 Oct 2015 20:43:12 -0600
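
The changelog entry restricts the change to ppc64 kernels. One way such a check could be expressed in a shell init script is sketched below. This is an assumption about the approach, not the code from the package, and the names `is_ppc64` and `SAMPLING_DOWN_FACTOR` are hypothetical.

```shell
#!/bin/sh
# is_ppc64 MACHINE: succeed for machine names such as ppc64 and ppc64le,
# as reported by `uname -m`.
is_ppc64() {
    case "$1" in
        ppc64*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Hypothetical use: pick the factor only on ppc64 machines.
if is_ppc64 "$(uname -m)"; then
    SAMPLING_DOWN_FACTOR=100
fi
```

On other architectures the variable stays unset, so the kernel's default sampling_down_factor would be left alone.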

Changed in sysvinit (Ubuntu):
status: Triaged → Fix Released
Breno Leitão (breno-leitao) wrote :

I just tested it on Wily and it is setting the sampling_down_factor properly. Thanks!

$ cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
100
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
ondemand
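
The same two checks can be wrapped in a small script for SRU verification. This is a minimal sketch: the function name `check_ondemand_factor` and the `CPU_DIR` override are assumptions for illustration, while the sysfs paths are the ones shown above.

```shell
#!/bin/sh
# Root of the CPU sysfs tree; override CPU_DIR for testing.
CPU_DIR=${CPU_DIR:-/sys/devices/system/cpu}

# check_ondemand_factor EXPECTED
# Succeed if cpu0 uses the ondemand governor and sampling_down_factor
# equals EXPECTED; print a one-line summary either way.
check_ondemand_factor() {
    expected=$1
    governor=$(cat "$CPU_DIR/cpu0/cpufreq/scaling_governor" 2>/dev/null)
    factor=$(cat "$CPU_DIR/cpufreq/ondemand/sampling_down_factor" 2>/dev/null)
    if [ "$governor" = "ondemand" ] && [ "$factor" = "$expected" ]; then
        echo "OK: governor=$governor sampling_down_factor=$factor"
        return 0
    fi
    echo "FAIL: governor=${governor:-unknown} sampling_down_factor=${factor:-unknown}"
    return 1
}
```

Running `check_ondemand_factor 100` on a fixed system should print an OK line and exit with status 0.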

Steve Langasek (vorlon)
Changed in sysvinit (Ubuntu Trusty):
assignee: nobody → Adam Conrad (adconrad)
status: New → Triaged
importance: Undecided → High
tags: added: targetmilestone-inin1404
bugproxy (bugproxy)
tags: removed: targetmilestone-inin1404
Steve Langasek (vorlon)
Changed in sysvinit (Ubuntu Trusty):
milestone: none → ubuntu-14.04.4
summary: - On demand cpufreq govneror causes large amounts of jitter
+ On demand cpufreq governor causes large amounts of jitter
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2015-11-16 23:34 EDT-------
There are no dates driving the inclusion in Trusty 14.04, so it is not urgent. Wily was the concern, and that is complete.

Adam Conrad (adconrad) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted sysvinit into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/sysvinit/2.88dsf-41ubuntu6.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in sysvinit (Ubuntu Vivid):
status: New → Won't Fix
Changed in sysvinit (Ubuntu Trusty):
status: Triaged → Fix Committed
tags: added: verification-needed
Adam Conrad (adconrad)
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sysvinit - 2.88dsf-41ubuntu6.3

---------------
sysvinit (2.88dsf-41ubuntu6.3) trusty; urgency=medium

  * Adjust sampling_down_factor to 100 on ppc64 kernels (LP: #1483586)

 -- Adam Conrad <email address hidden> Thu, 15 Oct 2015 20:43:12 -0600

Changed in sysvinit (Ubuntu Trusty):
status: Fix Committed → Fix Released
Adam Conrad (adconrad) wrote : Update Released

The verification of the Stable Release Update for sysvinit has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
