CPU overheats during high usage "throttling <not supported>"

Bug #22336 reported by Daugirdas
This bug affects 1 person
Affects                       Status        Importance  Assigned to  Milestone
acpi                          Fix Released  Unknown
Baltix                        Invalid       Undecided   Unassigned
Fedora                        New           Undecided   Unassigned
acpi-support (Ubuntu)         Invalid       Undecided   Unassigned
linux (Ubuntu)                Fix Released  Undecided   Unassigned
linux-source-2.6.17 (Ubuntu)  Won't Fix     High        Unassigned
linux-source-2.6.20 (Ubuntu)  Invalid       Critical    Unassigned

Bug Description

I can't compile anything larger than ndiswrapper. I couldn't even produce a
backtrace for gthumb. I can't play tuxracer either. My laptop overheats while
running Linux and is powered off by the kernel:
Sep 23 12:21:20 localhost kernel: [ 728.519975] Critical temperature reached
(80 C), shutting down.

I must stress that this issue doesn't occur while running WinXP x64 edition. I
ran CPU Burn-in there for 7 minutes without any problems, while on Linux it
takes only 3 minutes to shut down. On Windows I can play games for, say, half
an hour (after that I just lose interest).

My hardware is: Acer Aspire 1522WLMi (AMD64 3000+).
/etc/modules contains powernow-k8, and cpufreq_userspace.
I haven't tried any other distro on this laptop properly so I can't confirm if
it is ubuntu specific.

Matt Zimmerman (mdz) wrote :

CPU frequency scaling doesn't keep your laptop from overheating; is the fan
activating? Are the fan and thermal modules loaded?

Matthew Garrett (mjg59) wrote :

Is the fan switching on?

Daugirdas (daugirdas) wrote :

Yes, thermal and fan are loaded. However, it seems only the BIOS has control of
the fan speed (I know this for sure, just don't ask me where from; possibly
acer.com). The fan spins up to full speed when CPU usage increases, so that all
seems OK. I guess you might argue that my laptop is defective, but on the other
hand I was not able to reproduce this on Windows. There is an AMD processor
driver for Windows which might actually be doing all the work.

Sorry I can't be any more specific.

Matt Zimmerman (mdz) wrote :

If the fan is running at full speed and yet the CPU is overheating, I don't see
how this can be the fault of the operating system.

Daugirdas (daugirdas) wrote :

Yet somehow Windows works. I guess there should be some module/daemon which
tries to reduce the CPU voltage in case too much heat is generated.

Matthew Garrett (mjg59) wrote :

This should be happening automatically when your system reaches the passive trip
point in /proc/acpi/thermal_zone/*/trip_points. Can you attach the contents of
that file?

Daugirdas (daugirdas) wrote :

Created an attachment (id=4031)
/proc/acpi/thermal_zone/THRC/

Daugirdas (daugirdas) wrote :

Created an attachment (id=4032)
.../THRS/

When the temperature of 80C is reached, the laptop is shut down (in software
rather than by a hard poweroff). As you mentioned in your message, the
temperature should be lowered by readjusting [something] instead.

Matthew Garrett (mjg59) wrote :

Hmm. Interesting. Can you attach the contents of /proc/acpi/dsdt ?

Daugirdas (daugirdas) wrote :

Created an attachment (id=4035)
/proc/acpi/dsdt

Matthew Garrett (mjg59) wrote :

Ok. THRS is your system temperature rather than your CPU temperature - the
system is supposed to start slowing the CPU down once it gets to 75 degrees. Can
you try the following:

watch cat /proc/acpi/thermal_zone/THRS/trip_points

then do something CPU intensive. When the temperature gets above 75 degrees, cat
/proc/acpi/processor/*/throttling and see if it changes - a * should appear by
the currently active field.

In *theory*, once the temperature gets above 75 degrees, the processor should
throttle down heavily. In practice, 5 degrees may be sufficiently little that
the kernel may not ramp up the throttling fast enough to help. Your DSDT states
that the resampling should only take place every 30 seconds - if the kernel
doesn't throttle the machine sufficiently, it may reach the trip point before
that period has elapsed.
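
The timing argument above can be made concrete with a rough back-of-the-envelope sketch in Python. The 75 and 80 degree trip points and the 30 second resampling period are taken from this comment; the function name and the assumption of a constant heating rate are purely illustrative and not part of any kernel interface.

```python
def max_safe_heating_rate(passive_c, critical_c, poll_seconds):
    """Fastest steady heating rate (deg C per second) at which at least one
    poll is still guaranteed to land between the passive and critical points."""
    margin = critical_c - passive_c
    return margin / poll_seconds


# Values from the comment: passive at 75 C, critical at 80 C, 30 s resampling.
rate = max_safe_heating_rate(75, 80, 30)
print(f"{rate:.3f} C/s")  # 0.167 C/s
```

Under these assumptions, a CPU that heats up faster than about one degree every six seconds could cross both trip points between two samples, so the passive step would never get a chance to act.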

Daugirdas (daugirdas) wrote :

daugirdas@dtr-linux64:~$ cat /proc/acpi/processor/CPU0/throttling
<not supported>

I haven't run anything CPU-hungry yet, as I don't like [the above] at all. Could
any modules or anything else be missing here?
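
For anyone who wants to run the same check on several machines, here is a minimal Python sketch. It only looks for the literal `<not supported>` string shown above; the function names are made up for illustration, and the glob path is the one quoted earlier in this thread. The format of the file when throttling is supported is not shown here, so the sketch does not try to parse it.

```python
import glob


def throttling_supported(proc_text):
    """Return False when the kernel reports '<not supported>'."""
    return "<not supported>" not in proc_text


def check_all(pattern="/proc/acpi/processor/*/throttling"):
    """Map each throttling file found on this machine to True/False."""
    results = {}
    for path in glob.glob(pattern):
        with open(path) as handle:
            results[path] = throttling_supported(handle.read())
    return results


sample = "<not supported>\n"  # the output shown in this comment
print(throttling_supported(sample))  # False
```

On a system without the old /proc/acpi interface, `check_all()` simply returns an empty dictionary.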

Matthew Garrett (mjg59) wrote :

Ooh. That would explain it. I'll look into why that might be appearing.

Matthew Garrett (mjg59) wrote :

Ok, your system doesn't support throttling. That's a little surprising, but not
necessarily fatal. Can you attach the output of lsmod?

Matthew Garrett (mjg59) wrote :

Ah. I bet I know what it is. The ACPI layer should slow your CPU down, but
powernowd immediately notices that and speeds it up again. Can you stop
powernowd, make sure that powernow-k8 is still loaded and see if the system
still fails?

Daugirdas (daugirdas) wrote :

Created an attachment (id=4045)
output of lsmod

Daugirdas (daugirdas) wrote :

Created an attachment (id=4046)
CPU capabilities

generated on windows. The file supports the idea that throttling is not enabled
on this CPU.

Daugirdas (daugirdas) wrote :

Well I tried disabling powernowd. That doesn't seem to help at all. It still
powers down. ../THRS/temperature indicated 54C just before shutting down. There
is that enormous increase in just a fraction of a second.

Matthew Garrett (mjg59) wrote :

Hmm. Are you running the latest kernel (2.6.12-9)? That has some code that may
help in this respect (temperature events being reported slowly).

Daugirdas (daugirdas) wrote :

Yes, 2.6.12-9 k8 Ubuntu stock kernel. I've always been running the latest Ubuntu kernel.

Matthew Garrett (mjg59) wrote :

Still with powernowd disabled, can you try

echo 1 >/proc/acpi/thermal_zone/THRS/polling_frequency

and see if that results in the temperature being read any more smoothly?

Daugirdas (daugirdas) wrote :

Still I get poweroff @ 5X C. I wonder if it is the actual temperature that
kernel responds to in this case.

Chris Moore (dooglus) wrote :

I just noticed this bug.

I have the same problem. If I run a CPU intensive task, my laptop switches off.
 If I want to recompile the kernel for instance, I have to keep hitting
control-Z to pause the job.

I used to run both Windows XP and Mandrake Linux and didn't see the problem in
either of those.

Øivind Hoel (eruin) wrote :

Well, I guess this is a "me too". I haven't actually bothered reporting anything
until now as I've grown quite fond of the computer resetting and that famous
"recovering journal" message ;-)

I don't have overheating problems in Windows, even while running very
CPU-intensive games, but things like running aclocal while compiling software
will easily kill my poor laptop... Same story here - fan switches on until it
gets rather loud, then black screen and a lovely reboot.

The laptop in question is an elitegroup g556e (
http://www.ecsusa.com/products/g556.html ).

eruin@ubuntu:~$ cat /proc/acpi/processor/CPU1/throttling
<not supported>

Øivind Hoel (eruin) wrote :

Created an attachment (id=4149)
output of lspci

Øivind Hoel (eruin) wrote :

Created an attachment (id=4150)
Output of lsmod, of course...

Daugirdas (daugirdas) wrote :

I installed SUSE 10.0 amd64 and it seems to be much more stable. I haven't
managed to reproduce the problem using SUSE yet.

Windows is not affected at all.

Matt Zimmerman (mdz) wrote :

Is this still a problem with 6.06 beta 2 or current Dapper?

Michele Campeotto (michelec) wrote :

Yes, I get similar problems on a P4 desktop. We have three identical computers (Acer 7600GT with NVidia FX6600) running Dapper here and two of them have this problem.

$ cat /proc/acpi/thermal_zone/THRM/trip_points
critical (S5): 90 C
$ cat /proc/acpi/thermal_zone/THRM/cooling_mode
<setting not supported>
cooling mode: critical
$ cat /proc/acpi/thermal_zone/THRM/polling_frequency
<polling disabled>
$ cat /proc/acpi/thermal_zone/THRM/state
state: ok

I just tried disabling powernowd to see if it helps.

Michele Campeotto (michelec) wrote :

Upon more investigation, it seems that 90°C is NOT my CPU's critical temperature. I don't know the exact CPU model I have, but all of Intel's datasheets for P4s give lower values (67-73°C).

I stopped powernowd and set the polling frequency to 1s, the temperature hasn't changed so far. Let's see.

Michele Campeotto (michelec) wrote :

More info: my system is running at 62-63°C, while an identical system next to mine with Windows XP runs at about 54°C.

GreatBunzinni (greatbunzinni) wrote :

I've got an Acer Aspire 1524 that is running Kubuntu 6.06 and right now it shut down due to overheating while watching a google video.

If anyone wants some kind of system info, please ask, and preferably specify which command's output you want me to log.

ChristianGramsch (kozzah) wrote :

I am experiencing the same problem here on a Fujitsu-Siemens Notebook (Amilo A1650G) with a Mobile Sempron 3400+.

It happened on Ubuntu 6.06, so I switched to Suse 10.1 because 10.0 worked fine a few months ago. But now I have the same problem as described above, so maybe it is not a problem with Ubuntu but a problem with a new version of a certain piece of software.

When the CPU switches to 2000 MHz, the Notebook switches off within 5 minutes when the cpu-load keeps being high, /proc/acpi/thermal_zone/TZS0/temperature showing ~75 °C.

The problem doesn't occur when running very cpu-intensive applications on windows.

One thing seems very strange to me - I often read on several websites, that my notebook doesn't even have a sensor for the temperature, and on previous versions of Suse and Ubuntu I could not get any temperatures, the same on windows.

When I let the system run at 2000 MHz for a while and switch it manually to 800 MHz the temperature shown falls down from ~75 to ~50 within a second.

Daugirdas (daugirdas) wrote :

Well, I upgraded to SUSE 10.1 AMD64 and it is working fine. Therefore the problem is Ubuntu-specific. Since I am not running Ubuntu on a laptop anymore (my mum is still using breezy on an i686 desktop, and that one is fine), I can't check it unless I buy a new hdd. Could someone please try booting into Ubuntu with the suse 10.1 amd64 kernel? If the problem is gone, we would at least know where to begin.

Michele Campeotto (michelec) wrote :

I think I have my problem fixed. I'm still testing, but the missing piece seems to be the p4-clockmod module. I loaded it (and added it to /etc/modules), and now powernowd seems to work and (most importantly) my CPU is way cooler.

Michele Campeotto (michelec) wrote :

err... no... with the conservative governor, the clock at 50% (1.27GHz) the temperature is back up to 62°C...

Sylvain (s-delahaies) wrote :

I've got the same problem, ie my toshiba SPA40 crashes when it gets too hot, which happens very often. I am using ubuntu (?), I don't know which version, how can I find which version I am running? I used Debian for about a year on this laptop and I never had any problem, same with Fedora core 4, and with Demudi.
Inspired by the last posts I used cpufreq-set to fix my cpu at 1.6 GHz , it works fine but a bit slow, no overheating so far, the problem now is that I can't change frequency anymore!!

Chris Moore (dooglus) wrote :

I run ubuntu and debian on the same laptop. ubuntu crashes if I use the CPU for more than a few minutes at a time, debian doesn't. I can tell when ubuntu is about to power down because the fan in the laptop starts running at full speed continuously, making quite a loud noise, whereas in debian the fan alternates between full speed and something slower. It's as if debian is noticing that the fan isn't able to cool the CPU enough and does something to make the CPU generate less heat, whereas ubuntu doesn't.

What are the significant differences between ubuntu and debian with regards to the CPU speed management? Hopefully I can find a way of either reproducing the problem in debian, or making it disappear in ubuntu - but what should I tinker with?

Chris C Moore (moochris) wrote :

I have the same laptop as the originator of this bug, slightly different model due to uprated CPU (Acer Aspire 1524Wlmi).

I have been experiencing the same problems due to overheating. I have fixed any compile errors and warnings in the DSDT, but still have no throttling support and the following CPU info is reported:

cat /proc/acpi/processor/CPU0/info
processor id: 0
acpi id: 0
bus mastering control: no
power management: no
throttling control: no
limit interface: no

powernow-k8 is loaded and I've tried stopping powernowd and changing the polling frequency, but didn't seem to help.

magilus (magilus)
Changed in linux-source-2.6.20:
assignee: nobody → ubuntu-kernel-team
status: Unconfirmed → Confirmed
Changed in acpi-support:
status: Unconfirmed → Confirmed
Changed in linux-source-2.6.20:
importance: Undecided → High
Changed in linux-source-2.6.17:
assignee: mjg59 → ubuntu-kernel-team
Changed in linux-source-2.6.20:
importance: High → Critical
Matthew Garrett (mjg59)
Changed in acpi-support:
status: Confirmed → Invalid
Changed in linux:
status: New → Incomplete
Zamiere Vonthokikkeiin (kikkeartworx) wrote :

csim says:

"I noticed the following message during boot:

ACPI: Looking for DSDT ... not found!

So i went on installing IASL and looking at what's wrong with the DSDT, when i compiled the resulting dsdt.dsl file, it gave me 9 errors and 23 warnings.

http://gentoo-wiki.com/HOWTO_Fix_Common_ACPI_Problems

I found out that by replacing all _T_0 with T_0, _T_1 with T_1 and _T_2 with T_2 would fix the errors and it did! Now i only had 23 warnings, i was only able to fix 2, since i wasn't sure about the other ones."
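
The renaming csim describes could be scripted along the following lines. This is only a sketch of that one text substitution in Python: the function name and the sample DSL line are hypothetical, and whether the rename is appropriate for a given DSDT is something the thread does not establish.

```python
import re


def rename_temp_objects(dsl_text):
    """Rename _T_0, _T_1 and _T_2 to T_0, T_1 and T_2 in disassembled DSDT text."""
    # \b keeps longer identifiers that merely contain "_T_0" untouched.
    return re.sub(r"\b_T_([0-2])\b", r"T_\1", dsl_text)


line = "Name (_T_0, 0x00)"  # hypothetical line from a dsdt.dsl file
print(rename_temp_objects(line))  # Name (T_0, 0x00)
```

The result still has to be recompiled with iasl to see whether the error count actually drops.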

Len Brown (len-brown) wrote : RE: [Bug 22336] Re: CPU overheats during high usage "throttling <not supported>"

>"I noticed the following message during boot:
>
>ACPI: Looking for DSDT ... not found!

You can ignore that message.
It is normal for Ubuntu, which prints it when
no DSDT override is found in the initrd.

przemo24555 (przemo2) wrote :

I solved this problem yesterday, it's not about linux distribution or kernel version :)
All you need is 3 min :)

http://przemo2.blogspot.com/2008/03/cpu-overheats-cpu-si-przegrzewa.html

gunashekar (gunashekar) wrote :

The overheating problem was resolved after a bios upgrade on my HP/Compaq Presario V6000 laptop

iMatt (anti-spam-imatt) wrote :

My T60p overheating problem with 7.10 appears to be solved.

I did a BIOS upgrade to version 2.21 from the IBM/Lenovo website. Released 2008/02/13.
http://www-307.ibm.com/pc/support/site.wss/document.do?sitestyle=lenovo&lndocid=MIGR-63027

The temperature skyrocketed when I ran a cpuburn test (up past 80C for 20 mins) but it wouldn't lock up. Minutes prior to the update, the machine would lock up in the 60-65C range.

It also appears the fan is running about 600rpm faster by default. Verified by "cat /proc/acpi/ibm/fan". I no longer have to physically set the fan to speed "7" anymore.

This leads me to believe it wasn't solely a CPU temperature issue, as the machine can run MUCH warmer now without lockup.
To all out there experiencing this issue, check for a BIOS upgrade - it just may fix the problem.

Should the problem reoccur I will post up.

tempura (tempura) wrote :

For me the problem was resolved by updating to Hardy (8.04). Now everything seems to run fine.

clickwir (clickwir) wrote :

CPU scaling not working on my laptop with latest hardy.
See bug: https://bugs.launchpad.net/ubuntu/+source/powernowd/+bug/231534

My desktop, is working with CPU frequency scaling. That's an AMD Athlon X2 4000+ Brisbane.

What else can I provide to help get this fixed?

Sergio Zanchetta (primes2h) wrote :

The 18 month support period for Edgy Eft 6.10 has reached its end of life. As a result, we are closing the linux-source-2.6.17 Edgy Eft kernel task. However, please note that this report will remain open against the actively developed kernel. Thank you for your continued support and help as we debug this issue.

Changed in linux-source-2.6.17:
status: Confirmed → Invalid
mathew (meta23) wrote :

Still getting overheating in Hardy, on an IBM ThinkPad T42p.

It's sufficiently bad that I can't rsync at full speed or the system overheats and shuts down.

Daugirdas (daugirdas) wrote :

I tested 8.04 x64 kubuntu live cd on my notorious Acer Aspire 1522Wlmi. Unfortunately, the issue is still there.

It reached 90C PASSIVE point and shut down

The issue is now in suse as well. It was introduced in 10.3 as a "bugfix". I am feeling a bit hopeless.

I'll try the 32bit kubuntu. If that works, I'll just give the laptop to my mum and leave it at that. If not, it is hard to say this, but it will be a WINDOWS ONLY machine.

Regards,
Daugirdas

Thomas Renninger (trenn) wrote : Re: [Bug 22336] Re: CPU overheats during high usage "throttling <not supported>"

On Tuesday 01 July 2008 01:58:17 Daugirdas wrote:
> I tested 8.04 x64 kubuntu live cd on my notorious Acer Aspire 1522Wlmi.
> Unfortunately, the issue is still there.
>
> It reached 90C PASSIVE point and shut down
>
> The issue is now in suse as well. It was introduced in 10.3 as a
> "bugfix". I am feeling a bit hopeless.

This could be a good hint to find it.
Since when do you see this happening?
Which kernel was still working?
Can you do a:
rpm -q --changelog kernel-xy-0.00-0 |head -n30
of the working and the not working kernel and send it, pls.

Thanks,

        Thomas

Daugirdas (daugirdas) wrote :

I did some testing today. I tried kubuntu 32 bit 8.04 - no luck.

vmlinuz-2.6.25.5-1.1-vanilla kernel from suse 64bit worked beautifully. The system goes to 800MHz once it hits 90C and stays at that speed until it cools down to 75C. The graph shows it nicely. The system did not shut down and this is very important.

So we need to narrow it down to a specific patch. That is the list of suse patches http://www.mirrorservice.org/sites/ftp.opensuse.org/pub/opensuse/update/10.3/repodata/ .
http://www.mirrorservice.org/sites/ftp.opensuse.org/pub/opensuse/update/10.3/repodata/patch-kernel-4749.xml --- looks particularly suspicious and would roughly fit the timescale. I'll try to do some more testing in the evening to narrow it down.

patches.arch/acpi_thermal_passive_blacklist.patch: Avoid
 critical temp shutdowns on specific ThinkPad T4x(p) and
 R40 [#333043]

Daugirdas

magilus (magilus) wrote :

When you say that you have tried the vanilla kernel then it is the one without any patches which can be found on kernel.org.

So Ubuntu applies some patch which breaks things.

Daugirdas (daugirdas) wrote :

This is the summary of the kernel package I used:
"kernel-vanilla - The Standard Kernel - without any SUSE patches

The standard kernel - without any SUSE patches Source Timestamp: 2008-06-07 01:55:22 +0200"

So that would imply both *ubuntu and SUSE have some patch which breaks power management.

magilus (magilus) wrote :

That means that it is up to the Ubuntu devs which are not very responsive here...

magilus (magilus) wrote :

Bug in some patch that Ubuntu ships. Bug does not happen with upstream tarball.

Changed in linux:
assignee: nobody → ubuntu-kernel-team
status: Incomplete → Confirmed
Daugirdas (daugirdas) wrote :

I did some source comparison between the suse and vanilla 2.6.25.5 kernels. The ./drivers/acpi/processor_thermal.c and processor_throttling.c files were identical, but thermal.c differed (vanilla on the left):

daugirdas@dtrsuse64:~/Desktop/linux-2.6.25.5/drivers/acpi> diff thermal.c thermal.cs
443,445c443
< if (ACPI_FAILURE(status))
< tz->trips.passive.flags.valid = 0;
< else
---
> if (ACPI_SUCCESS(status)) {
447,452c445,454
<
< if (memcmp(&tz->trips.passive.devices, &devices,
< sizeof(struct acpi_handle_list))) {
< memcpy(&tz->trips.passive.devices, &devices,
< sizeof(struct acpi_handle_list));
< ACPI_THERMAL_TRIPS_EXCEPTION(flag, "device");
---
> if (memcmp(&tz->trips.passive.devices, &devices,
> sizeof(struct acpi_handle_list))) {
> memcpy(&tz->trips.passive.devices, &devices,
> sizeof(struct acpi_handle_list));
> ACPI_THERMAL_TRIPS_EXCEPTION(flag, "device");
> }
> } else {
> tz->trips.passive.flags.valid = 0;
> ACPI_EXCEPTION((AE_INFO, status, "Invalid passiv trip"
> " point\n"));

I can't read this but hopefully this would suggest something. Especially since thermal.c contains these lines further down:

 /* take no action if nocrt is set */
 if(!nocrt) {
  printk(KERN_EMERG
   "Critical temperature reached (%ld C), shutting down.\n",
   KELVIN_TO_CELSIUS(tz->temperature));
  orderly_poweroff(true);
 }

Another point: THRC critical point on my system is 97C. 90C is PASSIVE, but I get shutdowns at 90C. That may also mean kernel confuses CRITICAL with PASSIVE!
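
To spell out what I would expect from those two trip points, here is a tiny Python sketch. It is not the kernel's logic, only the behaviour the trip point names suggest; the function name and the use of "greater than or equal" comparisons are my assumptions.

```python
def expected_action(temp_c, passive_c, critical_c):
    """Action the trip points suggest for a given temperature (assumed semantics)."""
    if temp_c >= critical_c:
        return "critical shutdown"
    if temp_c >= passive_c:
        return "passive cooling"
    return "none"


# THRC on this laptop: passive at 90 C, critical at 97 C.
print(expected_action(90, passive_c=90, critical_c=97))  # passive cooling
```

A shutdown at 90C does not match this expectation, which is why I suspect the two trip points are being mixed up somewhere.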

Daugirdas (daugirdas) wrote :

files in drivers/thermal and drivers/cpufreq are identical

Changed in linux-source-2.6.17:
status: Invalid → Won't Fix
dtsmith1984 (dtsmith1984) wrote :

I am having the same problem with an Acer Aspire 3620 on 8.04.

Critical point is either 80 or 85C, and it reaches that quite easily with any graphics-intensive program. 10 minutes of tuxkart and it shuts down.

Does compiling a new kernel from kernel.org solve the problem?

This is the first time I've ever put Ubuntu (or Linux, for that matter) on a laptop. My girlfriend prefers Linux and wanted me to get rid of Windows, which I gladly did. But now her laptop's overheating. She never had any problems in Windows. I would like to find a solution so that I don't have to go back to Windows.

Carlos Perelló Marín (carlos) wrote :

I didn't have this problem with Ubuntu Hardy, however, since I upgraded to Ibex, my laptop is getting this problem and the battery life went down a lot. I have a Lenovo Thinkpad X61s.

Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

clickwir (clickwir) wrote :

clickwir@lappy:~$ sudo modprobe powernow-k8
FATAL: Error inserting powernow_k8 (/lib/modules/2.6.27-1-generic/kernel/arch/x86/kernel/cpu/cpufreq/powernow-k8.ko): No such device
clickwir@lappy:~$ sudo modprobe acpi-cpufreq
FATAL: Error inserting acpi_cpufreq (/lib/modules/2.6.27-1-generic/kernel/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.ko): No such device
clickwir@lappy:~$ uname -a
Linux lappy 2.6.27-1-generic #1 SMP Sat Aug 23 23:19:01 UTC 2008 x86_64 GNU/Linux

Didn't work for me. Note, I just added a few intrepid repos and then updated only what was related to 2.6.27 and its supporting modules.

My Intel wireless Pro/2200BG works fine, automatically picked up and connected to the network. Seems to be just fine, just no cpu scaling going on. This is an Acer Aspire 3004 laptop.

Colin Muller (colin-durbanet) wrote :

2.6.27-1-generic from kernel.ubuntu.com made no difference on my notebook.

I tried both with polling disabled and with polling at 2 seconds, as set in /proc/acpi/thermal_zone/THRM/polling_frequency then ran a test with stress, and the temperature just kept climbing until it hit critical, which ACPI detected, shutting down the machine.

The notebook (currently running Hardy, but the problem has been present since Warty):
http://www.durbanet.co.za/colin/mecer-linux/mecer_n223ii_notebook_ubuntu_linux.html

On machines like this, which don't raise an alert via ACPI at any temperature apart from CRITICAL, but which do have a constantly-updated record accessible via ACPI of what the current temperature is, is it not possible for the kernel to do the following:

a. Poll the current temperature at a user-configurable period, with a reasonable default
b. Turn on throttling or whatever else is required if the temperature goes above CRITICAL minus n, where n is a user-configurable value with a reasonable default.
c. Turn off the protective throttling (or whatever) when the temperature drops below CRITICAL minus n minus m, where m is user-configurable with a reasonable default.
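
Steps b and c amount to a simple hysteresis rule, which could be sketched in Python roughly as follows. This is only an illustration of the proposal, not existing kernel or powersave behaviour: the function name, the strict comparisons and the example numbers are assumptions, and the polling loop from step a is left out.

```python
def next_throttle_state(temp_c, throttled, critical_c, n, m):
    """Decide whether protective throttling should be on after one temperature poll."""
    on_threshold = critical_c - n        # step b: throttle above CRITICAL minus n
    off_threshold = critical_c - n - m   # step c: release below CRITICAL minus n minus m
    if temp_c > on_threshold:
        return True
    if temp_c < off_threshold:
        return False
    return throttled  # between the two thresholds: keep the current state


# Hypothetical numbers: critical at 90 C, n = 10, m = 5.
state = False
for temp in [70, 81, 78, 76, 74]:
    state = next_throttle_state(temp, state, critical_c=90, n=10, m=5)
    print(temp, state)
# 70 False
# 81 True
# 78 True
# 76 True
# 74 False
```

The gap m between the two thresholds is what would keep the machine from flapping between throttled and unthrottled on every poll.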

In the past, I've tried without success to achieve the above using powersave (which was not part of my Hardy install, so I haven't tried that again recently). I currently keep the machine permanently throttled by having this line in /etc/rc.local:

/bin/echo -n 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq

Luka Renko (lure) wrote :

Similar solution (in kernel) was discussed in kernel.org Bugzilla, but proposed solution was not accepted for upstream kernel:
http://bugzilla.kernel.org/show_bug.cgi?id=10658

clickwir (clickwir) wrote :

Latest kernel and modules not working for me.

clickwir@lappy:~$ uname -a
Linux lappy 2.6.27-3-generic #1 SMP Wed Sep 10 16:18:52 UTC 2008 x86_64 GNU/Linux
clickwir@lappy:~$ sudo modprobe powernow-k8
FATAL: Error inserting powernow_k8 (/lib/modules/2.6.27-3-generic/kernel/arch/x86/kernel/cpu/cpufreq/powernow-k8.ko): No such device
clickwir@lappy:~$ sudo modprobe acpi-cpufreq
FATAL: Error inserting acpi_cpufreq (/lib/modules/2.6.27-3-generic/kernel/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.ko): No such device

Anything else I should try?

clickwir (clickwir) wrote :

clickwir@lappy:~$ uname -a
Linux lappy 2.6.27-6-generic #1 SMP Tue Oct 7 04:15:23 UTC 2008 x86_64 GNU/Linux
clickwir@lappy:~$ sudo modprobe powernow-k8
FATAL: Error inserting powernow_k8 (/lib/modules/2.6.27-6-generic/kernel/arch/x86/kernel/cpu/cpufreq/powernow-k8.ko): No such device
clickwir@lappy:~$ sudo modprobe acpi-cpufreq
FATAL: Error inserting acpi_cpufreq (/lib/modules/2.6.27-6-generic/kernel/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.ko): No such device

Nada, no cpu frequency scaling. :-(

Sergio Zanchetta (primes2h) wrote :

The 18 month support period for Feisty Fawn 7.04 has reached its end of life. As a result, we are closing the linux-source-2.6.20 Feisty Fawn kernel task. However, please note that this report will remain open against the actively developed kernel. Thank you for your continued support and help as we debug this issue.

Changed in linux-source-2.6.20:
status: Confirmed → Invalid
zeddock (zeddock) wrote :

I am now on 2.6.27 and problem still persists under 8.10.
Please continue this bug?

Thanx!

zeddock

mplexus (mike-plexousakis) wrote :

Hello everyone!

My laptop is an Acer 1524 WLMi with an AMD64 3400+ (2.2 GHz). It certainly suffers from shutting down due to overheating. I am using Ubuntu 8.10 64bit with the default kernel 2.6.27-9-generic (though, as many in this thread state, the problem also exists in earlier versions).

I clean the dust from the heatsink often enough and when I do it takes more time until the laptop shuts down from overheating - yet it still does shut down eventually.

As I understand it has something to do with thermal trip points.
I didn't compile my own kernel (as to fix DSDT errors etc).
I use the on-demand governor (as default).
These are the power-specific modules that are typically loaded:

# lsmod | grep power
powernow_k8 23684 0
cpufreq_powersave 10368 0
freq_table 13568 3 powernow_k8,cpufreq_stats,cpufreq_ondemand
processor 47800 2 thermal,powernow_k8
#

my /proc/acpi/thermal_zone/THRC has the following contents:

# cat cooling_mode
0 - Active; 1 - Passive
# cat trip_points
critical (S5): 97 C
passive: 90 C: tc1=2 tc2=5 tsp=300 devices=CPU0
# cat polling_frequency
<polling disabled>
#

On-demand governor works fine - the cpu frequency varies from 800-1200-2000-2200 MHz up and down according to cpu load. The problem is that when reaching 90 degrees of cpu temperature the high-scale frequency should scale down a bit to let cpu cool off and then rise back up to cope with heavy load. This scaling down does not happen and critical temperature is reached shutting down the system.
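
For comparing trip points between machines, the trip_points text above can be pulled apart with a few lines of Python. This is a sketch that handles only the two line shapes shown in this comment; the function name and dictionary layout are made up for illustration, and the reading of tsp as tenths of a second follows the Thomas Renninger quote in this comment.

```python
import re


def parse_trip_points(text):
    """Parse 'critical' and 'passive' lines from a trip_points file into a dict."""
    points = {}
    for line in text.splitlines():
        m = re.match(r"\s*(critical|passive)\b[^:]*:\s*(\d+)\s*C(.*)", line)
        if not m:
            continue  # ignore any line shape not shown in this thread
        entry = {"temp_c": int(m.group(2))}
        # Collect key=value pairs such as tc1=2 or devices=CPU0 (kept as strings).
        entry.update(dict(re.findall(r"(\w+)=(\S+)", m.group(3))))
        points[m.group(1)] = entry
    return points


sample = """critical (S5): 97 C
passive: 90 C: tc1=2 tc2=5 tsp=300 devices=CPU0
"""
points = parse_trip_points(sample)
print(points["critical"]["temp_c"])        # 97
print(points["passive"]["temp_c"])         # 90
print(int(points["passive"]["tsp"]) / 10)  # 30.0 (seconds, if tsp is in 1/10 s)
```

If that reading of tsp is right, my passive zone is only re-evaluated every 30 seconds, while there are just 7 degrees between the passive and critical points.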

After reading Thomas Renninger's post,

[Thomas Renninger wrote on 2007-04-25: ...The tsp value (time in 1/10s how often temp should be polled when passive cooling is on) can be overridden by passing as thermal module parameter. ...]

I started to think maybe I should change thermal's module parameters.

This is the output of modinfo thermal in my system:

filename: /lib/modules/2.6.27-9-generic/kernel/drivers/acpi/thermal.ko
license: GPL
description: ACPI Thermal Zone Driver
author: Paul Diefenbaugh
srcversion: 1787CE9FEB053C917D031A9
alias: acpi*:LNXTHERM:*
depends: processor
vermagic: 2.6.27-9-generic SMP mod_unload modversions
parm: act:Disable or override all lowest active trip points. (int)
parm: crt:Disable or lower all critical trip points. (int)
parm: tzp:Thermal zone polling frequency, in 1/10 seconds. (int)
parm: nocrt:Set to take no action upon ACPI thermal zone critical trips points. (int)
parm: off:Set to disable ACPI thermal support. (int)
parm: psv:Disable or override all passive trip points. (int)

I edited my /etc/modprobe.d/options file and added

options thermal tzp=30 act=0 crt=0 psv=0

The option tzp=30 means 3 seconds of polling, and as for the other zeros I thought they would let me override the thermal trip points (I have already tried overriding them manually with sudo -i ... echo etc., only to conclude that newer kernels don't allow it [ubuntu forums]).

This was a hasty action just to see what happens. It turns out this has some good effects on thermal behaviour of my sy...
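A rough way to check whether the option actually took effect, sketched under a few assumptions: the zone is called THRC as on this laptop, and polling_frequency reports either "<polling disabled>" or a value in seconds. The function names are hypothetical helpers, not existing commands.

```shell
#!/bin/sh
# tzp is given in tenths of a second, so tzp=30 should mean 3 seconds.
tzp_to_seconds() {
    echo "$(( $1 / 10 ))"
}

# Read polling_frequency text on stdin and classify it.
polling_state() {
    text=$(cat)
    case "$text" in
        *"polling disabled"*) echo disabled ;;
        *seconds*)            echo enabled ;;
        *)                    echo unknown ;;
    esac
}

# Only touch the real file when it exists (THRC is an assumption; the
# zone directory has a different name on other machines).
f=/proc/acpi/thermal_zone/THRC/polling_frequency
if [ -r "$f" ]; then
    echo "polling is $(polling_state < "$f"); expected every $(tzp_to_seconds 30) s"
fi
```

If this still reports "disabled" after a reboot, the options line was probably not applied when the module was loaded.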


Revision history for this message
dr.spock (dr.spock) wrote :

I'm using 8.10 32 bit edition, but my CPU is a 2.0 Dothan, so power management is done by acpi-cpufreq instead of powernow_k8.

$ lsmod | grep cpufreq
acpi_cpufreq 15500 0
cpufreq_userspace 11396 0
cpufreq_stats 13188 0
cpufreq_powersave 9856 1
cpufreq_ondemand 14988 0
freq_table 12672 3 acpi_cpufreq,cpufreq_stats,cpufreq_ondemand
cpufreq_conservative 14600 0
processor 42156 3 acpi_cpufreq,thermal

My /proc/acpi/thermal_zone/ has a THRM directory, not THRC, and it contains:

$ cat cooling_mode
<setting not supported>

$ cat trip_points
critical (S5): 100 C

$ cat polling_frequency
<polling disabled>

Adding "options thermal tzp=30 act=0 crt=0 psv=0" to /etc/modprobe.d/options does not change this value after reboot; it keeps showing "<polling disabled>". Once, though, I managed to unload the thermal module and reload it with these parameters, and it showed "3 seconds". Anyway, it does not help: the machine still shuts down when it reaches the critical trip point.

Now I'm trying to repeat test but I can't unload thermal, because it says it's in use.

Any hint?

My kernel is 2.6.27-20-generic, but thermal module shows the same info:

spock@vulcan:/proc/acpi/thermal_zone/THRM$ modinfo thermal
filename: /lib/modules/2.6.27-10-generic/kernel/drivers/acpi/thermal.ko
license: GPL
description: ACPI Thermal Zone Driver
author: Paul Diefenbaugh
srcversion: 1787CE9FEB053C917D031A9
alias: acpi*:LNXTHERM:*
depends: processor
vermagic: 2.6.27-10-generic SMP mod_unload modversions 586
parm: act:Disable or override all lowest active trip points. (int)
parm: crt:Disable or lower all critical trip points. (int)
parm: tzp:Thermal zone polling frequency, in 1/10 seconds. (int)
parm: nocrt:Set to take no action upon ACPI thermal zone critical trips points. (int)
parm: off:Set to disable ACPI thermal support. (int)
parm: psv:Disable or override all passive trip points. (int)

Cheers.

Revision history for this message
mplexus (mike-plexousakis) wrote :

Well, it turns out that after a reboot my good thermal state was all gone and i was back to shutting down due to overheating - and my polling frequency was still <polling disabled>.

I removed the thermal module and modprobed it back again and things worked all right again; now my polling freq is back to 3 seconds.

So, adding the lines below to /etc/rc.local did it for me:

rmmod thermal
modprobe thermal

You probably cannot unload thermal because in your terminal you "are" inside /proc/acpi/thermal_zone directory. Get out and try rmmod and modprobe.
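The same reload can be wrapped so it steps out of the thermal_zone directory first. This is only a sketch: inside_thermal_zone and reload_thermal are hypothetical names, the reload needs root, and nothing is run until reload_thermal is called.

```shell
#!/bin/sh
# Succeed if the directory $1 is inside /proc/acpi/thermal_zone.
inside_thermal_zone() {
    case "$1" in
        /proc/acpi/thermal_zone|/proc/acpi/thermal_zone/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Reload thermal so that the options from /etc/modprobe.d/options are applied.
# Needs root; it is not called automatically here.
reload_thermal() {
    if inside_thermal_zone "$PWD"; then
        cd / || return 1   # step out so this shell does not keep the module busy
    fi
    rmmod thermal && modprobe thermal
}
```

Any other terminal sitting inside that directory would still have to leave it by hand.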

Note: using this trick feels the right thing to do for my laptop. It just feels thermal behaviour is in its best state ever: for the first time i can see my cpu use all 4 scales (800, 1200, 2000 and 2200) of frequencies (for several minutes not "instantly") where before it only scaled from lowest straight to highest (and stayed there on high load). Cool ! :-)

Revision history for this message
dr.spock (dr.spock) wrote :

Thanks mplexus, I hadn't noticed I was inside a directory created by the module.

Now I have modified rc.local and polling frequency is set to 3 secs. I have tested it again with the command 'yes | sha1sum' while monitoring CPU temp and freq: the frequency rises immediately to 2 GHz (max), and the machine shuts down when it reaches 100º (curiously, a message on the console says it will shut down because it has reached 72º).

Well, in my case polling is working fine, but I think it doesn't help because my trip_points file only contains 'critical (S5): 100 C', so the only possible event is to shut down the computer.
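For anyone repeating this test, a small monitoring loop can be sketched as follows. It assumes the temperature file prints a line like "temperature: 54 C" and that cpufreq exposes scaling_cur_freq in kHz for cpu0; both paths and all three function names are assumptions for illustration, so adjust them to your machine.

```shell
#!/bin/sh
# Extract the number from a line such as "temperature:             54 C".
parse_temp() {
    awk '{ for (i = 2; i <= NF; i++) if ($i == "C") { print $(i - 1); exit } }'
}

# cpufreq reports kHz; convert to MHz for readability.
khz_to_mhz() {
    echo "$(( $1 / 1000 ))"
}

# Print one "temp freq" sample per second for zone $1 (for example THRM).
# Stops as soon as either file is not readable.
watch_cpu() {
    t=/proc/acpi/thermal_zone/$1/temperature
    f=/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
    while [ -r "$t" ] && [ -r "$f" ]; do
        echo "$(parse_temp < "$t") C  $(khz_to_mhz "$(cat "$f")") MHz"
        sleep 1
    done
}
```

Run `watch_cpu THRM` in one terminal and 'yes | sha1sum' in another to see how temperature and frequency move together.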

Revision history for this message
mplexus (mike-plexousakis) wrote :

Actually, I don't know if 1 second polling is better..

Anyway, I am still testing this thermal module and what possibilities it gives me. Experiment is all i can do :-)

Another note: for the first time power management works as it is supposed to when the battery reaches critical: now it does what i told it to do, shut down the system cleanly. Previously, it ignored me and waited until the battery went totally empty and then instantly hard-powered-off. So, this new behavior is a good thing.

Truly changing the trip points should be the goal. Newer kernels don't allow this i suppose. Thermal module's options say something about it.. We'll see.

Revision history for this message
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Changed in acpi:
status: Unknown → Confirmed
Revision history for this message
Antonio Salazar (asalazarmx) wrote :

Just to add another laptop model to the list: A Sony Vaio GN-FZ250FE.

This laptop came with Windows Vista, which runs the CPU significantly cooler than Ubuntu, the latter needing to use the cooling fan all the time because its temperature rarely drops below 55 Celsius. The NVidia GPU runs about 5 degrees hotter than the CPU, even when using Metacity instead of Compiz Fusion.

I've seen this behavior in a brand-new Toshiba laptop too, but I don't have it near to check the model, and reading all the comments I realize many people have this problem too. When asked, it's kinda embarrassing to admit that Windows Vista is greener than Ubuntu 8.10 on many laptops.

Changed in acpi:
status: Confirmed → Fix Released
Changed in acpi:
status: Fix Released → Confirmed
Changed in acpi:
status: Confirmed → Fix Released
Revision history for this message
Leho Kraav (lkraav) wrote :

hi everyone

i just ran into this overheating+shutdown issue with an amilo a1650 laptop, mobile sempron 3100+ (800, 1600,1800 MHz steps), gentoo 2.6.24 and 2.6.28-r10. in my case this was clearly a problem related to horrific thermal paste situation. afaik nobody has touched the cpu+hsf since the machine came from factory some years ago, but the cpu idle temperature was 56 C @ 800MHz. using performance or ondemand governor the cpu temperature would almost immediately skyrocket to 100+ C and the machine would almost immediately power off.

before, this laptop had run windows xp for almost 2 years, where it also exhibited some occasional freezes and shutdowns, although it would not do it so abruptly, and freezes also seemed to be related to which Mobility Radeon 200M IXP video driver i would use (the newer, the worse!). but overall it would be in a working condition, although you could visibly see (ProcExp) it would throttle to 800 MHz way too often and bog down any task which hogged the cpu for any period of time.

yesterday i opened the heatsink up and saw the thermal paste was in a terrible situation, as in it looked like nothing you would expect from a decent thermal paste application - hardened into pieces, scattered around the core, with random blobs stuck on the heatsink. applied some arctic cooling mx-1 and voila - it looks to have worked. cpu idle temperature was 36 C after booting up, stabilizing at around 45 C @ 800 MHz after staying on for a while. i'm using ondemand governor, so 800 MHz is the usual working speed. in performance situations, the temperature does ratchet up double and more, but thanks to decent thermal paste, the fan has more time to kick in at higher speed and now the maximum temperature during 'make bzImage' was 98 C for a second. the cpu does get automatically throttled down to 800 MHz at high temperatures with performance governor, then switched up again at around 65-70C.

i also monitored and graphed it (attached).

as you can see, these crappy low end laptops are just not very good performance or gaming machines. the cpu burns up easily and hardware has to scale back the MHz, killing any chance at high performance for extended periods. fortunately this one will do nothing but sit idle most of the time with office work, so 800 MHz with occasional bump-up should work fine from now on. but obviously it's a worrying sign that coming out of the factory, it was ill-prepared to do even that. googling "amilo overheat" is a clear indicator of that.

Revision history for this message
Manoj Iyer (manjo) wrote :

Looks like this is an old bug for which a fix has already been released. Marking as Fix Released. If this is still an issue on Jaunty/Karmic kernels please open a separate bug.

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Leho Kraav (lkraav) wrote :

as a final note, i've just switched the mobile sempron 3100+ (62W TDP) out for a turion mt-34 (25W TDP) for $15 off ebay. switching on performance governor for full 1800 MHz the turion temperature tops out at around 56C and does not get higher... which is almost the same as usual 800 MHz temperature for sempron! hopefully this ends the overheating issue.

Revision history for this message
edward stroupe (e-stroupe) wrote :

processor is presently running @ 50% plus - with just system monitor open, recently started doing this when opening yahoo e-mail account. Did this with Windows XP, but not with Ubuntu @ early install and until just a few days ago. When running video clips near HD, off line, fan turns on. Use an under pad with 2-fans to add cooling plus a side desk fan. 2004 Dell model I-5150. The 'Van Halen of computers'?! Ed.
