Thinkpad T43 2.6.15-23/25, system slow to use due to CPU spikes

Bug #50024 reported by Joe Kislo
This bug report is a duplicate of: Bug #30557: cpu idle time in /proc/stat wrong.
Affects: linux (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Let me apologise in advance for not being able to narrow this one down further, but it's something in the kernel... so my user space tools aren't helping me track this down.

I have a ThinkPad T43 (1.87GHz P-M). The system ran great when I was running breezy. When I upgraded to dapper (final release), the system was sluggish, even after a fresh bootup. The cpu monitor applet showed cpu spikes every few seconds, each lasting about one second (or one slice on the cpu monitor). The spikes get progressively more frequent the more you use the system. Eventually they are frequent enough that the cpu won't spin down to 800MHz and stays at 1.06GHz; then it gets worse, and eventually it just stays stuck at 1.87GHz, even after you quit all of your apps and sit at a blank desktop.

Running top shows no cpu being used (except for top itself, which runs at about 0.3% cpu or so). vmstat, however, shows the *real* picture. Here is my system after a fresh bootup, but sitting in X windows:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 0 0 0 375608 187040 230140 0 0 0 14 1126 217 5 15 80 0
 2 0 0 375608 187048 230132 0 0 0 3 1124 212 5 23 73 0
 0 0 0 375608 187056 230124 0 0 0 4 1112 200 23 12 65 0
 0 0 0 375600 187064 230116 0 0 0 3 1093 230 12 16 72 0
 0 0 0 375600 187072 230108 0 0 0 3 1090 237 8 27 65 0
 0 0 0 375608 187072 230108 0 0 0 13 1089 221 21 14 64 0

Note all the user and system time. After using my computer (mostly web browsing, maybe a bunch of ssh sessions) for about an hour, it looks like this:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 0 0 0 217664 200236 256792 0 0 0 90 1217 911 65 16 19 0
 0 0 0 217788 200244 256784 0 0 0 15 1390 979 56 24 20 0
 0 0 0 219300 200252 256776 0 0 0 4 1388 1700 61 20 18 0
 1 0 0 219300 200260 256768 0 0 0 14 1297 1223 61 21 18 0
 0 0 0 219300 200264 256764 0 0 0 6 1126 633 58 18 23 0
 1 0 0 220664 200264 256764 0 0 0 25 1354 1658 65 20 15 0

Now it's mostly NOT idle.
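To put numbers on the two captures above, here is a minimal Python sketch that averages the CPU columns of the pasted rows. It assumes the last four fields of each row are us, sy, id and wa (the usual vmstat column order; the column-name header line is not in the paste), and the function name `cpu_averages` is made up for illustration.

```python
# Rows copied verbatim from the two vmstat captures in this report.
FRESH_BOOT = """
 0 0 0 375608 187040 230140 0 0 0 14 1126 217 5 15 80 0
 2 0 0 375608 187048 230132 0 0 0 3 1124 212 5 23 73 0
 0 0 0 375608 187056 230124 0 0 0 4 1112 200 23 12 65 0
 0 0 0 375600 187064 230116 0 0 0 3 1093 230 12 16 72 0
 0 0 0 375600 187072 230108 0 0 0 3 1090 237 8 27 65 0
 0 0 0 375608 187072 230108 0 0 0 13 1089 221 21 14 64 0
"""

AFTER_AN_HOUR = """
 0 0 0 217664 200236 256792 0 0 0 90 1217 911 65 16 19 0
 0 0 0 217788 200244 256784 0 0 0 15 1390 979 56 24 20 0
 0 0 0 219300 200252 256776 0 0 0 4 1388 1700 61 20 18 0
 1 0 0 219300 200260 256768 0 0 0 14 1297 1223 61 21 18 0
 0 0 0 219300 200264 256764 0 0 0 6 1126 633 58 18 23 0
 1 0 0 220664 200264 256764 0 0 0 25 1354 1658 65 20 15 0
"""

# Assumption: the last four vmstat fields are user, system, idle, iowait.
CPU_COLUMNS = ("us", "sy", "id", "wa")


def cpu_averages(vmstat_rows):
    """Return the mean of each CPU column over all non-empty rows."""
    rows = [line.split() for line in vmstat_rows.splitlines() if line.strip()]
    totals = dict.fromkeys(CPU_COLUMNS, 0)
    for fields in rows:
        for name, value in zip(CPU_COLUMNS, fields[-4:]):
            totals[name] += int(value)
    return {name: totals[name] / len(rows) for name in CPU_COLUMNS}


if __name__ == "__main__":
    for label, rows in (("fresh boot", FRESH_BOOT), ("after an hour", AFTER_AN_HOUR)):
        avg = cpu_averages(rows)
        print(label, {name: round(value, 1) for name, value in avg.items()})
```

Averaged this way, idle time drops from roughly 70% right after boot to under 20% after an hour of use, which matches the description above.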

I still have my 2.6.12 kernel (from breezy) installed. If I boot back into that kernel, everything is completely fine; the problem is not present at all. I have rebooted a dozen times, and it's not just some flaky thing: it happens every time I boot the 2.6.15-23 kernel, and NEVER when I boot the 2.6.12-9 kernel. (I am using a 686 kernel for both.)

To make matters worse, I upgraded to the 2.6.15-25 kernel last night. This kernel is basically unusable on my T43. It still has the "spiky" cpu problem, but now when I use an aterm (after a fresh bootup, no less) the system is so sluggish it can't keep up with my typing. The animation when a new window opens (the rectangle coming out from the icon when you click it) is so slow it takes a full second to render. Some apps can keep up with my typing; firefox, for example, seems 'okay'. However, something very weird: if I drag the firefox window around the screen (even if there are no windows below it), it takes nearly my ENTIRE CPU according to the CPU applet, yet top shows Xorg taking only 7%, with 50% system time. If I boot back into the -23 kernel, this particular problem goes away. So again, another problem that seems kernel related.

So if you can think of anything I can do to try to debug this, I will happily perform tests. I can't figure out where to go from here. I tried killing all of the user space programs last night (including X) and was still seeing the cpu spikes.
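One test that needs no user space tools beyond Python is to sample the aggregate `cpu` line of /proc/stat twice and compute the busy share yourself. The sketch below assumes the standard field order on that line (user, nice, system, idle, then any further fields) and counts only the fourth value as idle; the function names are made up for illustration. Since this bug is marked as a duplicate of one about idle time in /proc/stat being wrong, the result may itself be off on affected kernels, so treat it as a cross-check against top and vmstat rather than ground truth.

```python
import time


def parse_cpu_line(line):
    """Split the aggregate 'cpu' line of /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def busy_percent(before, after):
    """Percentage of non-idle time between two 'cpu' lines.

    Assumption: the fourth counter is idle time; everything else
    (including iowait, if present) is counted as busy.
    """
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * (total - deltas[3]) / total


def sample(interval=1.0, path="/proc/stat"):
    """Read the 'cpu' line twice, `interval` seconds apart (Linux only)."""
    def read_first_line():
        with open(path) as stat:
            return stat.readline()

    before = read_first_line()
    time.sleep(interval)
    return busy_percent(before, read_first_line())
```

Calling `sample()` in a loop while the desktop sits idle gives a number to compare against what top and the cpu applet report on each kernel.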

At this point, none of the dapper kernels are really usable for me (although the -23 kernel is at least somewhat useful if I'm not on battery). I have to use the breezy kernel now, although some of my devices don't work now that the rest of the system has been dapperized.

Tags: linux
Revision history for this message
Adam Koszela (adam-koszela) wrote :

Can confirm 2.6.15-25-686 being sluggish on my Sony VAIO VGN-FS295VP. Can't really remember how the breezy kernel was, used dapper throughout development. 2.6.15-23-686 is much faster for me.

Jeff Bowden (jlb) wrote :

Just to add another data point, I'm running -23 on my T43 and I have not noticed this behavior. I have the cpu frequency monitor up all the time and it's almost always at 800MHz. Perhaps if we compare hardware details we can find our way to the culprit. I will add an attachment with my lspci output etc. for you to compare.

Jeff Bowden (jlb) wrote : specs from a t43 that does not exhibit the problem

Let me know if I can run any tests to help out.

Adam Koszela (adam-koszela) wrote : specs from a VGN-FS295VP with slowdown

My specs

Jeff Bowden (jlb) wrote : stuff that's different in the non-working system

This looks to be the list of what's in your system that's not in mine. If the original reporter (who also has a T43) could check to see if he's got any of these that would probably pinpoint the problem.

Adam Koszela (adam-koszela) wrote :

Could the original reporter try booting with 'acpi=off'? That fixed it for me, though now I don't get my battery level anymore.
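When testing this, it helps to confirm that the parameter actually reached the kernel. A minimal Python sketch, assuming the running kernel's command line can be read from /proc/cmdline; the helper name `has_boot_param` and the sample command lines are hypothetical:

```python
def has_boot_param(cmdline, param):
    """True if `param` appears as a whole word on the kernel command line."""
    return param in cmdline.split()


def booted_with(param, path="/proc/cmdline"):
    """Check the running kernel's command line (Linux only)."""
    with open(path) as cmdline:
        return has_boot_param(cmdline.read(), param)


# Hypothetical command lines, for illustration only.
with_acpi_off = "root=/dev/hda1 ro quiet splash acpi=off"
without_acpi_off = "root=/dev/hda1 ro quiet splash"
print(has_boot_param(with_acpi_off, "acpi=off"))
print(has_boot_param(without_acpi_off, "acpi=off"))
```

The match is case sensitive on purpose: the parameter suggested here is the lowercase `acpi=off`.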

Joe Kislo (joe-k12s) wrote :

So for my own sanity's sake, let me give numbers to the symptoms.

Symptom #1:
CPU spikes which, over time, eventually turn into a continual stream of user and system time. Eventually the cpu will never spin down from MAX.
Kernels present in: 2.6.15-23, 2.6.15-25
Kernels NOT present in: 2.6.12-9

Symptom #2:
Holy crap, the system is unusably slow. I can type faster than my terminal window can keep up with. Opening a window makes the video so slow I can watch it paint.
Kernels present in: 2.6.15-25
Kernels NOT present in: 2.6.15-23, 2.6.12-9

I can confirm that if I boot the 2.6.15-23 kernel with acpi=off, symptom #1 goes away. If I boot the 2.6.15-25 kernel with acpi=off, both symptoms #1 and #2 go away.

So this makes my laptop usable again. Yay! Except it's a plugged-in-only laptop now, since none of the battery management works and the cpu is stuck at max.

I looked at the differences in the lspci output, and the only difference is the wireless card (I have the ABG card):

-0000:04:02.0 Network controller: Intel Corporation PRO/Wireless 2915ABG MiniPCI Adapter (rev 05)
+0000:04:02.0 Network controller: Intel Corporation PRO/Wireless 2200BG (rev 05)

The lsmod output is pretty different. I will attach both.

Joe Kislo (joe-k12s) wrote : T43 with problem lspci

T43 with problem lspci

Joe Kislo (joe-k12s) wrote : T43 with problem lsmod

T43 with problem lsmod

Joe Kislo (joe-k12s) wrote :

Perhaps one difference between the working T43 and mine is the BIOS revision? Jeff, can you give me your BIOS revision? Mine is:

1.05
2005-04-28

Embedded Controller Version: 1.03

Joe Kislo (joe-k12s) wrote :

I should note that the lsmod output I attached is from a bootup with acpi=off.

Jeff Bowden (jlb) wrote :

Not sure where you got the Embedded Controller Version. There are a couple of BIOS version numbers in kern.log:

PCI: PCI BIOS revision 2.10 entry at 0xfd8f7, last bus=7

and

apm: BIOS version 1.2 Flags 0x03 (Driver version 1.16ac)

Adam Koszela (adam-koszela) wrote :

I marked this bug as a duplicate of #30557.

Joe Kislo (joe-k12s) wrote :

Jeff, I got my BIOS version by going into the BIOS setup. While the big IBM logo is on the screen after a reboot, hit F1. It should be listed there. Any chance you can reboot at some point and send over that number?

Thanks,

Adam Koszela (adam-koszela) wrote :

Until this gets fixed, change your kernel to the regular -i386. No need for acpi=off with that one (see #30557).

sohail (launchpad-taggedtype) wrote : Re: [Bug 50024] Re: Thinkpad T43 2.6.15-23/25, system slow to use due to CPU spikes

On Tue, 2006-06-20 at 20:26 +0000, Adam Koszela wrote:
> *** This bug is a duplicate of bug 30557 ***
>
> Until this gets fixed, change your kernel to the regular -i386. No need
> for acpi=off with that one (see #30557).

Or add: echo 1 > /sys/module/processor/parameters/max_cstate

to /etc/rc.local
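For checking what the limit is currently set to, or changing it from a script, here is a small Python sketch around the same sysfs file. It assumes the file exists (the ACPI processor module is loaded) and that the script runs as root when writing; the function names are made up for illustration.

```python
from pathlib import Path

# Path taken from the echo command above.
MAX_CSTATE = Path("/sys/module/processor/parameters/max_cstate")


def read_max_cstate(path=MAX_CSTATE):
    """Return the deepest C-state the processor module is allowed to use."""
    return int(Path(path).read_text().strip())


def write_max_cstate(value, path=MAX_CSTATE):
    """Set the limit; same effect as `echo VALUE > path` (needs root)."""
    if value < 0:
        raise ValueError("max_cstate must not be negative")
    Path(path).write_text(f"{value}\n")
```

For example, `write_max_cstate(1)` followed by `read_max_cstate()` should report 1 if the write was accepted. Like the echo, this does not survive a reboot, which is why the suggestion is to put it in /etc/rc.local.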

Jeff Bowden (jlb) wrote :

My BIOS is revision 1.24, 2005-11-07. Embedded Controller Version 1.04.

Joe Kislo (joe-k12s) wrote :

Setting the max_cstate to 1 makes the system usable.

I just upgraded to 1.27 (May 24 2006) and 1.06 (Jun 6 2006), and set the max_cstate back to default (8). That seems to have made the 2.6.15-25 kernel MUCH better. I can't tell if it's totally cured, but it definitely is usable. I'll have to use my laptop more tonight and make sure it's totally fixed with the firmware upgrade.

Launchpad Janitor (janitor) wrote : This bug is now reported against the 'linux' package

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this linux-source-2.6.15 kernel bug to the new "linux" package. We appreciate your patience and understanding as we make this transition. Also, if you would be interested in testing the upcoming Intrepid Ibex 8.10 release, it is available at http://www.ubuntu.com/testing . Please let us know your results. Thanks!
