kernel performance is *very* slow with 8GB RAM on AMD64. 6GB is fine. kernel 2.6.22-8 x86_64

Bug #129172 reported by RichardNeill
Affects                       Status        Importance  Assigned to  Milestone
----------------------------  ------------  ----------  -----------  ---------
Linux                         Invalid       Medium
linux (Ubuntu)                Fix Released  Undecided   Unassigned
linux-source-2.6.22 (Ubuntu)  Won't Fix     Medium      Unassigned

Bug Description

Binary package hint: linux-image-2.6-amd64-generic

I have a new P35-based motherboard, with room for 8GB of RAM, and a Q6600 CPU. If all 4 of the 2GB DIMMs are present, the performance of the kernel is extremely poor: the system runs at about 1/100th of its normal speed. If 1 DIMM is removed, everything is fine.

This is the version as installed by gutsy nightly, on 28th July: kernel 2.6.22-8 x86_64

I originally filed it as bug #128977 in the installer; there are some more details and the exact hardware listing there.

I've uploaded a tarball here: http://www.ruo3.org/~rjn/8gbbug.tar.gz (490 kB) containing as many diagnostics as I can think of, including dmesg, var/log/messages, and most of /proc. Hope this is useful.

There's nothing helpful about this bug anywhere on Google, as far as I can see.

Thanks very much - Richard

RichardNeill (ubuntu-richardneill) wrote :

Probably more helpful if I attach the file here.

RichardNeill (ubuntu-richardneill) wrote :

Still true with 2.6.22-9.

RichardNeill (ubuntu-richardneill) wrote : Experiments with mem=XXXM. Workaround.

A useful test/workaround is provided by the mem=XXXM boot parameter. This allowed me to experiment. Here are the results.
In all cases, the system was booted in recovery mode (for speed of testing), appending mem=XXXM to the kernel command-line. All 4 DIMMs (8GB in total) were always present.

Results:

boot param     free -m (total)   performance
-------------  ----------------  ------------
mem=2048M      2015              normal
mem=6144M      5477              normal
mem=8192M      7497              normal
mem=8792M      8002              100x slow
mem=10000M     -                 100x slow
[none]         -                 100x slow

Performance was measured by timing: for ((i=0;i<10000;i++)); do echo $i > /dev/null ;done
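
A minimal sketch of the timing test noted in the table above, wrapped in a function so that runs can be compared. It is a POSIX sh variant of the original bash loop, not the exact command used in this report. The function name `time_echo_loop` is made up for illustration, and `date +%s%N` assumes GNU date (nanosecond support).

```shell
# Time N iterations of "echo $i > /dev/null" and print elapsed milliseconds.
# Assumes GNU date, which supports %N (nanoseconds).
time_echo_loop() {
    n=${1:-10000}
    start=$(date +%s%N)
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "$i" > /dev/null
        i=$((i + 1))
    done
    end=$(date +%s%N)
    # Convert the nanosecond difference to milliseconds.
    echo $(( (end - start) / 1000000 ))
}

time_echo_loop 10000
```

Running it once with a small mem= value and once with no mem= parameter should make a slowdown of the size described here obvious without a stopwatch.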

I then did a binary search to find the optimum. Where bootup was very slow, it was terminated with Alt-SysRQ-[RSEIUB] before completion, so "free -m" was not measured. (Unfortunately, it doesn't work to use "init=/bin/bash" to speed up the test cycle - why not?)

boot param     free -m   performance
-------------  --------  ------------
mem=8500M      -         slow
mem=8300M      7604      normal
mem=8400M      -         slow
mem=8350M      -         slow
mem=8325M      -         slow
mem=8315M      7618      normal
mem=8320M      -         slow
mem=8318M      7621      normal
mem=8319M      -         slow
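
The binary search itself was done by hand, rebooting for each value. Its logic can be sketched as below, assuming a known-good lower bound and a known-slow upper bound. `bisect_mem` and `is_normal` are hypothetical names: `is_normal` stands in for "reboot with mem=<value>M and see whether the system is usable", and is simulated here with the 8318 threshold found above.

```shell
# Simulated check: in reality this is a reboot plus a manual observation.
# The 8318 threshold is the value found in this report.
is_normal() {
    [ "$1" -le 8318 ]
}

# usage: bisect_mem KNOWN_GOOD_MB KNOWN_SLOW_MB
# Prints the largest value for which is_normal succeeds.
bisect_mem() {
    lo=$1   # boots at normal speed
    hi=$2   # boots slowly
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if is_normal "$mid"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo "$lo"
}

bisect_mem 8192 8792
```

Each step halves the interval, so the 600 MB range between 8192 and 8792 needs about ten reboots.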

I also found these other links which might be relevant:
  http://www.hostingforum.ca/172078-slow-kernel-when-memory-8g.html
  http://lkml.org/lkml/2007/5/30/79

Lastly, here is the difference between the mem= value and the memory reported by free -m, using the data above (always with 8GB of physical RAM installed):

mem= (MB)   free -m (MB)   difference (MB)
----------  -------------  ----------------
2048        2015           33
6144        5477           667
8192        7497           695
8792        8002           790
8300        7604           696
8315        7618           697

=> No very clear pattern.

Anyway, the workaround for now is to boot with "mem=8318M", which actually provides 7621MB of RAM (i.e. wasting 1171 MB).
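
To confirm that the running kernel actually picked up the parameter, its command line can be inspected. This is a small sketch: the helper name `mem_param` is made up, and the sample command line in the comment is hypothetical.

```shell
# Print the value of every mem= parameter found in a kernel command line,
# e.g. "root=/dev/sda1 ro single mem=8318M" (hypothetical) prints "8318M".
mem_param() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^mem=//p'
}

# On a running Linux system the command line is exposed in /proc/cmdline.
if [ -r /proc/cmdline ]; then
    mem_param "$(cat /proc/cmdline)"
fi
```

If nothing is printed, the kernel was booted without mem=.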

RichardNeill (ubuntu-richardneill) wrote :

Oops - I can't subtract! I meant: 8192 actual - 7621 available => 571MB wasted. So at least I can use about 93% of the installed RAM.
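
The corrected arithmetic can be checked with shell integer arithmetic. This is just a sketch; `ram_usage` is a made-up helper name.

```shell
# usage: ram_usage INSTALLED_MB AVAILABLE_MB
ram_usage() {
    installed=$1
    available=$2
    wasted=$((installed - available))
    # Integer division, so the percentage is rounded down.
    percent=$((available * 100 / installed))
    echo "wasted=${wasted}MB usable=${percent}%"
}

ram_usage 8192 7621   # prints: wasted=571MB usable=93%
```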

RichardNeill (ubuntu-richardneill) wrote :

This bug also occurs with Gentoo and Mandriva 64-bit Live-CDs. Cross-filing as a kernel bug:
http://bugzilla.kernel.org/show_bug.cgi?id=8883

Chuck Short (zulcss) wrote :

But it looks like more of a hardware problem according to the bugzilla report.

Changed in linux-meta:
assignee: nobody → zulcss

RichardNeill (ubuntu-richardneill) wrote :

When you say "looks like a hardware problem", I'm not quite sure what you mean. At first, I'd be inclined to agree with you, however it's entirely new hardware, it passes memtest fine (for 12 hours), and everything works perfectly, with the exception that I can't use all the physical RAM. All the diagnostics seem to be OK.

I guess we could think of this as a (design) flaw in the motherboard/chipset, which prevents it from working properly with the current Linux kernel - but I don't think that gets us very far ;-) Do you have any suggestions for further tests which could narrow this down a bit? Thanks for your help.

Amit Kucheria (amitk) wrote :

Could you try moving the memory modules around? e.g. moving the last one to the first.

RichardNeill (ubuntu-richardneill) wrote :

Already tried that. I've moved things around such that I've tested every DIMM and every slot.
Also, if I remove one DIMM (from any slot), leaving 6GB present, everything works fine.
The DIMMs are a matched set of 4 x 2GB Geil Black Dragon, PC6400. Everything passes memtest.

Amit Kucheria (amitk) wrote :

Richard, could you post the contents of /proc/mtrr?

Amit Kucheria (amitk) wrote :

To elaborate, this might be a bios bug similar to #132577.

RichardNeill (ubuntu-richardneill) wrote :

Thanks, Amit.

Here's /proc/mtrr:
reg00: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg01: base=0x100000000 (4096MB), size=4096MB: write-back, count=1
reg02: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
reg03: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg04: base=0xdff00000 (3583MB), size= 1MB: write-through, count=1

In the meantime, I'd done a bit more testing, and found that this also affects the Mandriva and Gentoo installers/Live CDs. Therefore, I cross-filed it as a kernel bug (I hope that was the right thing to do): http://bugzilla.kernel.org/show_bug.cgi?id=8883
Interestingly, the performance is slow on Fedora core 7's x86 install CD, but fine on Knoppix 5.01 x86.

I've also asked Gigabyte tech support in case it's a BIOS bug; no reply as yet.
And yes, I do think it sounds quite similar to #132577.

Amit Kucheria (amitk) wrote :

Does the Knoppix CD enable all 8GB of RAM? I would suspect not. In any case, getting the output of /proc/mtrr from the Knoppix Live CD might tell us why it works.

Yes, filing it upstream is the best thing to do in such cases.

Amit Kucheria (amitk) wrote :

Confirming the existence of the bug and adding the upstream dependency.

Changed in linux-source-2.6.22:
assignee: zulcss → nobody
importance: Undecided → Medium
status: New → Confirmed
Changed in linux:
status: Unknown → Invalid

RichardNeill (ubuntu-richardneill) wrote : It's a BIOS bug - fixed :-)

Had a reply from Gigabyte - yes, it *is* a BIOS bug, and the BIOS upgrade fixes it. The kernel bug was rejected as invalid (because it's a BIOS problem, not a kernel problem).
I'll give full details below, because this wasn't quite straightforward to fix, and hopefully it will help someone else.

1) Get the latest BIOS update from Gigabyte. I used this one successfully: motherboard_bios_ga-p35c-ds3r_f4g_beta.exe

2) Extract the BIOS with wine (0.9.41 works fine). Gigabyte don't provide the md5sum, so, for info: 6f283d38e272ea433b4478f39a6cdb03 P35CDS3R.F4g

3) Copy the BIOS onto a floppy. [The manual claims that a USB key is also supported: the Q-flash utility loads the new image fine, but fails at the last step with "BIOS ID CHECK ERROR". You have to use a real old-fashioned diskette.]

4) Flash the BIOS, then load optimized defaults. Enjoy.

5) Once updated, free -m reports 8002 MB free (presumably, the rest is used by the kernel). The MTRRs are now:

$ cat /proc/mtrr
reg00: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
reg01: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg02: base=0x100000000 (4096MB), size=4096MB: write-back, count=1
reg03: base=0x200000000 (8192MB), size= 512MB: write-back, count=1
reg04: base=0xdff00000 (3583MB), size= 1MB: write-through, count=1

And the performance is about 50% better than the best it was before :-)
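
For step 2 above, the extracted image can be compared against the checksum quoted there before flashing. This is a minimal sketch: `verify_md5` is a made-up helper name, and it assumes the `md5sum` tool from GNU coreutils is installed.

```shell
# usage: verify_md5 FILE EXPECTED_MD5
# Succeeds only if the file's MD5 digest equals the expected value.
verify_md5() {
    actual=$(md5sum "$1" | cut -d' ' -f1)
    [ "$actual" = "$2" ]
}

# Check the extracted BIOS image, if it is in the current directory.
if [ -f P35CDS3R.F4g ]; then
    if verify_md5 P35CDS3R.F4g 6f283d38e272ea433b4478f39a6cdb03; then
        echo "checksum matches"
    else
        echo "checksum MISMATCH - do not flash"
    fi
fi
```

The checksum only shows that the file matches the one used in this report, not that Gigabyte has vouched for it.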

I'm sure Gigabyte have shipped lots of boards with the original broken BIOS - is there any way to make the kernel print a warning? It would be nice if the next Ubuntu user who hits this bug can find the fix more easily!

Thanks once again for your help - Richard

Tim Gardner (timg-tpi)
Changed in linux-source-2.6.22:
status: Confirmed → Invalid

Matthew Garrett (mjg59) wrote :

While this is arguably a BIOS bug, Linux's failure to support PAT is the key failure here. Even then, we could handle this much better than we currently do by clipping memory size to the level covered by the MTRRs.
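
As a rough userspace illustration of "the level covered by the MTRRs", the sketch below reads /proc/mtrr-style lines and prints the highest address (in MB) at which any write-back region ends. It is a simplification, not what the kernel does: it ignores gaps between regions and overlapping uncachable ranges, and it only handles sizes printed in MB. The function name `mtrr_wb_top_mb` is made up. For the two listings in this report it gives 8192 before the BIOS update and 8704 after.

```shell
# Read /proc/mtrr-format lines on stdin and print the highest end address,
# in MB, of any write-back region. Lines whose size is not given in MB, and
# regions of other types, are skipped.
mtrr_wb_top_mb() {
    sed -n 's/^reg[0-9]*: base=0x[0-9a-fA-F]* *( *\([0-9][0-9]*\)MB), size= *\([0-9][0-9]*\)MB: write-back.*/\1 \2/p' |
    {
        top=0
        while read -r base size; do
            end=$((base + size))
            if [ "$end" -gt "$top" ]; then
                top=$end
            fi
        done
        echo "$top"
    }
}

# On a live system:
if [ -r /proc/mtrr ]; then
    mtrr_wb_top_mb < /proc/mtrr
fi
```

Comparing this number with the top of RAM reported by the firmware memory map would show whether some RAM falls outside write-back coverage.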

Changed in linux-source-2.6.22:
status: Invalid → Confirmed

Grzegorz "McCartney" Oledzki (grzegon) wrote :

The same happens on a slightly different motherboard (Gigabyte GA-G33-DS3R), with the same processor (Q6600) and 8GB RAM. When running on 4GB it works fine. With 8GB it's terribly slow.

Similar behaviour can be observed when trying to run the Knoppix Live CD (32-bit, 4.0). With 8GB it doesn't boot correctly, stopping with "Can't find KNOPPIX filesystem, sorry. Dropping you into a very limited shell. Press reset button to quit".

Will try with the mem parameter.

Grzegorz "McCartney" Oledzki (grzegon) wrote :

I tried mem=8318M and it was slow, so I tried mem=7168M and it booted quickly. But that hasn't solved the problem: when I started JBoss on that server, it started very slowly. So my gut feeling is that without the mem parameter the kernel puts itself into 'slow memory areas', and with the mem parameter we can make the kernel put itself into 'fast memory', but other applications can still be assigned 'slow memory'.
I will have to try flashing the BIOS.

For others: if you don't have a floppy drive and still want to flash the BIOS, you may find this useful:
http://www.linuxinsight.com/how-to-flash-motherboard-bios-from-linux-no-dos-windows-no-floppy-drive.html
I haven't tried it yet, but it seems to be a good idea.

Grzegorz "McCartney" Oledzki (grzegon) wrote :

Flashing the BIOS has helped (luckily Gigabyte provide a DOS flashing utility): the system now behaves correctly when booted with 8GB of memory installed and no "mem=" parameter. So I would consider this not a kernel bug, but a hardware issue indeed.

Brian Murray (brian-murray) wrote :

I am assigning this bug to the 'ubuntu-kernel-team' per their bug policy. For future reference you can learn more about their bug policy at https://wiki.ubuntu.com/KernelTeamBugPolicies .

Changed in linux-source-2.6.22:
assignee: nobody → ubuntu-kernel-team

Launchpad Janitor (janitor) wrote : This bug is now reported against the 'linux' package

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this bug to the new "linux" package. However, development has already begun for the upcoming Intrepid Ibex 8.10 release. It would be helpful if you could test the upcoming release and verify if this is still an issue - http://www.ubuntu.com/testing . If the issue still exists, please update this report by changing the Status of the "linux" task from "Incomplete" to "New". We appreciate your patience and understanding as we make this transition. Thanks!

Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

RichardNeill (ubuntu-richardneill) wrote :

Thanks for your message. I've just had a chance to try it on Alpha 6. The bug is gone, as far as I can tell. At any rate, repeating the original test (for ((i=0;i<10000;i++)); do echo $i > /dev/null ;done), completes in about 1/2 second, which is now perfectly OK.
Thanks for your help - Richard

Leann Ogasawara (leannogasawara) wrote :

Thanks Richard.

I'm going to go ahead and tentatively mark this "Fix Released" for Intrepid. If you see any regressions with this bug prior to Intrepid's final release please feel free to reopen by setting the status back to "New". Thanks.

Changed in linux:
status: Incomplete → Fix Released

Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Changed in linux:
importance: Unknown → Medium