poor disk performance during heavy io

Bug #43484 reported by Allison Karlitskaya
This bug affects 1 person
Affects                        Status      Importance   Assigned to   Milestone
linux (Ubuntu)                 Invalid     Undecided    Unassigned
linux-source-2.6.15 (Ubuntu)   Invalid     Medium       Unassigned
linux-source-2.6.22 (Ubuntu)   Won't Fix   Undecided    Unassigned

Bug Description

Binary package hint: linux-source-2.6.15

1. Start a copy of a large file between two disks (this happens when I rsync my home directory to the backup drive for example).

2. Disk performance (in terms of latency) is REALLY bad now.

For example, it takes about 10-15 seconds to :wq from a vim session. Music does not skip at all (using muine).

This did not happen with breezy. I'm not sure when the problem started or why.

I've tried switching the elevator to cfq. It didn't help.

Tags: cft-2.6.27
Allison Karlitskaya (desrt) wrote :

btw: ext3 all around.

DiegoCG (diegocg) wrote :

Could you check whether UDMA is enabled in the misbehaving version? Also, could you test how much time it takes to copy that file with each kernel?

Allison Karlitskaya (desrt) wrote :

The disks are Serial ATA, so I don't know how to check for UDMA or whether this question is applicable.

Both drives are 250GB Western Digital drives with 8 or 16 MB cache (can't remember).

Model: WDC WD2500JD-00H

Here's a test case that always works for me:

desrt@moonpix:~$ dd if=/dev/zero of=big bs=4M count=1k

while that's running, try to do some stuff. Open up vi and save a small file to the disk. It's _really slow_. It reliably takes as much as 20-30 seconds to save a one-line file to disk.

One note: if you save in vi just after starting the 'dd' command then it only takes a few seconds. It starts to get longer and longer as 'dd' has been running for a while.
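
The stall described above can be put into numbers with a small script. This is only a rough sketch: it assumes the editor's save boils down to a small write followed by an fsync (which later comments in this report suggest, but which is not confirmed at this point), and the names `timed_small_save` and `sample_saves` are made up for illustration.

```python
import os
import tempfile
import time


def timed_small_save(directory, payload=b"one line\n", use_fsync=True):
    """Write a tiny temporary file in `directory` and return the elapsed seconds."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.monotonic()
        os.write(fd, payload)
        if use_fsync:
            # Assumption: the editor forces the data to disk when saving.
            os.fsync(fd)
        return time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)


def sample_saves(directory, samples=10, interval=1.0, use_fsync=True):
    """Repeat the timed save `samples` times, pausing `interval` seconds between runs."""
    results = []
    for _ in range(samples):
        results.append(timed_small_save(directory, use_fsync=use_fsync))
        time.sleep(interval)
    return results
```

Calling `sample_saves(".")` from the directory where `big` is being written, once while `dd` is idle and once while it is running, would show whether the save time grows as the copy proceeds. Passing `use_fsync=False` gives a comparison without the forced flush.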

One idea:

Maybe the disk driver is submitting a whole whack of tagged entries to the drive in order to hide latency. When the new request comes in (from vi), then no matter what the kernel's IO elevator says to do, the new request gets stuck behind all the other tagged requests already in the queue.

Another possibility is that the drive is using the on-disk cache memory to implement its own IO elevator that is grouping requests too aggressively (resulting in vi being crowded out).

Allison Karlitskaya (desrt) wrote :

Also tried deadline elevator. No love.

Whatever the problem is, it seems to be below the level of the kernel IO scheduler (perhaps disk driver (= ata_piix) or hardware (= intel ICH5)).
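
When switching elevators, it helps to confirm which one is actually active. The kernel lists the available schedulers in `/sys/block/<device>/queue/scheduler` and marks the active one with square brackets, for example `noop anticipatory deadline [cfq]`. A minimal sketch for reading it follows; the device name `sda` is a placeholder and the function names are hypothetical.

```python
import re
from pathlib import Path


def active_scheduler(text):
    """Return the bracketed scheduler name from a sysfs scheduler line, or None."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_active_scheduler(device="sda"):
    """Read the active scheduler for `device`; "sda" is only a placeholder."""
    path = Path(f"/sys/block/{device}/queue/scheduler")
    return active_scheduler(path.read_text())
```

`active_scheduler` returns `None` when no name is bracketed, so a caller can tell "no scheduler marked" apart from a real name.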

DiegoCG (diegocg) wrote :

While your idea may make sense, it shouldn't be reproducible with other IO schedulers, IMO. Maybe it's a SATA bug, dunno :/ Does the dmesg output differ in any way between the two kernels (not just for SATA, but other things as well)? Does your test (dd if=/dev/zero of=big bs=4M count=1k) take more time in the misbehaving version, or is the problem just bad latency when using vim and other "interactive" tools?

Gareth Fitzworthington (mapping-gp-deactivatedaccount) wrote :

This bug has had no activity for a considerable period. This is a check to see if there is still interest in investigating this bug report.
Is this still an issue with later releases?

Does the following help?
http://linux-ata.org/faq.html#combined
The slowdown resulting from the PATA/SATA combination may be causing at least some of the above-mentioned problems.
This will only affect those with Intel chipsets.

Changed in linux-source-2.6.15:
status: New → Incomplete
J. Bruce Fields (bfields-fieldses) wrote :

This is indeed very annoying. I typically see it when trying to edit files while doing a big kernel compile. For a hypothesis (probably easy to confirm by strace'ing vim), see the email thread appended to this article:

http://kerneltrap.org/node/14148

"it turns out that it's due to vim doing an occasional fsync not only on writeout, but during normal use too. "set nofsync" in the .vimrc solves this problem."

But I haven't tested that. It sounds like it might also help to mount with data=writeback (but see "man mount" for consistency warning), or to switch to a filesystem other than ext3.

Gareth Fitzworthington (mapping-gp-deactivatedaccount) wrote :

Thanks Bruce.
What kernels have you observed this against?

J. Bruce Fields (bfields-fieldses) wrote :

I don't recall ever finding a kernel that this was *not* a problem with, so I suspect the above-referenced discussion is correct, and that this is to some degree inherent to the ext3 design.

But, to give a specific example, I can reproduce the problem with whatever's in current Gutsy; uname -a says:

        Linux pig 2.6.22-14-generic #1 SMP Tue Feb 12 07:42:25 UTC 2008 i686 GNU/Linux

I also tried changing the mount options (adding noatime, nodiratime, and data=writeback), ran a compile and did some file editing, and confirmed that the worst of the latency is gone.
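
To check that the changed options actually took effect, the option list for a mount point can be read from `/proc/mounts`, where each line has the form `device mountpoint fstype options dump pass`. The sketch below parses text in that format. The function names are hypothetical, and whether the kernel lists a `data=` option when the default journaling mode is in use varies, so a result of `None` only means the option was not listed.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for `mount_point` from /proc/mounts-style text, or None."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last match: a later mount on the same point hides earlier ones.
            found = fields[3].split(",")
    return found


def journal_data_mode(options):
    """Return the value of the data= option (e.g. "writeback"), or None if not listed."""
    for opt in options:
        if opt.startswith("data="):
            return opt[len("data="):]
    return None
```

For example, `mount_options(open("/proc/mounts").read(), "/")` gives the options for the root filesystem, and `journal_data_mode` applied to that list reports the journaling mode if one is shown.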

Changed in linux-source-2.6.22:
status: New → Incomplete
Gareth Fitzworthington (mapping-gp-deactivatedaccount) wrote :

Confirming.
Bruce, thanks for the information.
The link you supplied earlier (attached again below) indicates that this issue is well known upstream (Linus T. no less).
http://kerneltrap.org/node/14148

+++++++++++++
Note to Kernel Team:
1/ I can find no upstream bug or enhancement request with regards to this matter.
2/ This does seem to be an issue of some importance & inconvenience with heavy io loads.
3/ Probably best summarised as "ext3 + fsync + heavy io = very poor disk performance".
4/ A workaround appears to be: adding mount options [noatime, nodiratime, data=writeback]
5/ Workaround for vim also indicated in above discussion.
6/ This is a fundamental design/functionality issue - maybe better handled as a Blueprint ( https://blueprints.launchpad.net/ubuntu ).

Passing to Kernel Team to consider.

Changed in linux-source-2.6.22:
assignee: nobody → kernel-team
status: Incomplete → Confirmed
Changed in linux-source-2.6.15:
status: Incomplete → Confirmed
Changed in linux-source-2.6.24:
status: New → Incomplete
Changed in linux:
status: Incomplete → Confirmed
assignee: nobody → kernel-team
Changed in linux-source-2.6.22:
assignee: kernel-team → ubuntu-kernel-team
Changed in linux:
assignee: kernel-team → ubuntu-kernel-team
Gareth Fitzworthington (mapping-gp-deactivatedaccount) wrote :

Upstream report:
http://bugzilla.kernel.org/show_bug.cgi?id=9546
This comment by the assigned person summarises:
http://bugzilla.kernel.org/show_bug.cgi?id=9546#c21

Also note that this appears to be becoming more of an issue for 'ordinary' users as a result of Firefox 3 using sqlite. See here:
https://bugzilla.mozilla.org/show_bug.cgi?id=421482
and associated launchpad bug : Bug #221009

Dominique Pellé (dominique-pelle) wrote :

Journaling file systems (such as ext3) are slow at fsync.

See what Linus writes in this thread about this particular problem
(http://kerneltrap.org/node/14148)

=== BEGIN QUOTE ===
> hm, it turns out that it's due to vim doing an occasional fsync not only
> on writeout, but during normal use too. "set nofsync" in the .vimrc
> solves this problem.

Yes, that's independent. The fact is, ext3 *sucks* at fsync. I hate hate
hate it. It's totally unusable, imnsho.

The whole point of fsync() is that it should sync only that one file, and
avoid syncing all the other stuff that is going on, and ext3 violates
that, because it ends up having to sync the whole log, or something like
that. So even if vim really wants to sync a small file, you end up waiting
for megabytes of data being written out.

I detest logging filesystems.
=== [ END QUOTE ] ===

Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Sergio Zanchetta (primes2h) wrote :

The 18 month support period for Gutsy Gibbon 7.10 has reached its end of life -
http://www.ubuntu.com/news/ubuntu-7.10-eol . As a result, we are closing the
linux-source-2.6.22 kernel task. It would be helpful if you could test the
new Jaunty Jackalope 9.04 release and confirm if this issue remains -
http://www.ubuntu.com/getubuntu/releasenotes/904overview. If the issue still exists with the Jaunty
release, please update this report by changing the Status of the "linux (Ubuntu)"
task from "Incomplete" to "New". Also please be sure to run the command below
which will automatically gather and attach updated debug information to this
report. Thanks in advance.

apport-collect -p linux-image-2.6.28-11-generic 43484

Changed in linux-source-2.6.22 (Ubuntu):
status: Confirmed → Won't Fix
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Changed in linux-source-2.6.15 (Ubuntu):
status: Confirmed → Invalid
Changed in linux (Ubuntu):
status: Incomplete → Invalid