Write load on DM-Crypt LUKS partition (reiserfs and ext3) jams system

Bug #82528 reported by Ulrich Lukas
Affects                       Status         Importance  Assigned to  Milestone
Linux                         Fix Released   Medium
linux (Ubuntu)                Fix Released   Undecided   Unassigned
linux-source-2.6.20 (Ubuntu)  Won't Fix      Medium      Unassigned

Bug Description

Binary package hint: linux-image-2.6.20-5-generic

I'm running Kubuntu Feisty, installed via the i386-desktop ISO-image snapshot of 2007-01-11, with the latest apt-get upgrades as of today.
The recently purchased computer has an AMD "Sempron 3400+" CPU (AM2-socket) installed on a mainboard with Nvidia n-force 6100-430 chipset.
The harddrive is a new 400GB SATA disk connected to the onboard SATA-terminal (kernel module: sata_nv).

In addition to three smaller and unencrypted partitions (system, swap and home; still unencrypted because it's a testing setup), I created a 322-GB partition for encrypted data storage.
This partition is LUKS-formatted via cryptsetup/DM-Crypt and has a Reiserfs file system.

The problem is the following:

Whenever I copy a slightly bigger file (e.g. 500MB) /to/ the dm-crypt-partition, the system is almost blocked for the duration of the copying process.
This means that not only the system responsiveness gets rather low, but that, approx. once every two seconds, the system is completely "stalled" for a second.
Even the mouse pointer and Amarok sound playback periodically stop completely.

It doesn't matter if the source is one of the unencrypted partitions on the same harddisk, a CD-ROM or even an NFS-mount from a (compared to local copying) relatively slow remote server.

This is particularly undesirable, e.g. if a database system for sensitive customer data is using the partition, or if the recording of video live-streams from surveillance cameras blocks the rest of the system.

It is especially surprising since I thought the new default CFQ I/O scheduler would prevent such problems.

By observing the write performance of my system, I could determine an anomaly which also seems to be connected with the above described lock-ups:

Data transfer rates obtained via " time cp 'sourcelocation/testfile' 'destination' " with a testfile of 1GByte random data.

Write performance:

copy:
  from one unencrypted filesystem to another unencrypted filesystem on the same harddrive: 38MByte/sec
  from unencrypted filesystem to encrypted filesystem on the same harddrive: 16MByte/sec

  from 100-MBit/sec LAN NFS-mount to unencrypted filesystem on local machine: 11.2MByte/sec (maximum for 100MBit-LAN)
  from 100-MBit/sec LAN NFS-mount to encrypted filesystem on local machine: 6.5MByte/sec
  (even though 16MB/sec were possible from the local source before, and the LAN is also capable of 11.2MB/sec [!]
    This is the anomaly I meant before.)

Read performance from the encrypted partition (" time cp 1-GB-testfile.dat /dev/null " or even " time cp 1-GB-testfile.dat /tmp " (no RAM tmpfs))
is fine at 28 and 26 MByte/sec respectively; no system lock-ups.
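
To reproduce transfer rates like the ones above, the copy can be timed and the result converted to MByte/sec. The following is only a minimal sketch: the function names throughput_mb_s and timed_copy are made up for illustration, and the sync after cp is an added assumption (the measurements above used plain "time cp"), included so that data still waiting in the page cache is counted in the elapsed time.

```shell
#!/bin/sh
# Hypothetical helper: average throughput in MByte/sec (1 MByte = 10^6 bytes).
# $1 = bytes copied, $2 = elapsed seconds (may be fractional)
throughput_mb_s() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

# Hypothetical helper: copy a file, flush it to disk, print the average rate.
# $1 = source file, $2 = destination
timed_copy() {
    src=$1
    dst=$2
    size=$(wc -c < "$src" | tr -d ' ')
    start=$(date +%s)
    # sync is an assumption added here so cached writes are included
    cp "$src" "$dst" && sync
    end=$(date +%s)
    elapsed=$((end - start))
    # whole-second resolution only; avoid dividing by zero on tiny files
    [ "$elapsed" -gt 0 ] || elapsed=1
    throughput_mb_s "$size" "$elapsed"
}
```

Example call (the destination path is a placeholder): timed_copy sourcelocation/testfile /mnt/encrypted/testfile
Because the timing has one-second resolution, this is only meaningful for large files such as the 1GByte test file used here.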

To further investigate the issue, I have compiled the vanilla 2.6.19.2 kernel from kernel.org. Result: same behaviour.

If this is an upstream kernel bug, I'm sorry, but I am not that proficient with programming and the kernel, and I couldn't determine how far userspace programs or even the default configuration details of the Ubuntu distribution are involved.

I'm attaching the output of "dmesg" and "lspci" for my system (running linux-image-2.6.20-5-generic).

Revision history for this message
Ben Collins (ben-collins) wrote :

The issue seems to be that you are doing it on an encrypted filesystem. The kernel obviously has to perform a lot of IO intermixed with computational overhead to do this.

I don't know if the performance problem is expected or not. It all depends on what encryption method you are using, etc.

Revision history for this message
ccc1 (cllccl-deactivatedaccount) wrote :

I can confirm that bug.
Happens also in edgy with an encrypted ext3 partition.

A workaround is to renice the kjournald process, i.e. renice -p `pgrep kjournald`
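
The command as quoted does not say which priority to set. A later comment in this thread reports changing kjournald from -5 to 0, so a sketch of the workaround might look like the following. The helper name renice_by_name is hypothetical and the value 0 is taken from that later comment, not from this one; changing the priority of kernel threads normally requires root.

```shell
#!/bin/sh
# Hypothetical helper: set the nice value of every process whose name matches.
# $1 = process name pattern for pgrep, $2 = new nice value
renice_by_name() {
    name=$1
    prio=$2
    pids=$(pgrep "$name") || {
        echo "no process matching '$name'" >&2
        return 1
    }
    # $pids is intentionally unquoted: one argument per PID
    renice "$prio" -p $pids
}
```

Example call, run as root: renice_by_name kjournald 0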

Tim Gardner (timg-tpi)
Changed in linux-source-2.6.20:
assignee: nobody → kernel-team
status: Unconfirmed → Needs Info
Changed in linux-source-2.6.20:
status: Needs Info → Confirmed
importance: Undecided → Medium
Changed in linux-source-2.6.20:
assignee: kernel-team → ubuntu-kernel-team
Revision history for this message
Jan (jan23) wrote :

I can confirm this on Feisty with an ext3 FS.

Revision history for this message
ccc1 (cllccl-deactivatedaccount) wrote :

Just upgraded to Feisty and the bug is still there. The workaround I posted above helps a bit, but not 'as much' as in Edgy, i.e. the mouse moves jerkily and audio hangs sometimes. Without the workaround the whole system stalls for 2 seconds maybe every 5 seconds ...

I attached the output of dmesg & lspci ...

Revision history for this message
ccc1 (cllccl-deactivatedaccount) wrote :

This doesn't seem to be a bug in dm-crypt.
I deleted my encrypted partition and still have the problem that the system hangs sometimes, although not as badly as it did with the encrypted partition.
It happens when the CPU load is high and there is a lot of hard disk activity. Under Windows XP I haven't experienced that ...

Is there any way to debug this issue?

Revision history for this message
tdn (spam-thomasdamgaard) wrote :

I can confirm this bug.
I use LUKS with ext3 on Kubuntu Feisty.

When I have moderate disk activity the system is almost unusable.
I think this is a scheduling problem.
Although extra computations are needed when using encrypted filesystems, these extra computations should not have such a high priority that it should be impossible to use the computer for audio, video, moving the mouse around, etc.

I have a 1.8GHz Pentium M processor. That should be enough to handle the extra computations needed for the encryption while decoding ogg audio or playing video.

Revision history for this message
tdn (spam-thomasdamgaard) wrote :

Someone suggested renicing the kjournald process as a workaround.
Is that really safe?

Revision history for this message
Jarmo Ilonen (trewas) wrote :

I can confirm this on current gutsy. This is very annoying, makes using encrypted partitions nearly impossible (core2duo and 500GB sata disk). With the dual-core processor the computer actually performs fine, the wait-time reported by top is 50% as only one core is being hogged, but the write performance to a dm-crypt encrypted ext3 partition is absolutely horrible (I didn't measure but the average speed seemed to be below 1MB/s). The problem has nothing to do with encryption overhead itself, the computer is idling most of the time without any cpu-load and writing to hd only occasionally. Writing to a plain unencrypted partition is going >50MB/s on the same disk.

Interestingly, write performance is fine to the encrypted partition when there is some other writing going on to a plain ext3 partition at the same time.

Revision history for this message
tdn (spam-thomasdamgaard) wrote :

Maybe this bug should be renamed from "Write load on DM-Crypt LUKS partition with reiserfs jams system" to "Write load on DM-Crypt LUKS partition jams system". I think that it is confirmed that this has nothing to do with reiserfs.

Revision history for this message
Andrew (adhenry) wrote :

I have exactly the same problem on a newly installed Debian Etch system that was fully encrypted (except /boot) using the guided partitioner. I have an external WD MyBook 320GB 1394a disk that was unencrypted. Performance was fine to the 1394a disk (both read/write speeds as well as LAN transfer rates) until I encrypted it; now I get exactly the same issues:

LAN performance, both read and write, is almost exactly halved: 11MB to 6MB over ethernet, and 2.5MB to 1.5MB read over wireless.
Writes to the 1394a disk overload the CPU so that the mouse pointer hangs every few seconds.
Writes to the 1394a disk don't proceed at a steady rate, but suspend (no CPU) and then continue after a while. kjournald and kcryptd/0 are the 2 processes taking all the CPU, but when the write suspends, they drop to zero.
Read rate is constant but halved.

Does not seem Ubuntu related or related to that particular kernel as Etch does not use same kernel as Feisty or Gutsy.

Revision history for this message
Andrew (adhenry) wrote :

As this does not seem Ubuntu related, I think it's very unlikely Canonical will do anything with this bug. Probably better to put it on the dmcrypt or kernel lists?

Revision history for this message
Ulrich Lukas (ulrich-lukas) wrote : Re: [Bug 82528] Re: Write load on DM-Crypt LUKS partition with reiserfs jams system

Hi,

I already reported it on the Linux kernel Bugzilla.

Have you tried the latest release-version of the Linux kernel?

Please try it and compile 2.6.24-rc4 if you can. This bug is supposed to
be fixed with that kernel version.

Sadly, I own a dual-core CPU now, and this makes it a bit hard for me
(non-programmer) to verify if the bug is actually fixed.

Revision history for this message
tdn (spam-thomasdamgaard) wrote : Re: Write load on DM-Crypt LUKS partition with reiserfs jams system

> Please try it and compile 2.6.24-rc4 if you can. This bug is supposed to
> be fixed with that kernel version.

Sounds great. But when can I expect this to be fixed in Ubuntu? Will the next release of Ubuntu use this kernel? Or will this fix be imported into Ubuntu, so that I can expect a kernel update in some time?

> Sadly, I own a dual-core CPU now, and this makes it a bit hard for me
> (non-programmer) to verify if the bug is actually fixed.

I am just curious as to why you cannot verify this on a dual-core CPU? Does this bug not affect dual-core CPUs?
If not, can someone explain why it doesn't?

Revision history for this message
Ulrich Lukas (ulrich-lukas) wrote : Re: [Bug 82528] Re: Write load on DM-Crypt LUKS partition with reiserfs jams system

> I am just curious as to why you cannot verify this on a dual-core CPU?

It's just that I no longer have the "hard" hangs I had before, since I
upgraded to the fast dual-core-CPU.

I'm just a user, so I don't know if it is because of the new CPU exactly,
but it was a real issue for me then.

Revision history for this message
Andrew (adhenry) wrote :

Ulrich Lukas wrote:
> I already reported it on the Linux kernel Bugzilla.
>
> Have you tried the latest release-version of the Linux kernel?
>
> Please try it and compile 2.6.24-rc4 if you can. This bug is supposed to
> be fixed with that kernel version.
I compiled 2.6.23.9 and this seems to have fixed the problem. disk
transfers still take 100% CPU according to gnome-system-monitor (but
kcryptd/0 takes 60% according to top), but disk activity is now fast and
constant.

Thanks for the tip!


Revision history for this message
Reinhard Tartler (siretart) wrote : Re: Write load on DM-Crypt LUKS partition with reiserfs jams system

According to the upstream bug tracker, this bug has been fixed in 2.6.24. Can someone who is affected by this bug confirm that it has been fixed by upgrading to Hardy?

Revision history for this message
Christian Iversen (chrivers) wrote :

I have upgraded to the Hardy beta, and I am running with a 2.6.24-12-generic kernel currently.

The problem is _not_ fixed. I have dm-crypt with LUKS and ext3, and the problem is very easily felt. The whole system is very slow when doing anything on the disk. This is in total contrast to how the system worked before (no crypt on {dapper, edgy, gutsy}), but exactly like dm-crypt on gutsy.

I hope this problem will get looked at, because it makes the computer dreadfully slow, and the only reason I'm still using dm-crypt is because I care about security. For common use it is far too slow.

Changed in linux:
status: Unknown → Fix Released
Revision history for this message
Ulrich Lukas (ulrich-lukas) wrote : Re: [Bug 82528] Re: Write load on DM-Crypt LUKS partition with reiserfs jams system

Hi Christian,

I know, maybe you have already spent a little time with issues like
this, but could you try it again with a custom built 2.6.25-rc8 kernel?

(Don't use an earlier 2.6.25-rc version than -rc8; there were some bugs
in the dm-specific code before)

There were a lot of changes concerning dm-crypt since 2.6.24; this could
be very interesting!

For me it's a bit difficult to test the performance issue on my machine,
but I've tested every kernel version.

Revision history for this message
Celso Pinto (cpinto) wrote : Re: Write load on DM-Crypt LUKS partition with reiserfs jams system

Right, I'm a bit hot headed now so bear with me: I've just shut down my laptop, which was applying Hardy's upgrades (152 IIRC), because as soon as it started to actually install the packages my desktop completely locked up. I put up with it for about *30 f****** minutes* until I decided I actually needed to get some work done instead of watching the disk access LED blinking while kcryptd did its stuff. Come on guys! Come on! I replaced openSUSE 10.3 which also had luks encryption and was a whole lot more responsive with Hardy to give it a shot. Yeah, I know this is beta software, but we're talking about a freaking LTS. How did this get past QA? You're one day away from gold and using encrypted partitions on a 3yr old machine (which, let me remind you, won't be decommissioned for yet another year since computer hardware has an amortization period of 4 years), or apparently anything with a single core, completely breaks the desktop. So, again, how did this get past QA?

Revision history for this message
Andrew (adhenry) wrote : Re: [Bug 82528] Re: Write load on DM-Crypt LUKS partition with reiserfs jams system

Celso Pinto wrote:
> Right, I'm a bit hot headed now so bear with me: I've just shut down my
> laptop, which was applying Hardy's upgrades (152 IIRC), because as soon
> as it started to actually install the packages my desktop completely
> locked up. I put up with it for about *30 f****** minutes* until I
> decided I actually needed to get some work done instead of watching the
> disk access LED blinking while kcryptd did its stuff. Come on guys!
> Come on! I replaced openSUSE 10.3 which also had luks encryption and was
> a whole lot more responsive with Hardy to give it a shot.
<snip rant>

I feel your pain man. Seems like they focus on other things, and this
is not a hot feature in the marketplace. To be fair though, isn't this
a kernel issue? Maybe Novell do extra fixes in their kernels to get
certain features working, whilst Canonical focuses on desktop stuff.
Suppose that if we want a working dmcrypt, then we need to stick to
Suse/Redhat. I swapped to CentOS5 on my server just because of this
issue (not dmcrypt, as it sucks as bad there because it's an old kernel,
but there is more focus on the server side than I think Canonical has).

--andrew

Revision history for this message
tdn (spam-thomasdamgaard) wrote : Re: Write load on DM-Crypt LUKS partition with reiserfs jams system

If this works in Redhat and SUSE, isn't it trivial to just use the same kernel patches as they do?
(I know nothing about it, so I assume that it isn't trivial. It was just a thought)

Does anyone know if this works in Debian?

Revision history for this message
Reinhard Tartler (siretart) wrote :

Just for the complaining crowd here: On my thinkpad X60s with everything on LVM (including root) on dm-crypt, I do not notice any 'jamming'. The system is responsive as ever, both on gutsy and hardy kernel. (I recently upgraded this laptop).

Revision history for this message
tdn (spam-thomasdamgaard) wrote :

Does anyone know if anything is happening on this bug?

It is still there and it is really annoying. I have just bought a Thinkpad T61p with Core 2 Duo CPU. I was hoping that maybe a second CPU core would make this problem less severe, but it did not help much.

Is there *anything* I can do to help getting this fixed? Do you need more information? How can I debug it further?

It makes full disk encryption on Ubuntu laptops virtually impossible.

Revision history for this message
Launchpad Janitor (janitor) wrote : This bug is now reported against the 'linux' package

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this bug to the new "linux" package. However, development has already begun for the upcoming Intrepid Ibex 8.10 release. It would be helpful if you could test the upcoming release and verify if this is still an issue - http://www.ubuntu.com/testing . If the issue still exists, please update this report by changing the Status of the "linux" task from "Incomplete" to "New". We appreciate your patience and understanding as we make this transition. Thanks!

Revision history for this message
Jarmo Ilonen (trewas) wrote :

At least for me this has been fixed in hardy. Writes to an encrypted partition go at full and constant speed, very much unlike in gutsy. The upstream bug (http://bugzilla.kernel.org/show_bug.cgi?id=8020) has been marked as fixed and the fix is in 2.6.24, which is in hardy.

As this bug report has quite many comments and at least one kernel bug causing it has been fixed, maybe this bug should be closed and a new one opened if someone is still seeing similar issues.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Revision history for this message
Jonathan Thomas (echidnaman) wrote :

Closing as fixed due to previous comments. Please open a new bug if you have a similar issue.

Changed in linux:
status: Incomplete → Fix Released
Revision history for this message
tdn (spam-thomasdamgaard) wrote :

I see that this bug is supposedly fixed in 8.10 Ibex, but shouldn't this be fixed in 8.04 as well? Considering that 8.04 is an LTS release?

Revision history for this message
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Revision history for this message
slarti (3-launchpad1-20-slart-neverbox-com) wrote :

I still have this issue. When copying large amounts of data (rsync a directory tree with small and big files) to a luks/dm_crypt encrypted partition, the system freezes every 2-5 seconds for 2+ seconds. My system is a Core 2 Duo E8400 with 4 GB RAM and runs on Ubuntu 8.10 x64 (Intrepid) with the current standard kernel 2.6.27-11-generic #1 SMP Thu Jan 29 19:28:32 UTC 2009 x86_64 GNU/Linux. Renicing kjournald from -5 to 0 attenuates the situation a little bit -> keyboard input is more responsive, but switching between applications (or for example tabs in firefox) is still painfully slow.

Revision history for this message
Michael Zugelder (anu-xod-deactivatedaccount) wrote :

Extremely annoying bug, happened with hardy, intrepid and jaunty.

Current Configuration:
 - Jaunty x64
 - Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz, 4GiB DDR2
 - Encrypted Home (--cipher aes-xts-plain --key-size 256)
 - kernel.org custom 2.6.30 kernel

I simulate a write load (like copying large files):
while [ true ]; do dd if=/dev/zero of=dummy bs=1M count=4K; sync; rm dummy; sync; done
(sync to force ext4's delayed allocation to write out the blocks)

and now the system is extremely unresponsive.
Doing a simple `ls` takes over 10s, the cursor hangs, etc.
It looks like the reads to the disk arent get
Maybe interesting: iostat shows around 200,000 tps for the dm-5 device while less than 100 tps for the /dev/sda device.

In the next days I'll try to get more information about what causes the problem, checking different ciphers and block sizes, fiddling with the scheduler, etc. Renicing kcryptd didn't help.
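
For the scheduler part of such testing, the active I/O scheduler of a disk can be read from sysfs, where the kernel marks the one in use with square brackets (for example "noop anticipatory deadline [cfq]"). A minimal sketch follows; the helper names active_scheduler and show_scheduler are hypothetical, and sda is simply the device name mentioned in this comment.

```shell
#!/bin/sh
# Hypothetical helper: extract the bracketed (active) scheduler name.
# $1 = contents of /sys/block/<dev>/queue/scheduler
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: print the active scheduler for a block device.
# $1 = device name without /dev/, e.g. sda
show_scheduler() {
    active_scheduler "$(cat "/sys/block/$1/queue/scheduler")"
}
```

Example call: show_scheduler sda
To try a different scheduler for comparison, root can write one of the listed names back to the same sysfs file, e.g. echo deadline > /sys/block/sda/queue/scheduler (the change does not persist across reboots).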

Changed in linux:
importance: Unknown → Medium