SATA drive freezes when using LVM over dm-crypt

Bug #156669 reported by Jens P
Affects                       Status     Importance  Assigned to  Milestone
linux (Ubuntu)                Invalid    Undecided   Unassigned
linux-source-2.6.22 (Ubuntu)  Won't Fix  Undecided   Unassigned

Bug Description

Hi,

I can reproduce this bug on various kernels/systems (including Debian stable, Debian testing and Kubuntu 7.10), and I am unsure whether this is a SATA driver, a dm-crypt device-mapper or an LVM problem.

After the initial boot of a complete (non-network-based) installation of Kubuntu 7.10, drive access works normally. After a few minutes, the system freezes completely and logs heaps of SATA errors (see below). After a few more rounds of errors, the drive reactivates and works normally for as long as I have been using the system (a few hours). As I said before, this behavior is reproducible across several kernel versions and distributions. I am using an LVM over dm-crypt installation with the following layout:

SCSI1 (0,0,0) #1 primary 67.1 GB ntfs
                #2 primary 510 MB ext2 /boot
                #3 primary 182.4 GB crypto (sda3_crypt)

 Encrypted Volume (sda3_crypt) 182.4 GB Linux device mapper
     #1 182.4 GB lvm
 LVM VG disk1, LV home 107.4 GB Linux device mapper
     #1 107.4 GB ext2 /home
 LVM VG disk1, LV swap 2.1 GB Linux device mapper
     #1 2.1 GB swap swap
 LVM VG disk1, LV system 72.9 GB Linux device mapper
     #1 72.9 GB ext2 /

Error messages as reported by dmesg:

[ 0.000000] Linux version 2.6.22-14-generic (buildd@palmer) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2))
... cont.
[ 22.606745] ata1: SATA max UDMA/133 cmd 0xf8860480 ctl 0xf88604a0 bmdma 0x0001d400 irq 18
[ 22.606748] ata2: SATA max UDMA/133 cmd 0xf8860580 ctl 0xf88605a0 bmdma 0x0001d408 irq 18
[ 23.071589] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 23.116317] ata1.00: ATA-7: ST3250620NS, 3.AEG, max UDMA/133
[ 23.116319] ata1.00: 488397168 sectors, multi 1: LBA48 NCQ (depth 31/32)
[ 23.182879] ata1.00: configured for UDMA/133
[ 23.491089] ata2: SATA link down (SStatus 0 SControl 300)
[ 23.491170] scsi 0:0:0:0: Direct-Access ATA ST3250620NS 3.AE PQ: 0 ANSI: 5
[ 23.491177] ata1: bounce limit 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
... cont.
[ 23.499267] sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
[ 23.499277] sd 0:0:0:0: [sda] Write Protect is off
[ 23.499279] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 23.499289] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 23.499322] sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
[ 23.499328] sd 0:0:0:0: [sda] Write Protect is off
[ 23.499329] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 23.499337] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 23.499341] sda: sda1 sda2 sda3
[ 23.514090] sd 0:0:0:0: [sda] Attached SCSI disk
[ 23.517291] sd 0:0:0:0: Attached scsi generic sg0 type 0
... cont.
[ 113.972000] ata1: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
[ 113.972000] ata1: CPB 1: ctl_flags 0x1f, resp_flags 0x2
[ 113.972000] ata1: CPB 2: ctl_flags 0x1f, resp_flags 0x2
[ 113.972000] ata1: CPB 3: ctl_flags 0x1f, resp_flags 0x2
[ 113.972000] ata1: CPB 4: ctl_flags 0x1f, resp_flags 0x2
[ 113.972000] ata1: timeout waiting for ADMA IDLE, stat=0x400
[ 113.972000] ata1: timeout waiting for ADMA LEGACY, stat=0x400
[ 113.972000] ata1.00: exception Emask 0x0 SAct 0x1e SErr 0x200000 action 0x2 frozen
[ 113.972000] ata1.00: cmd 61/00:08:b5:d1:28/02:00:17:00:00/40 tag 1 cdb 0x0 data 262144 out
[ 113.972000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 113.972000] ata1.00: cmd 61/78:10:b5:d3:28/01:00:17:00:00/40 tag 2 cdb 0x0 data 192512 out
[ 113.972000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 113.972000] ata1.00: cmd 61/08:18:a5:45:27/00:00:17:00:00/40 tag 3 cdb 0x0 data 4096 out
[ 113.972000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 113.972000] ata1.00: cmd 60/10:20:15:4e:b9/00:00:1a:00:00/40 tag 4 cdb 0x0 data 8192 in
[ 113.972000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 114.284000] ata1: soft resetting port
[ 114.440000] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 114.572000] ata1.00: configured for UDMA/133
[ 114.576000] ata1: EH complete
[ 114.576000] sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
[ 114.576000] sd 0:0:0:0: [sda] Write Protect is off
[ 114.576000] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 114.576000] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
... cont.
[ 233.248000] ata1: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x4 next cpb idx 0x0
[ 233.248000] ata1: CPB 0: ctl_flags 0x1f, resp_flags 0x0
[ 233.248000] ata1: CPB 1: ctl_flags 0x1f, resp_flags 0x0
[ 233.248000] ata1: CPB 2: ctl_flags 0x1f, resp_flags 0x0
[ 233.248000] ata1: CPB 3: ctl_flags 0x1f, resp_flags 0x0
[ 233.248000] ata1: CPB 4: ctl_flags 0x1f, resp_flags 0x0
[ 233.248000] ata1: timeout waiting for ADMA IDLE, stat=0x400
[ 233.248000] ata1: timeout waiting for ADMA LEGACY, stat=0x400
[ 233.248000] ata1.00: NCQ disabled due to excessive errors
[ 233.248000] ata1.00: exception Emask 0x0 SAct 0x1f SErr 0x0 action 0x2 frozen
[ 233.248000] ata1.00: cmd 60/08:00:25:4a:df/00:00:1a:00:00/40 tag 0 cdb 0x0 data 4096 in
[ 233.248000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 233.248000] ata1.00: cmd 60/10:08:15:4e:b9/00:00:1a:00:00/40 tag 1 cdb 0x0 data 8192 in
[ 233.248000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 233.248000] ata1.00: cmd 61/08:10:a5:45:27/00:00:17:00:00/40 tag 2 cdb 0x0 data 4096 out
[ 233.248000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 233.248000] ata1.00: cmd 61/78:18:b5:d3:28/01:00:17:00:00/40 tag 3 cdb 0x0 data 192512 out
[ 233.248000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 233.248000] ata1.00: cmd 61/00:20:b5:d1:28/02:00:17:00:00/40 tag 4 cdb 0x0 data 262144 out
[ 233.248000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 233.560000] ata1: soft resetting port
[ 233.716000] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 233.828000] ata1.00: configured for UDMA/133
[ 233.828000] ata1: EH complete
[ 233.896000] sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
[ 233.904000] sd 0:0:0:0: [sda] Write Protect is off
[ 233.904000] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 233.920000] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

From then on, (S)ATA runs without further problems.
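
The timed-out commands in the log above can be read more easily once the taskfile bytes are decoded. The following is a minimal sketch, not part of libata; the function names are made up for illustration. It assumes the usual libata dump layout of `command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device`, and it assumes that for the NCQ opcodes 0x60 and 0x61 the sector count is carried in the feature bytes while the tag sits in the upper bits of the sector-count byte. The byte counts it computes agree with the `data ... out/in` values printed in the log.

```python
import re

# The two NCQ opcodes that appear in the log.
NCQ_OPCODES = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}

# Matches "cmd cc/ff:nn:ll:mm:hh/ff:nn:ll:mm:hh/dd tag N" in a dmesg line.
CMD_RE = re.compile(
    r"cmd ([0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/([0-9a-f]{2}) tag (\d+)"
)


def sact_tags(sact):
    """Return the NCQ tags whose bits are set in the SAct mask."""
    return [bit for bit in range(32) if (sact >> bit) & 1]


def decode_cmd(line):
    """Decode one 'ataX.YY: cmd ...' line; return None if it is not NCQ."""
    match = CMD_RE.search(line)
    if match is None:
        return None
    opcode = int(match.group(1), 16)
    if opcode not in NCQ_OPCODES:
        return None  # the field layout below only applies to NCQ commands
    feature, nsect, lbal, lbam, lbah = (
        int(b, 16) for b in match.group(2).split(":")
    )
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in match.group(3).split(":")
    )
    # Note: a count of 0 would mean 65536 sectors; not handled in this sketch.
    sectors = (hob_feature << 8) | feature
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "command": NCQ_OPCODES[opcode],
        "tag": nsect >> 3,
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,  # the log reports 512-byte hardware sectors
    }
```

For example, `sact_tags(0x1e)` yields tags 1 to 4, which are exactly the four commands listed under the first exception, and `sact_tags(0x1f)` yields tags 0 to 4 for the second one. In other words, every outstanding queued command timed out, reads and writes alike.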

Steps to reproduce: Install any kernel > 2.6.18 and use a SATA disk with LVM over dm-crypt as documented above. Use the disk. Voilà.

Please advise what I can do to help narrow down or solve the problem. If it were up to me, I'd rate this bug "critical", since I do not know whether drive access works correctly and thus whether data storage is reliable.

Thank you for your support!

Cheers

Jens

PS: Of course I overlooked the line "[ 233.248000] ata1.00: NCQ disabled due to excessive errors". Could it be an NCQ problem? I have read there are several blacklisted drives in the driver source:

libata-core.c: /* NCQ hard hangs device under heavier load, needs hard power cycle */
libata-core.c: { "Maxtor 6B250S0", "BANC1B70", ATA_HORKAGE_NONCQ },
libata-core.c: { "HTS541060G9SA00", "MB3OC60D", ATA_HORKAGE_NONCQ, },
libata-core.c: { "HTS541080G9SA00", "MB4OC60D", ATA_HORKAGE_NONCQ, },
libata-core.c: { "HTS541010G9SA00", "MBZOC60D", ATA_HORKAGE_NONCQ, },
libata-core.c: { "HTS541680J9SA00", "SB2IC7EP", ATA_HORKAGE_NONCQ, },
libata-core.c: { "HTS541612J9SA00", "SBDIC7JP", ATA_HORKAGE_NONCQ, },
libata-core.c: { "HTS722012K9A300", "DCCOC54P", ATA_HORKAGE_NONCQ, },
libata-core.c: { "HTS541616J9SA00", "SB4OC70P", ATA_HORKAGE_NONCQ, },
libata-core.c: { "WDC WD740ADFD-00NLR1", NULL, ATA_HORKAGE_NONCQ, },
libata-core.c: { "FUJITSU MHV2080BH", "00840028", ATA_HORKAGE_NONCQ, },

Mine is a Seagate ST3250620NS. Maybe it needs to be added to the list? Unfortunately, there seems to be no kernel parameter to disable NCQ at boot time. How do I forward this bug to the libata/sata_nv guys with "possible NCQ issue"?
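
A possible runtime test, sketched here under assumptions rather than confirmed for this kernel: on libata kernels that expose a writable `queue_depth` attribute under `/sys/block/<disk>/device/`, writing 1 to it limits the device to a single outstanding command, which effectively turns NCQ off for that disk without a reboot. Whether 2.6.22 with sata_nv exposes this attribute is an assumption. The helper names and the `sysfs_root` parameter (there only so the code can be tried against a scratch directory) are hypothetical.

```python
from pathlib import Path


def queue_depth_path(disk, sysfs_root="/sys"):
    """Path of the per-device queue depth attribute (assumed to exist)."""
    return Path(sysfs_root) / "block" / disk / "device" / "queue_depth"


def read_queue_depth(disk, sysfs_root="/sys"):
    """Return the current queue depth; values above 1 mean NCQ is in use."""
    return int(queue_depth_path(disk, sysfs_root).read_text().strip())


def set_queue_depth(disk, depth, sysfs_root="/sys"):
    """Write a new queue depth; a depth of 1 effectively disables NCQ."""
    if depth < 1:
        raise ValueError("queue depth must be at least 1")
    queue_depth_path(disk, sysfs_root).write_text(f"{depth}\n")
```

Run as root, `set_queue_depth("sda", 1)` followed by `read_queue_depth("sda")` would show whether the change took effect. If the timeouts stop with the depth at 1, that points at NCQ handling rather than at dm-crypt or LVM.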

Jens P (jplaunchpad)
description: updated
Jens P (jplaunchpad) wrote :

The problem *is* related to NCQ, however, I believe it is *not* the drive. I am running into the same problems on a different machine with a SAMSUNG SP2504C SATA drive.

--- snip ---
[ 0.000000] Linux version 2.6.22-14-generic (buildd@palmer) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Sun Oct 14 23:05:12 GMT 2007 (Ubuntu 2.6.22-14.46-generic)
...
[ 4.936000] sata_nv 0000:00:08.0: version 3.4
[ 4.936000] sata_nv 0000:00:08.0: Using ADMA mode
...

[ 800.196000] ata1: timeout waiting for ADMA IDLE, stat=0x400
[ 800.196000] ata1: timeout waiting for ADMA LEGACY, stat=0x400
[ 800.196000] ata1.00: exception Emask 0x0 SAct 0x7ffff SErr 0x200000 action 0x2 frozen
[ 800.196000] ata1.00: cmd 61/00:00:1c:03:3c/04:00:1b:00:00/40 tag 0 cdb 0x0 data 524288 out
[ 800.196000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 800.196000] ata1.00: cmd 61/00:08:1c:ff:3b/02:00:1b:00:00/40 tag 1 cdb 0x0 data 262144 out
[ 800.196000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 800.196000] ata1.00: cmd 61/00:10:1c:01:3c/02:00:1b:00:00/40 tag 2 cdb 0x0 data 262144 out
...
[ 918.972000] ata1: timeout waiting for ADMA IDLE, stat=0x400
[ 918.972000] ata1: timeout waiting for ADMA LEGACY, stat=0x400
[ 918.972000] ata1.00: NCQ disabled due to excessive errors
[ 918.972000] ata1.00: exception Emask 0x0 SAct 0x1fffff SErr 0x0 action 0x2 frozen
[ 918.972000] ata1.00: cmd 60/08:00:f4:75:14/00:00:1b:00:00/40 tag 0 cdb 0x0 data 4096 in
[ 918.972000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 918.972000] ata1.00: cmd 61/08:08:bc:1e:fe/00:00:1a:00:00/40 tag 1 cdb 0x0 data 4096 out
[ 918.972000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 918.972000] ata1.00: cmd 61/40:10:ac:1d:4a/00:00:1a:00:00/40 tag 2 cdb 0x0 data 32768 out
[ 918.972000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
...
[ 919.284000] ata1: soft resetting port
[ 919.440000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 919.960000] ata1.00: configured for UDMA/133
[ 919.960000] ata1: EH complete
[ 920.056000] sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
[ 920.056000] sd 0:0:0:0: [sda] Write Protect is off
[ 920.056000] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 920.056000] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

... and the drive is working again.
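
The `SStatus` values in the "SATA link up" lines differ between the two machines (113 on the first, 123 here), and decoding them shows that only the negotiated link speed differs. The sketch below assumes libata prints the register in hexadecimal without a prefix and uses the SATA field layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11). The function name and the wording of the descriptions are illustrative only.

```python
# Field values from the SATA SStatus register definition; unknown values
# fall back to a generic "reserved" label.
DET = {
    0x0: "no device detected",
    0x1: "device detected, PHY communication not established",
    0x3: "device detected, PHY communication established",
    0x4: "PHY offline",
}
SPD = {0x0: "no negotiated speed", 0x1: "1.5 Gbps", 0x2: "3.0 Gbps"}
IPM = {
    0x0: "device not present",
    0x1: "active",
    0x2: "partial power management",
    0x6: "slumber power management",
}


def decode_sstatus(text):
    """Decode an SStatus value as printed in dmesg (hex, no 0x prefix)."""
    value = int(text, 16)
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return {
        "detection": DET.get(det, f"reserved ({det:#x})"),
        "speed": SPD.get(spd, f"reserved ({spd:#x})"),
        "power": IPM.get(ipm, f"reserved ({ipm:#x})"),
    }
```

`decode_sstatus("113")` and `decode_sstatus("123")` both report an established, active link, at 1.5 Gbps and 3.0 Gbps respectively, so the link itself comes back cleanly after each soft reset on both machines.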

The common denominator here is: both machines use the NForce4 chipset, both use the sata_nv driver and both are trying to use LVM over dm-crypt.

I am of course not sure, but there may be a bug in the combination of sata_nv, NCQ and LVM-mapped devices. I wish someone else would look into this issue. The drive does make funky noises during the error phase, and I am not sure that is healthy.

Cheers

Jens

Dustin Widmann (blackwaltz) wrote :

This looks startlingly similar to the problem I've been having: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/121273

Again, as you said, the common denominator seems to be the nforce4 chipset+sata_nv+libata.

Have you tried it without using LVM? I have, and the problem still seemed to occur. Strangely enough, I also tried it without dm_crypt, using truecrypt instead, and it still happened.

Leann Ogasawara (leannogasawara) wrote :

The Hardy Heron Alpha series is currently under development and contains an updated version of the kernel. It would be helpful if you could test the latest Hardy Alpha release: http://www.ubuntu.com/testing . You should then be able to test the new kernel via the LiveCD. If you can, please verify whether this bug still exists and report back your results. We'll keep this report open against the actively developed kernel, but against 2.6.22 it will be closed. Thanks.

Changed in linux:
status: New → Incomplete
Changed in linux-source-2.6.22:
status: New → Won't Fix
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Simon (mailing-lists) wrote :

I did not report the bug initially, but as far as I am concerned, the problem was gone once I installed Hardy. I have no way to test against Ibex, though.

Duane Hinnen (duanedesign) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. You reported this bug a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue for you. Can you try with the latest Ubuntu release? Thanks in advance.

Jens P (jplaunchpad) wrote : Re: [Bug 156669] Re: SATA drive freezes when using LVM over dm-crypt

On Mon, 23 Feb 2009 05:07:24 -0000
duanedesign <email address hidden> wrote:

> Thank you for taking the time to report this bug and helping to make
> Ubuntu better. You reported this bug a while ago and there hasn't been
> any activity in it recently. We were wondering if this is still an
> issue for you. Can you try with the latest Ubuntu release? Thanks in
> advance.

Thank you for the reminder. As far as I can tell, the bug has been
fixed in all current versions of *buntu.

Cheers

Jens Prüfer

Duane Hinnen (duanedesign) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. You reported this bug a while ago and there hasn't been any activity in it recently. We are closing this bug because it no longer appears to be an issue for the bug reporter. Please feel free to report any future bugs you may experience.

Changed in linux:
status: Incomplete → Invalid