Hard disk write/read freezes for 10 seconds several times in session

Bug #569680 reported by Pēteris Krišjānis
This bug affects 16 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Low
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

After booting and logging in, the computer freezes for 10 or so seconds and dmesg shows the output below. I think, but am not sure, that this wasn't a problem before the updates around beta2, but it is certainly a regression from Karmic. I tried to report this with ubuntu-bug, but it claimed that the newest kernel isn't a proper Ubuntu package. Therefore the requested logs follow.

General Hardware info: Athlon XP 1800+, 1 GB memory

This is snippet from dmesg:
[ 81.816082] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 81.816093] ata1.00: failed command: WRITE DMA
[ 81.816102] ata1.00: cmd ca/00:08:a0:f0:15/00:00:00:00:00/eb tag 0 dma 4096 out
[ 81.816104] res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 81.816108] ata1.00: status: { DRDY }
[ 81.816120] ata1: hard resetting link
[ 82.136057] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 82.160401] ata1.00: configured for UDMA/100
[ 82.160411] ata1.00: device reported invalid CHS sector 0
[ 82.160427] ata1: EH complete
[ 140.820019] ata1: drained 32768 bytes to clear DRQ.
[ 140.851302] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x80000 action 0x6 frozen
[ 140.851312] ata1: SError: { 10B8B }
[ 140.851317] ata1.00: failed command: READ DMA
[ 140.851326] ata1.00: cmd c8/00:00:f8:0e:77/00:00:00:00:00/eb tag 0 dma 131072 in
[ 140.851327] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 140.851331] ata1.00: status: { DRDY }
[ 140.851343] ata1: hard resetting link
[ 141.168063] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 141.192478] ata1.00: configured for UDMA/100
[ 141.192488] ata1.00: device reported invalid CHS sector 0
[ 141.192504] ata1: EH complete
[ 172.326591] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 172.326600] ata1.00: BMDMA2 stat 0x8652001
[ 172.326607] ata1.00: failed command: WRITE DMA
[ 172.326616] ata1.00: cmd ca/00:20:58:ce:55/00:00:00:00:00/eb tag 0 dma 16384 out
[ 172.326618] res 51/04:20:58:ce:55/00:00:00:00:00/eb Emask 0x1 (device error)
[ 172.326622] ata1.00: status: { DRDY ERR }
[ 172.326625] ata1.00: error: { ABRT }
[ 172.388500] ata1.00: configured for UDMA/100
[ 172.388534] ata1: EH complete
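
For anyone comparing logs across machines, a small script can tally which commands failed and why. The following is only a sketch in Python: the function name `summarize` and the two regular expressions are illustrative, not part of any kernel tool, and they assume the line layout shown in the snippet above ("failed command: ..." followed by a "res ... Emask 0x.. (reason)" line).

```python
import re
from collections import Counter

# Matches lines such as "ata1.00: failed command: WRITE DMA".
FAILED_RE = re.compile(r"(ata\d+(?:\.\d+)?): failed command: (.+?)\s*$")
# Matches the reason on result lines such as
# "res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)".
REASON_RE = re.compile(r"\bres\s+\S+\s+Emask\s+0x[0-9a-fA-F]+\s+\(([^)]+)\)")


def summarize(log_text):
    """Count failed commands per device and failure reasons in a kernel log."""
    commands = Counter()  # keyed by (device, command name)
    reasons = Counter()   # keyed by the text in parentheses, e.g. "timeout"
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            commands[(match.group(1), match.group(2))] += 1
            continue
        match = REASON_RE.search(line)
        if match:
            reasons[match.group(1)] += 1
    return commands, reasons
```

Feeding it the text of `dmesg` (for example saved to a file first) gives two counters, which makes it easier to see whether a machine mostly reports timeouts or device errors.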

Revision history for this message
Pēteris Krišjānis (pecisk-gmail) wrote :

P.S.: there are numerous reports of this issue with the 2.6.34 kernel, such as http://comments.gmane.org/gmane.linux.ide/45166

tags: added: kj-triage
Revision history for this message
itchy8me (itchy8me) wrote :

I'm getting the same with the latest updates of Ubuntu Netbook Remix 10.04 running on an Aspire One.

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Hi Pēteris,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 569680

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
phaphe (phaphe) wrote :

I can confirm the bug. The problem started after upgrading from Ubuntu 8.04 LTS to Ubuntu 10.04. Sometimes there is only a short freeze, but sometimes the only solution is to reboot.

I have also replaced the disk with a new one and tried to install a new version of Ubuntu on it. The installation process froze while formatting the disk.

Typical kernel log:
 [ 1129.000045] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
 [ 1129.000101] ata3.00: failed command: WRITE DMA
 [ 1129.000137] ata3.00: cmd ca/00:08:4f:00:00/00:00:00:00:00/e0 tag 0 dma 4096 out
 [ 1129.000139] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
 [ 1129.000233] ata3.00: status: { DRDY }
 [ 1132.381799] [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
 [ 1134.040024] ata3: link is slow to respond, please be patient (ready=0)
 [ 1139.024032] ata3: device not ready (errno=-16), forcing hardreset
 [ 1139.024041] ata3: soft resetting link
 [ 1139.204245] ata3.00: configured for UDMA/100
 [ 1139.204251] ata3.00: device reported invalid CHS sector 0

tags: added: apport-collected
Revision history for this message
phaphe (phaphe) wrote : apport information

AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Architecture: i386
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: roman 1144 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'ICH6'/'Intel ICH6 with AD1981B at irq 21'
   Mixer name : 'Analog Devices AD1981B'
   Components : 'AC97a:41445374'
   Controls : 34
   Simple ctrls : 23
DistroRelease: Ubuntu 10.04
HibernationDevice: RESUME=UUID=c00a73e8-9b83-40b4-9c68-56babeb9e3cc
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
MachineType: Hewlett-Packard HP Compaq dx6100 MT(DX439AV)
Package: linux (not installed)
ProcCmdLine: root=UUID=537c989d-77b2-4b16-85ee-cde728cd7b9b ro quiet splash
ProcEnviron:
 LANGUAGE=en_US:en
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-23.37-generic 2.6.32.15+drm33.5
Regression: Yes
RelatedPackageVersions: linux-firmware 1.34.1
Reproducible: Yes
RfKill:

Tags: lucid filesystem regression-release needs-upstream-testing
Uname: Linux 2.6.32-23-generic i686
UserGroups: adm admin audio cdrom dialout dip floppy fuse lpadmin plugdev sambashare video
dmi.bios.date: 09/13/2004
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: 786C1 v01.57
dmi.board.name: 0984h
dmi.board.vendor: Hewlett-Packard
dmi.chassis.asset.tag: CZC51116KV
dmi.chassis.type: 6
dmi.chassis.vendor: Hewlett-Packard
dmi.modalias: dmi:bvnHewlett-Packard:bvr786C1v01.57:bd09/13/2004:svnHewlett-Packard:pnHPCompaqdx6100MT(DX439AV):pvr:rvnHewlett-Packard:rn0984h:rvr:cvnHewlett-Packard:ct6:cvr:
dmi.product.name: HP Compaq dx6100 MT(DX439AV)
dmi.sys.vendor: Hewlett-Packard

Revision history for this message
phaphe (phaphe) wrote : apport information (attachments)

AlsaDevices.txt, AplayDevices.txt, ArecordDevices.txt, BootDmesg.txt, Card0.Amixer.values.txt, Card0.Codecs.codec97.0.ac97.0.0.txt, Card0.Codecs.codec97.0.ac97.0.0.regs.txt, CurrentDmesg.txt, Lspci.txt, Lsusb.txt, PciMultimedia.txt, ProcCpuinfo.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, UdevLog.txt, UserAsoundrc.txt, WifiSyslog.txt

tags: removed: needs-kernel-logs needs-upstream-testing
phaphe (phaphe)
Changed in linux (Ubuntu):
status: Incomplete → New
Revision history for this message
João Pinto (joaopinto) wrote :

I have tested with linux-image-2.6.36-020636-generic for maverick, the issue is still present.

Revision history for this message
João Pinto (joaopinto) wrote :

This is likely to be a duplicate of Bug #397096 .

Revision history for this message
João Pinto (joaopinto) wrote :

I gave Fedora 13 a try and did not experience the problem. Since Fedora uses 2.6.33, I decided to test 2.6.33-02063306-generic from the mainline kernel builds. I have not been able to reproduce the problem yet, so this is likely a bug introduced in a kernel later than 2.6.33.

Revision history for this message
Giuseppe Bottiglieri (giuseppe-bottiglieri) wrote :

Same problem here, Kubuntu 10.10
with kernel linux-image-2.6.32-25-generic and linux-image-2.6.35-22-generic

Revision history for this message
Matthew Morgan (lytithwyn) wrote :

I'm having a similar problem on Lucid 2.6.32-25-generic. Mine doesn't always happen; it's kind of random. I have the same symptoms though...a disk write freezes and I get dmesg output like the original post.

Is it possible that all this is related to a specific hd controller driver or a certain set of them? I noticed nobody seems to be posting what controller they're using. I'm using a sil3114 SATA controller that is handled by sata_sil.

Revision history for this message
Giuseppe Bottiglieri (giuseppe-bottiglieri) wrote :

This problem is also present in the Kubuntu 10.10 CD installation.

I tried to install linux-image-2.6.37-3-generic (from natty) but I get:
Generating grub.cfg ...
error: cannot read from `/dev/sda'.
...but I am using /dev/sda...

Anyway, these are my error messages while booting:

[ 3465.904743] ata1.00: status: { DRDY ERR }
[ 3465.904746] ata1.00: error: { UNC }
[ 3465.912495] ata1: nv_mode_filter: 0x1f39f&0x1f39f->0x1f39f, BIOS=0x1f000 (0xc5c5c0c0) ACPI=0x1f01f (30:30:0x1f)
[ 3465.912503] ata1: nv_mode_filter: 0x1f39f&0x1f39f->0x1f39f, BIOS=0x1f000 (0xc5c5c0c0) ACPI=0x1f01f (30:30:0x1f)
[ 3465.928461] ata1.00: configured for UDMA/66
[ 3465.944521] ata1.01: configured for UDMA/66
[ 3465.944553] sd 0:0:0:0: [sda] Unhandled sense code
[ 3465.944557] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3465.944562] sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
[ 3465.944568] Descriptor sense data with sense descriptors (in hex):
[ 3465.944571] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 3465.944581] 00 00 00 4a
[ 3465.944585] sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
[ 3465.944592] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 00 00 4a 00 00 05 00
[ 3465.944602] end_request: I/O error, dev sda, sector 74
[ 3465.944610] Buffer I/O error on device sda1, logical block 11
[ 3465.944615] Buffer I/O error on device sda1, logical block 12
[ 3465.944619] Buffer I/O error on device sda1, logical block 13
[ 3465.944622] Buffer I/O error on device sda1, logical block 14
[ 3465.944626] Buffer I/O error on device sda1, logical block 15
[ 3465.944651] ata1: EH complete
[ 3469.421881] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 3469.421889] ata1.00: BMDMA stat 0x64
[ 3469.421894] ata1.00: failed command: READ DMA
[ 3469.421903] ata1.00: cmd c8/00:05:4a:00:00/00:00:00:00:00/e0 tag 0 dma 2560 in
[ 3469.421904] res 51/40:00:4a:00:00/00:00:00:00:00/e0 Emask 0x9 (media error)
[ 3469.421908] ata1.00: status: { DRDY ERR }
[ 3469.421911] ata1.00: error: { UNC }
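
The `status: { ... }` and `error: { ... }` lines in these logs are a decoding of the first two bytes of the `res` field. As an illustration only, here is a Python sketch that reproduces that decoding; `decode_res` and the two tables are hypothetical names, and the tables list only the commonly named ATA status and error bits (other bits are ignored).

```python
# Named bits of the ATA status register (first byte of the "res" field).
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]
# Named bits of the ATA error register (second byte of the "res" field).
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT")]


def decode_bits(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


def decode_res(res_field):
    """Decode a field such as '51/40:00:4a:00:00/00:00:00:00:00/e0'.

    Returns (status_names, error_names).
    """
    groups = res_field.split("/")
    status = int(groups[0], 16)
    error = int(groups[1].split(":")[0], 16)
    return decode_bits(status, STATUS_BITS), decode_bits(error, ERROR_BITS)
```

For the last failure above, `decode_res("51/40:00:4a:00:00/00:00:00:00:00/e0")` yields DRDY and ERR for the status and UNC for the error, which agrees with what the kernel printed.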

@Matthew: how can I see what controller I am using?

Revision history for this message
Matthew Morgan (lytithwyn) wrote :

@Giuseppe: The command `lspci` should give you an idea. For instance, here's my lspci output:

00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
00:01.0 PCI bridge: Intel Corporation 82865G/PE/P PCI to AGP Controller (rev 02)
00:06.0 System peripheral: Intel Corporation 82865G/PE/P Processor to I/O Memory Interface (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation G73 [GeForce 7300 GT] (rev a2)
02:01.0 Multimedia video controller: Internext Compression Inc iTVC16 (CX23416) MPEG-2 Encoder (rev 01)
02:02.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
02:03.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
02:05.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 07)
02:05.1 Input device controller: Creative Labs SB Live! Game Port (rev 07)

See the line in there about my Silicon Image SATA controller? Even if you can't pick out which line is your hard drive controller (some onboard ones may not be as obvious), it might help to post your lspci output so others can see what you have. I don't know anything about kernel development, so I was just guessing about the controller driver being a possible issue; however, I'm sure any information we can provide for those who know what they're doing will be of assistance.
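
To pick the likely disk controller lines out of a long lspci listing, something like the following Python sketch may help. The function name `storage_controllers` and the list of class names are assumptions for illustration; the class names are the ones that appear in lspci listings in this report plus "SATA controller", and other systems may use different labels.

```python
# lspci device classes that usually correspond to disk controllers (assumed list).
STORAGE_CLASSES = (
    "IDE interface",
    "SATA controller",
    "RAID bus controller",
    "Mass storage controller",
)


def storage_controllers(lspci_output):
    """Return (slot, device_class, description) for storage-related lspci lines."""
    found = []
    for line in lspci_output.splitlines():
        # A line looks like "02:02.0 RAID bus controller: Silicon Image, Inc. ..."
        slot, _, rest = line.strip().partition(" ")
        device_class, separator, description = rest.partition(": ")
        if separator and device_class in STORAGE_CLASSES:
            found.append((slot, device_class, description))
    return found
```

Passing in the text printed by `lspci` returns only the candidate controller lines, which is easier to paste into a comment than the full listing.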

Revision history for this message
João Pinto (joaopinto) wrote :

Hello,
In my case I found that I had a faulty controller and had to replace the motherboard (with the integrated Marvell SATA3 controller). For some reason the fault was not being triggered by older kernels or by Windows 7.

Revision history for this message
Giuseppe Bottiglieri (giuseppe-bottiglieri) wrote :

Solved!
lspci didn't give me any information about the controller, so I installed gparted to take a closer look at my partition table. gparted started and took some time to detect the device. Then I saw, and remembered, that I have two swap partitions, but one of them was inactive (swapoff), and it was /dev/sda1, the partition that gave the error messages while booting. So I deleted it, and gparted detected the device again, this time very quickly. I rebooted the system and in about 15 seconds I was logged in to Kubuntu Maverick 10.10 :-P

In case it helps, I am using kernel version: 2.6.36-020636-generic

So my problem was a swap partition. I deleted it and everything is OK now :-)
Good luck

Revision history for this message
Lepe (alepe-com) wrote :

I was watching a movie (reading from the HDD) when the HDD suddenly started to make strange noises. Within a few seconds the system was almost frozen. I ran "dmesg" and the same kind of errors were shown. After a few more seconds I got the "Kernel Panic" message (and it froze).
In less than a week this has happened 3 times. I had swap files (not a swap partition) on the problematic HDD, so I moved them to another drive. That seemed to reduce the problem, but this morning, while I was playing a DVD, the HDD started to make those noises again, so I turned it off immediately. I then thought it might be related to the controllers or to a kernel bug.

System: Kubuntu 10.04.1 - Linux 2.6.32-26-generic x86_64. 1GB RAM

By the way, I think these bugs may be related:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/591532 (Kubuntu, 2.6.31-22-generic)
https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/652059 (Ubuntu 2.6.32-25-generic)
https://bugs.launchpad.net/ubuntu/+source/kernel-package/+bug/555787 (Ubuntu 10.04 2.6.32-18.27-generic)

Output of lspci:

00:00.0 Host bridge: ATI Technologies Inc RS480 Host Bridge (rev 10)
00:02.0 PCI bridge: ATI Technologies Inc RS480 PCI-X Root Port
00:06.0 PCI bridge: ATI Technologies Inc RS480 PCI Bridge
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 PCI bridge: ALi Corporation M5249 HTT to PCI Bridge
00:1c.0 USB Controller: ALi Corporation USB 1.1 Controller (rev 03)
00:1c.1 USB Controller: ALi Corporation USB 1.1 Controller (rev 03)
00:1c.2 USB Controller: ALi Corporation USB 1.1 Controller (rev 03)
00:1c.3 USB Controller: ALi Corporation USB 2.0 Controller (rev 01)
00:1d.0 Audio device: ALi Corporation High Definition Audio/AC'97 Host Controller
00:1e.0 ISA bridge: ALi Corporation PCI to LPC Controller (rev 31)
00:1e.1 Bridge: ALi Corporation M7101 Power Management Controller [PMU]
00:1f.0 IDE interface: ALi Corporation M5229 IDE (rev c7)
00:1f.1 RAID bus controller: ALi Corporation ULi 5287 SATA (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation G84 [GeForce 8600 GT] (rev a1)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 19)
03:12.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev 80)

Part of these morning logs:

Dec 3 09:03:50 kernel: [ 4576.040099] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 3 09:03:50 kernel: [ 4576.040111] ata3.00: failed command: READ DMA
Dec 3 09:03:50 kernel: [ 4576.040125] ata3.00: cmd c8/00:08:4f:0e:fc/00:00:00:00:00/e4 tag 0 dma 4096 in
Dec 3 09:03:50 kernel: [ 4576.040127] res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Dec 3 09:03:50 kernel: [ 4576.040134] ata3.00: status: { DRDY }
Dec 3 09:03:55 kernel: [ 4581.410040] ata3: link is slow to respond, please be patient (ready=0)
Dec 3 09:04:00 ker...


Revision history for this message
Lepe (alepe-com) wrote :

I tried the "2.6.35-22-generic" kernel this morning and the problem was still present (same behavior, same errors). Now it is happening almost every day. I'm wondering if my HDD could be failing?

Revision history for this message
Matthew Morgan (lytithwyn) wrote :

@Lepe
That's what is so difficult about this bug. Some people may be getting these messages because of a hardware failure. You're getting messages similar to mine, but I have never gotten a kernel panic because of it. Also, I've run extensive tests on my hard drive and while HDD tests are by no means definitive, I think the fact that I've also had absolutely no data corruption or loss is enough to assure me that my drive is fine.

Revision history for this message
Lepe (alepe-com) wrote :

I downgraded the kernel to 2.6.32-24-generic and so far I have had no problems, even though I left the computer turned on for more than a day. Before, it was happening within a few hours. I'm almost sure I have solved my problem temporarily. If someone else could confirm the same behavior, it would be easier to see which changes were made in the kernel between those versions.

Revision history for this message
Xavier Cauwe (xcauwe) wrote :

I was getting those errors for some time.
However, these apparently became fatal once I moved to kernel 2.6.35

So the only way to still boot up for me is to use kernel 2.6.32-25 (which was the previous one installed)

Revision history for this message
Matthew Morgan (lytithwyn) wrote :

@Lepe
I just downgraded to 2.6.32-24-generic and I'll report as soon as I can. Thanks!

Revision history for this message
Matthew Morgan (lytithwyn) wrote :

So far so good with 2.6.32-24. The problem usually occurs for me during lots of hard drive writes, so I let dd create a 2 GB file of output from /dev/zero a couple of times, and then once or twice I used /dev/urandom for input. I haven't seen the problem yet. I'll use my computer normally and post if anything changes.
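
A write test like the dd run described above can also record when a stall happens. The following Python sketch is one possible way to do that, not the exact procedure used here: `timed_write`, its parameters, and the 5-second default threshold are all illustrative choices. It writes a file in chunks, forces each chunk to disk, and notes any chunk that took unusually long.

```python
import os
import time


def timed_write(path, total_bytes, chunk_bytes=1024 * 1024,
                stall_seconds=5.0, random_data=False):
    """Write total_bytes to path in chunks and report slow chunks.

    Returns a list of (offset, seconds) for every chunk whose write plus
    fsync took at least stall_seconds.
    """
    stalls = []
    zeros = bytes(chunk_bytes)  # analogous to reading from /dev/zero
    written = 0
    with open(path, "wb") as handle:
        while written < total_bytes:
            size = min(chunk_bytes, total_bytes - written)
            # os.urandom plays the role of /dev/urandom input.
            data = os.urandom(size) if random_data else zeros[:size]
            start = time.monotonic()
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push the chunk to the device
            elapsed = time.monotonic() - start
            if elapsed >= stall_seconds:
                stalls.append((written, elapsed))
            written += size
    return stalls
```

Calling, for example, `timed_write("/path/on/suspect/disk/test.bin", 2 * 1024**3)` (the path is a placeholder) and comparing the reported offsets and times with the timestamps in dmesg could show whether a freeze lines up with a libata reset.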

Revision history for this message
Matthew Morgan (lytithwyn) wrote :

Nope...still getting the errors with 2.6.32-24. I just got one about an hour ago.

Revision history for this message
Matthew Morgan (lytithwyn) wrote :

Taking Lepe's lead, I am now trying out a 2.6.32-26 kernel built from stock sources straight from kernel.org. I built it using the config from /boot/config-2.6.32-26-generic and the ubuntu kernel-package tools. This way, we can see if perhaps an Ubuntu kernel patch causes the error. I'll post my findings as soon as I know something.

Revision history for this message
Matthew Morgan (lytithwyn) wrote :

I just got another error with the stock kernel sources. It isn't something introduced by an Ubuntu patch.

I did some more google searching, and came across this:
http://www.google.com/search?q=site:kerneltrap.org+%22failed+command%3A+READ+DMA%22

It looks like similar problems have been cropping up for various people over several different kernel versions. I don't really know what to do next.

Revision history for this message
Lepe (alepe-com) wrote :

I tested using 2.6.32-25 and it failed after (about) 2 hours.

These are some of the errors reported:

# Before this point everything was working normal....
Dec 11 22:02:54 kernel: [10420.040102] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 11 22:02:54 kernel: [10420.040113] ata3.00: failed command: READ DMA
Dec 11 22:02:54 kernel: [10420.040127] ata3.00: cmd c8/00:08:4f:f5:ee/00:00:00:00:00/e0 tag 0 dma 4096 in
Dec 11 22:02:54 kernel: [10420.040130] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 11 22:02:54 kernel: [10420.040136] ata3.00: status: { DRDY }
Dec 11 22:03:00 kernel: [10425.410030] ata3: link is slow to respond, please be patient (ready=0)
Dec 11 22:03:04 kernel: [10430.090047] ata3: device not ready (errno=-16), forcing hardreset
Dec 11 22:03:04 kernel: [10430.090061] ata3: soft resetting link
Dec 11 22:03:10 kernel: [10435.290049] ata3: link is slow to respond, please be patient (ready=0)
Dec 11 22:03:14 kernel: [10440.150039] ata3: SRST failed (errno=-16)
Dec 11 22:03:14 kernel: [10440.150054] ata3: soft resetting link
Dec 11 22:03:20 kernel: [10445.350038] ata3: link is slow to respond, please be patient (ready=0)
Dec 11 22:03:25 kernel: [10450.210046] ata3: SRST failed (errno=-16)
Dec 11 22:03:25 kernel: [10450.210061] ata3: soft resetting link
Dec 11 22:03:30 kernel: [10455.410047] ata3: link is slow to respond, please be patient (ready=0)
Dec 11 22:04:00 kernel: [10485.230039] ata3: SRST failed (errno=-16)
Dec 11 22:04:00 kernel: [10485.230053] ata3: limiting SATA link speed to 1.5 Gbps
Dec 11 22:04:00 kernel: [10485.230064] ata3: soft resetting link
Dec 11 22:04:05 kernel: [10490.250053] ata3: SRST failed (errno=-16)
Dec 11 22:04:05 kernel: [10490.250062] ata3: reset failed, giving up
Dec 11 22:04:05 kernel: [10490.250068] ata3.00: disabled
Dec 11 22:04:05 kernel: [10490.250077] ata3.00: device reported invalid CHS sector 0
Dec 11 22:04:05 kernel: [10490.250096] ata3: EH complete

# This block was repeated many many times:
Dec 11 22:04:05 kernel: [10490.250127] sd 2:0:0:0: [sdc] Unhandled error code
Dec 11 22:04:05 kernel: [10490.250131] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 11 22:04:05 kernel: [10490.250138] sd 2:0:0:0: [sdc] CDB: Read(10): 28 00 00 ee f5 4f 00 00 08 00
Dec 11 22:04:05 kernel: [10490.250153] end_request: I/O error, dev sdc, sector 15660367
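
The failing sector in the I/O error can be recovered from the `cmd` field of the failed READ DMA line, which helps when checking whether the same sectors fail repeatedly. Below is a Python sketch of that decoding. It assumes a 28-bit LBA command such as READ DMA (c8) or WRITE DMA (ca), and the function name `decode_cmd` is made up for this example.

```python
def decode_cmd(cmd_field):
    """Decode a libata 'cmd' field for a 28-bit LBA command.

    Example field: 'c8/00:08:4f:f5:ee/00:00:00:00:00/e0'
    Layout: command / feature:count:lba_low:lba_mid:lba_high / (extended bytes) / device
    """
    groups = cmd_field.split("/")
    command = int(groups[0], 16)
    _feature, count, lba_low, lba_mid, lba_high = (
        int(part, 16) for part in groups[1].split(":")
    )
    device = int(groups[3], 16)
    # The low four bits of the device byte hold LBA bits 24..27.
    lba = ((device & 0x0F) << 24) | (lba_high << 16) | (lba_mid << 8) | lba_low
    # In 28-bit commands a sector count of 0 means 256 sectors.
    sectors = count if count else 256
    return {"command": command, "lba": lba, "sectors": sectors}
```

Applied to the field above, `decode_cmd("c8/00:08:4f:f5:ee/00:00:00:00:00/e0")` gives LBA 15660367 and 8 sectors, matching the `end_request: I/O error, dev sdc, sector 15660367` line and the Read(10) CDB.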

# This one may be a clue:
Dec 11 22:04:10 kernel: [10495.600172] Aborting journal on device sdc1.
Dec 11 22:04:10 kernel: [10495.600202] EXT3-fs error (device sdc1) in ext3_reserve_inode_write: Journal has aborted
Dec 11 22:04:10 kernel: [10495.600233] ------------[ cut here ]------------
Dec 11 22:04:10 kernel: [10495.600243] WARNING: at /build/buildd/linux-2.6.32/fs/buffer.c:1159 mark_buffer_dirty+0x7f/0xa0()
Dec 11 22:04:10 kernel: [10495.600246] Hardware name: System Product Name
Dec 11 22:04:10 kernel: [10495.600248] Modules linked in: binfmt_misc vboxnetadp vboxnetflt vboxdrv snd_mpu401 snd_mpu401_uart snd_hda_codec_realtek snd_hda
Dec 11 22:04:10 kernel: [10495.600289] Pid: 1590, comm: amule Tainted: P W 2.6.32-25-generic #45-Ubuntu
Dec 11 22:04:10 kernel: [10495.6...


Revision history for this message
Lepe (alepe-com) wrote :

Bad news everyone... My assumptions about kernel 2.6.32-24 may be wrong... It just failed with the same kind of errors.
I will need to start my research again. Meanwhile... this is today's error report:

Dec 12 13:25:12 kernel: [16294.040083] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 12 13:25:12 kernel: [16294.040095] ata3.00: failed command: READ DMA
Dec 12 13:25:12 kernel: [16294.040108] ata3.00: cmd c8/00:08:df:01:20/00:00:00:00:00/e0 tag 0 dma 4096 in
Dec 12 13:25:12 kernel: [16294.040111] res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Dec 12 13:25:12 kernel: [16294.040117] ata3.00: status: { DRDY }
Dec 12 13:25:17 kernel: [16299.410040] ata3: link is slow to respond, please be patient (ready=0)
Dec 12 13:25:22 kernel: [16304.090043] ata3: device not ready (errno=-16), forcing hardreset
Dec 12 13:25:22 kernel: [16304.090057] ata3: soft resetting link
Dec 12 13:25:27 kernel: [16309.271751] ata3: link is slow to respond, please be patient (ready=0)
Dec 12 13:25:32 kernel: [16314.140114] ata3: SRST failed (errno=-16)
Dec 12 13:25:32 kernel: [16314.140383] ata3: soft resetting link
Dec 12 13:25:37 kernel: [16319.340133] ata3: link is slow to respond, please be patient (ready=0)
Dec 12 13:25:42 kernel: [16324.200032] ata3: SRST failed (errno=-16)
Dec 12 13:25:42 kernel: [16324.200301] ata3: soft resetting link

Revision history for this message
Ihor Ilkevych (iilkevych) wrote :

Hi all.

It looks like my issue is similar.
I have

description: Motherboard
product: D102GGC2
vendor: Intel Corporation

with

description: PCI bridge
product: RS480 PCI Bridge
vendor: ATI Technologies Inc

In 10.04 I could copy anything and there were no freezes. Now I've updated to 10.10 and X freezes each time I copy a big file (1G or bigger).

I searched for a simple PCI-Express video card and they are not expensive. But I hope Ubuntu will fix this SH*T for me)

Please let me know if any additional info is necessary, and please do not post solutions that cannot be reverted to the current state.

Thanks in advance,
Ihor Ilkevych

Revision history for this message
Ihor Ilkevych (iilkevych) wrote :

Ah, I also tested with another video card, a GV-N220D2-1GI: NO FREEZES!

Revision history for this message
Lepe (alepe-com) wrote :

@Ihor: what kind of "freeze" are you describing? I have an NVidia video card and sincerely (and with all respect) I don't see a connection between the HDD behaving that way and the video card (perhaps there is one and I just don't know?).

To add more information to my case, it seems to be happening only on one specific drive (I have 3). Coincidentally, that drive was 99% full about a month ago. That drive is mounted at /home/ (I use another drive for /). If I remember correctly, I formatted the drive as ext3 with "-m 0". I wonder if there is some connection with the present issue... However, my drive is now 90% full and is still having the problem described before. I hope this can point in some direction.

Revision history for this message
Mauricio (mruibal) wrote :

Hello people!!! I had similar issues with every hard disk that I plugged into my Asrock 939A8X-M motherboard. When the disk was used intensively the system froze, usually leaving a black screen with the mouse pointer. This problem happened with Lucid and Maverick but not with Karmic, for instance with the 2.6.31-22 kernel (32 or 64 bits). So I downgraded the whole system to Karmic and since then I have been happily working without problems. The freeze happened with both PATA and SATA disks; it doesn't depend on the kind or brand of hard disk. It seems to be a bug in the kernel drivers after 2.6.32 with Asrock 939A8X-M motherboards. My processor is an AMD X2 4200+, and the bug happened in both the 32-bit and 64-bit versions of Lucid and Maverick.
I hope that this will be solved in Natty...
With best regards.

Revision history for this message
Alexei Colin (alexei.colin) wrote :

Hardware: Norhtec MicroClient JrMX (an i582 SoC based on Xcore86) with HD hooked up to SATA
Kernel: compiled from source 2.6.36 #7 Wed Dec 8 22:05:03 EST 2010 i586 GNU/Linux
Distro: Ubuntu 10.04
Symptoms: rare (once in tens of hours) freezes for arbitrary amounts of time during significant network and hard-drive activity

Suspecting this might be a HD problem, I switched to a spare HD (dd'ed the file system) only to find that the problem persists. Until today's miracle I didn't even have any output from my headless box, but today the box unfroze after almost *4 days* of being in the frozen state (I have logs that prove it, in addition to observations). First thing I did after I noticed that it came alive was dmesg and there was the error cited above in this bug report [ attached ].

I vaguely recollect that the freezes started with 10.04, although there were other problems before (network interface dying) [which prompted me to compile the kernels with hopes of newer network driver fixing things]. I will test with an earlier kernel and report back -- I apologize for not doing so before posting: wanted to get another vote out to this bug sooner rather than later.

lspci [useless? the only possibly relevant entries are these, but there exists an SATA interface... ]:
00:00.0 Host bridge: RDC Semiconductor, Inc. R6021 Host Bridge (rev 02)
00:07.0 ISA bridge: RDC Semiconductor, Inc. Device 6036
00:0c.0 IDE interface: RDC Semiconductor, Inc. Device 1011 (rev 01)

Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
SilverWave (silverwave) wrote :

Linux 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

lspci
00:00.0 Host bridge: Intel Corporation 82975X Memory Controller Hub (rev c0)
00:01.0 PCI bridge: Intel Corporation 82975X PCI Express Root Port (rev c0)
00:1b.0 Audio device: Intel Corporation N10/ICH 7 Family High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 1 (rev 01)
00:1c.1 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 2 (rev 01)
00:1c.2 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 3 (rev 01)
00:1c.3 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 4 (rev 01)
00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 5 (rev 01)
00:1d.0 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation N10/ICH 7 Family USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 IDE interface: Intel Corporation N10/ICH7 Family SATA IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation N10/ICH 7 Family SMBus Controller (rev 01)
01:00.0 VGA compatible controller: nVidia Corporation G84 [GeForce 8600 GTS] (rev a1)
03:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
05:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
07:02.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)

syslog
May 25 07:45:54 iron kernel: [555579.040101] ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
May 25 07:45:54 iron kernel: [555579.040106] ata3.01: failed command: READ DMA
May 25 07:45:54 iron kernel: [555579.040114] ata3.01: cmd c8/00:08:a7:06:01/00:00:00:00:00/f4 tag 0 dma 4096 in
May 25 07:45:54 iron kernel: [555579.040115] res 40/00:00:00:4f:c2/00:00:00:00:00/10 Emask 0x4 (timeout)
May 25 07:45:54 iron kernel: [555579.040119] ata3.01: status: { DRDY }
May 25 07:45:54 iron kernel: [555579.040129] ata3: soft resetting link
May 25 07:45:54 iron kernel: [555579.301002] ata3.00: configured for UDMA/133
May 25 07:45:54 iron kernel: [555579.341252] ata3.01: configured for UDMA/133
May 25 07:45:54 iron kernel: [555579.341261] ata3.01: device reported invalid CHS sector 0
May 25 07:45:54 iron kernel: [555579.341270] ata3: EH complete
May 25 07:46:25 iron kernel: [555610.000072] ata3...


SilverWave (silverwave) wrote :

Just an update to advise that I have solved my issue by:

Replacing the SATA cables.
Resetting all power connectors.
Changing the BIOS SATA setting from Auto to Enhanced.
Adding an IDE delay of 5 seconds in the BIOS set-up.

HW Details:
AW9D-MAX (Intel i975-ICH7), BIOS 6.00 PG 04/18/2007
Quad Core
8G RAM
3 x 500GB HD.
PIONEER DVD-RW DVR-112D, 1.21, max UDMA/66
WDC WD5000AAKS-22TMA0, 12.01C01, max UDMA/133

Alex Filonov (afilonov) wrote :

I had a similar problem after upgrading to 2.6.32-32-generic. Boot time became very long, although after boot the system behaved more or less OK. But after the first suspend/resume, the root filesystem was remounted read-only. A review of the log shows the following errors during boot:

May 30 14:03:45 gazelle kernel: [75102.492333] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
May 30 14:03:45 gazelle kernel: [75102.492342] ata1.00: BMDMA stat 0x4
May 30 14:03:45 gazelle kernel: [75102.492350] ata1.00: failed command: READ DMA
May 30 14:03:45 gazelle kernel: [75102.492364] ata1.00: cmd c8/00:08:4f:42:d5/00:00:00:00:00/e4 tag 0 dma 4096 in
May 30 14:03:45 gazelle kernel: [75102.492367] res 51/40:07:56:42:d5/00:00:00:00:00/e4 Emask 0x9 (media error)
May 30 14:03:45 gazelle kernel: [75102.492374] ata1.00: status: { DRDY ERR }
May 30 14:03:45 gazelle kernel: [75102.492379] ata1.00: error: { UNC }
May 30 14:03:45 gazelle kernel: [75102.509587] ata1.00: configured for UDMA/133
May 30 14:03:45 gazelle kernel: [75102.509606] ata1: EH complete
May 30 14:03:50 gazelle kernel: [75106.750764] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
May 30 14:03:50 gazelle kernel: [75106.750773] ata1.00: BMDMA stat 0x4
May 30 14:03:50 gazelle kernel: [75106.750781] ata1.00: failed command: READ DMA
May 30 14:03:50 gazelle kernel: [75106.750795] ata1.00: cmd c8/00:08:4f:42:d5/00:00:00:00:00/e4 tag 0 dma 4096 in

After suspend/resume I got multiple errors from the DMA controller. Worse yet, it managed to corrupt the disk partition, so I needed about 30 minutes of recovery before I could reboot.

Switching back to kernel 2.6.32-31-generic completely fixed this problem.

Don't know if anybody is working on this bug. My computer is System76 Gazelle Performance (GAZP2). Here's lspci output:

root@gazelle:/tmp# lspci
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express PCI Express Root Port (rev 03)
00:1b.0 Audio device: Intel Corporation N10/ICH 7 Family High Definition Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 2 (rev 02)
00:1d.0 USB Controller: Intel Corporation N10/ICH7 Family USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation N10/ICH 7 Family USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation N10/ICH 7 Family SMBus Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation G72M [Quadro NVS 110M/GeForce Go 7300] (rev a1)
03:00...


Alex Filonov (afilonov) wrote :

It looks like this problem exists with the 2.6.32-31-generic kernel as well; it just doesn't happen as often. It happens only under heavy disk usage.

Alex Filonov (afilonov) wrote :

I have to disappoint all who complained here. It looks like it's not a bug, but a feature. My disk had been dying for a long time, and the new kernel just made that clear to me. After replacing the disk, I have no problems with kernel 2.6.32-32-generic. The new drivers probably allow fewer rereads and enforce stricter DMA wait limits.
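
For anyone comparing their own logs against the ones in this thread, it may help to tally the reason libata prints at the end of each "res" line. Below is a minimal Python sketch, assuming the log text has already been read into a string; the function name count_failure_reasons is made up for illustration. In the logs quoted here, "timeout" appears in the report that was later fixed by replacing cables and changing BIOS settings, while "media error" (with UNC) appears in the report that turned out to be a failing disk, so the split is only a rough hint, not a diagnosis.

```python
import re
from collections import Counter

# Matches the tail of a libata result line, e.g.
#   res 40/00:00:00:4f:c2/00:00:00:00:00/10 Emask 0x4 (timeout)
# The "exception Emask 0x0 SAct ..." line is not matched because no
# parenthesised reason follows its Emask value.
RESULT = re.compile(r"Emask (0x[0-9a-fA-F]+) \(([^)]+)\)")


def count_failure_reasons(log_text):
    """Count how often each failure reason appears in kernel log text."""
    return Counter(match.group(2) for match in RESULT.finditer(log_text))
```

One possible use is count_failure_reasons(open("/var/log/syslog").read()), which returns a Counter keyed by reason string.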

Matthew Morgan (lytithwyn) wrote :

My disk eventually failed a month or so ago. Whether it had already started to fail 7 months ago, I have no way of knowing. One would hope that a disk reading poorly enough to hang that often would have been pushed over one of the SMART thresholds. I think I ran several SMART tests on it during the first month I had the issue and never saw a failure.
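
Whether SMART would have flagged the disk depends on which signal is checked. As a rough sketch (an assumption about how one might check, not something anyone in this thread ran), the Python below scans the attribute table printed by "smartctl -A" and reports attributes whose normalized value has reached its threshold, plus non-zero raw counts for a few sector-health attributes that can climb well before any threshold trips. The function name and the choice of watched attributes are illustrative.

```python
# Attributes whose raw counts are commonly watched for early signs of
# trouble; this selection is an assumption, not an official list.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def suspicious_attributes(smartctl_output):
    """Return (attribute name, reason) pairs from `smartctl -A` text output."""
    findings = []
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, value, thresh, raw = fields[1], fields[3], fields[5], fields[9]
        if not (value.isdigit() and thresh.isdigit()):
            continue
        if int(thresh) > 0 and int(value) <= int(thresh):
            findings.append((name, "at or below threshold"))
        elif name in WATCHED and raw.isdigit() and int(raw) > 0:
            findings.append((name, "non-zero raw count"))
    return findings
```

An empty result does not prove the disk is healthy; it only means none of these particular signals fired.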

Alkis Georgopoulos (alkisg) wrote :

I had similar dmesg errors with a very old SiS-based motherboard, in both UDMA and PIO modes.
These made the system unbootable with recent kernels.
But with an old Knoppix CD the hard disk worked fine (tested with dd if=/dev/sda of=/dev/null), with no dmesg errors or delays at all.
So I started trying old Ubuntu CDs and found that 7.10 didn't have the problem, while 8.10 did.

Then I read in http://patchwork.ozlabs.org/patch/113668/ about libata.force=mwdma2.
And now my system is working fine with that kernel parameter in all recent Ubuntu versions.

So it's possible that my problem was a bug in pata_sis, and I'm mentioning that kernel parameter in case others would benefit from it.
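
To confirm that the parameter actually reached the running kernel, the boot command line in /proc/cmdline can be inspected. Here is a small Python sketch, assuming a Linux system; libata_force_values and read_cmdline are hypothetical helper names chosen for illustration.

```python
def libata_force_values(cmdline):
    """Return the comma-separated values of every libata.force= option."""
    values = []
    for token in cmdline.split():
        if token.startswith("libata.force="):
            # Everything after the first "=" is a comma-separated list.
            values.extend(token.split("=", 1)[1].split(","))
    return values


def read_cmdline(path="/proc/cmdline"):
    """Read the parameters the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()
```

On a system booted with the parameter from this comment, libata_force_values(read_cmdline()) should include "mwdma2"; an empty list means the option was not passed at boot.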

penalvch (penalvch) wrote :

Pēteris Krišjānis, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command in the development release from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux <replace-with-bug-number>

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please do not test the kernel in the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. As well, please comment on which kernel version specifically you tested.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream', and comment as to why specifically you were unable to test it.

Please let us know your results. Thanks in advance.

tags: added: needs-upstream-testing
Changed in linux (Ubuntu):
importance: Undecided → Low
status: Confirmed → Incomplete
tags: removed: apport-collected
tags: added: needs-kernel-logs
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired