Using dd to duplicate an entire drive with bs=128K gives read error at 137GB

Bug #157425 reported by culturespy
This bug affects 1 person
Affects          Status    Importance   Assigned to   Milestone
linux (Ubuntu)   Invalid   Undecided    Unassigned
Nominated for Hardy by didier

Bug Description

I use the following command as part of a script to back up my laptop to tape:

dd if=/dev/sda bs=128k

(For testing purposes, it doesn't matter where the output goes. The error is reproducible even if of=/dev/null.)

I recently put in a new drive, a Hitachi TravelStar (kernel sees: Hitachi HTS722020K9SA00), and installed Gutsy on it. Now when I run my backup, it stops with an IO error at 137GB.

smartctl said that the drive was healthy but that errors had been logged, so I replaced the drive with another, identical one, copying it over with a command similar to the one above but with a block size of 512 bytes. That operation completed successfully, but was rather slow even over USB2.

After swapping drives and rebooting, I ran the backup script again, and got the same result.

dmesg output after the error (sorry if this is mangled, it had to be copied through email):

[ 9839.496000] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 9839.496000] ata1.00: (BMDMA stat 0x25)
[ 9839.496000] ata1.00: cmd c8/00:f8:08:ff:ff/00:00:00:00:00/ef tag 0 cdb 0x0 data 126976 in
[ 9839.496000] res 51/10:f8:08:ff:ff/00:00:00:00:00/ef Emask 0x81 (invalid argument)
[ 9839.520000] ata1.00: configured for UDMA/133
[ 9839.520000] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[ 9839.520000] sd 0:0:0:0: [sda] Sense Key : Aborted Command [current] [descriptor]
[ 9839.520000] Descriptor sense data with sense descriptors (in hex):
[ 9839.520000] 72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 9839.520000] 0f ff ff 08
[ 9839.520000] sd 0:0:0:0: [sda] Add. Sense: Recorded entity not found
[ 9839.520000] end_request: I/O error, dev sda, sector 268435208
[ 9839.520000] Buffer I/O error on device sda, logical block 33554401
[ 9839.520000] Buffer I/O error on device sda, logical block 33554402
[ 9839.520000] Buffer I/O error on device sda, logical block 33554403
[ 9839.520000] Buffer I/O error on device sda, logical block 33554404
[ 9839.520000] Buffer I/O error on device sda, logical block 33554405
[ 9839.520000] Buffer I/O error on device sda, logical block 33554406
[ 9839.520000] Buffer I/O error on device sda, logical block 33554407
[ 9839.520000] Buffer I/O error on device sda, logical block 33554408
[ 9839.520000] Buffer I/O error on device sda, logical block 33554409
[ 9839.520000] Buffer I/O error on device sda, logical block 33554410
[ 9839.520000] ata1: EH complete
[ 9839.532000] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
[ 9839.544000] sd 0:0:0:0: [sda] Write Protect is off
[ 9839.544000] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 9839.568000] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

......
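
For reference, the failing command in the log above can be decoded by hand. The sketch below assumes libata's usual taskfile print layout (command/feature:count:LBA low:LBA mid:LBA high, then five HOB bytes, then the device byte) and that, for a 28-bit command, the low nibble of the device byte carries LBA bits 24 to 27. The variable names are illustrative only.

```shell
# Fields copied from: cmd c8/00:f8:08:ff:ff/00:00:00:00:00/ef
nsect=0xf8    # sector count
lbal=0x08     # LBA bits 0-7
lbam=0xff     # LBA bits 8-15
lbah=0xff     # LBA bits 16-23
device=0xef   # low nibble assumed to hold LBA bits 24-27

# Reassemble the 28-bit starting sector from its four pieces.
start=$(( ( ($device & 0x0f) << 24 ) | ($lbah << 16) | ($lbam << 8) | $lbal ))
count=$(( $nsect ))
bytes=$(( count * 512 ))          # 512-byte sectors, as reported by the kernel
last=$(( start + count - 1 ))     # last sector touched by the request

echo "start sector: $start"
echo "sector count: $count"
echo "bytes:        $bytes"
echo "last sector:  $last"
```

Under those assumptions the start sector and byte count agree with the "sector 268435208" and "data 126976 in" fields in the log, and the last sector of the request is 2^28 - 1.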

Additional info:

- The machine is a Dell Precision M90.

- NVidia driver is loaded. If anyone believes this could be an issue I will unload it and try again.

- After the error, the machine becomes somewhat sluggish.

- Another part of my backup script fills the remaining space on the drive with zeros. That part never has problems:

dd if=/dev/zero of=/tmp/bigfile

- During one test, dmesg gave a different sector number. I neglected to make a note of it, but it was roughly a hundred sectors past where the error usually happens.

- I updated the machine's BIOS to A07 (latest I could locate), and it had no effect.

- I run BOINC but it was paused during this activity.

Revision history for this message
Emmet Hikory (persia) wrote :

Interesting. I wouldn't have expected an issue until block 33554432 (32 bits of 1k chunks taken 128 at a time). Just to confirm: the two block sizes tried are 128 kilobytes and 512 bytes, rather than 512 kilobytes, correct?
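
The block numbers in the dmesg output can be related to the reported sector with a little arithmetic. The sketch below assumes that the "logical block" numbers are in 4096-byte units (8 sectors of 512 bytes each); that is an inference from the numbers in the log, not something stated in the report.

```shell
sectors_per_block=8                    # assumption: 4096-byte blocks, 512-byte sectors
first_bad_block=33554401               # first "Buffer I/O error" block in the log
boundary_block=$(( (1 << 32) / 128 ))  # the block number mentioned above

# Convert block numbers to 512-byte sector numbers.
first_bad_sector=$(( first_bad_block * sectors_per_block ))
boundary_sector=$(( boundary_block * sectors_per_block ))

# How many blocks before the expected boundary the errors begin.
blocks_short=$(( boundary_block - first_bad_block ))

echo "boundary block:   $boundary_block"
echo "first bad sector: $first_bad_sector"
echo "boundary sector:  $boundary_sector"
echo "blocks short:     $blocks_short"
```

With that assumption, the first failing block corresponds to sector 268435208 from the log and block 33554432 corresponds to sector 2^28, so the errors start 31 blocks (248 sectors) before the expected boundary.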

culturespy (mikek-meyertool-deactivatedaccount) wrote :

Yes, 128 kilobytes, vs 512 bytes.

128K fails at 137GB every time, regardless of destination.

512 bytes was successful copying both to /dev/null during testing and to the second disk attached to USB2 when I was replacing the drive.

137GB seems suspicious to me due to its proximity to the 28-bit boundary, as if this were an LBA problem.
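
The arithmetic behind that suspicion can be checked quickly. This is only a sketch: it assumes 512-byte sectors (as the kernel reports for this drive) and decimal gigabytes.

```shell
sector_size=512
lba28_sectors=$(( 1 << 28 ))                        # sectors addressable with 28 bits
boundary_bytes=$(( lba28_sectors * sector_size ))   # byte offset of the 28-bit limit
boundary_gb=$(( boundary_bytes / 1000000000 ))      # whole decimal gigabytes

failing_sector=268435208                            # from "end_request: I/O error"
gap=$(( lba28_sectors - failing_sector ))           # sectors between error and limit

echo "28-bit limit: $lba28_sectors sectors = $boundary_bytes bytes (about $boundary_gb GB)"
echo "failing sector is $gap sectors below the limit"
```

The 28-bit limit works out to 137438953472 bytes, and the reported failing sector lies 248 sectors below it, which is less than one 128K read.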

James Collier (james-collier412) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. You reported this bug a while ago and there hasn't been any activity in it recently. We were wondering whether this is still an issue for you. Can you try with the latest Ubuntu release? Thanks in advance.

culturespy (mikek-meyertool-deactivatedaccount) wrote : Re: [Bug 157425] Re: Using dd to duplicate an entire drive with bs=128K gives read error at 137GB

James Collier wrote:
> Thank you for taking the time to report this bug and helping to make
> Ubuntu better. You reported this bug a while ago and there hasn't been
> any activity in it recently. We were wondering whether this is still an
> issue for you. Can you try with the latest Ubuntu release? Thanks in advance.
>
> ** Changed in: linux (Ubuntu)
> Sourcepackagename: None => linux
> Status: New => Incomplete
>

I am on Hardy now. I will try to reproduce the error later. At some
point, I came up with an even better test procedure. I'll post that as well.

--
Michael Kinney

didier (did447-deactivatedaccount) wrote :

>[ 9839.496000] ata1.00: cmd c8/00:f8:08:ff:ff/00:00:00:00:00/ef tag 0
                                                           ^----------^

I get this one with Hardy too:
ata1.00: cmd ca/00:e0:20:ff:ff/00:00:00:00:00/ef tag 0 dma 114688 out

It's the libata LBA28/LBA48 off-by-one bug, fixed in 2.6.27-rc6 by commit 97b697a11b07e2ebfa69c488132596cc5eb24119.

It's a one-line patch.
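
To illustrate what an off-by-one at this boundary looks like, here is a small shell model of a "does this request fit in LBA28?" check. It is a sketch written for this report, not code taken from the kernel: the function names are hypothetical, and the exact before/after conditions are an assumption about what the referenced commit changes.

```shell
LBA28_LIMIT=$(( 1 << 28 ))

# Assumed behaviour before the fix: a request may end on sector 2^28 - 1.
# Arguments: starting sector, number of sectors.
lba28_ok_old() {
    [ $(( $1 + $2 - 1 )) -lt "$LBA28_LIMIT" ] && [ "$2" -le 256 ]
}

# Assumed behaviour after the fix: a request that touches sector 2^28 - 1
# no longer qualifies, so the caller would fall back to LBA48.
lba28_ok_new() {
    [ $(( $1 + $2 )) -lt "$LBA28_LIMIT" ] && [ "$2" -le 256 ]
}

# Print which addressing mode a given check would select.
describe() {
    if "$1" "$2" "$3"; then echo LBA28; else echo LBA48; fi
}

# The request from the original report: 248 sectors starting at 268435208.
echo "old check: $(describe lba28_ok_old 268435208 248)"
echo "new check: $(describe lba28_ok_new 268435208 248)"
```

In this model the old check still picks a 28-bit command for the reported request (which ends exactly on sector 2^28 - 1), while the new check selects LBA48. The same holds for a single-sector read of sector 268435455.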

Changed in linux:
status: Incomplete → Confirmed

Imre Gergely (cemc) wrote :

I've just run into this problem with my Samsung 1TB drive. I managed to reproduce it by following this:

http://www.gossamer-threads.com/lists/linux/kernel/985985?page=last

"dd if=/dev/sdc bs=512 count=1 skip=268435455 > /dev/null "

Two years have passed since this bug was reported. I can't believe nobody else has run into this, especially with today's 1TB+ hard drives.

Imre Gergely (cemc) wrote :

I've patched the latest kernel from Hardy with this one-liner and tried it on my server. It seems to be working, with no other problems so far.
If anybody would like to test it, you can enable my testing PPA (https://launchpad.net/~cemc/+archive/ppa) and install the patched kernel from there.

Imre Gergely (cemc) wrote :

Could this one-liner be included in the next security update?

penalvch (penalvch) wrote :

Original Reporter account deactivated.

tags: added: gutsy needs-upstream-testing
Changed in linux (Ubuntu):
status: Confirmed → Invalid