Persistent laptop drive clicking in kernels since 2006
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fedora | Invalid | Undecided | Unassigned |
linux (Ubuntu) | Expired | Undecided | Unassigned |
Bug Description
Several laptops running various versions of Ubuntu all exhibit the same behaviour (which is not the same as #104535 or #59695):
- The drive clicks every few seconds, starting only after it has been powered on for some time.
- Nothing I can see is accessing the disk, despite my attempts to log disk activity.
- No command I have tried, and nothing else I can do, will stop it.
- It appears to be severely shortening the life of the drives: I have lost three drives in the past 11 months.
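For anyone trying to reproduce the "nothing is accessing the disk" observation, here is a minimal sketch that compares two snapshots of /proc/diskstats. It assumes a 2.6-or-later kernel that exposes that file in the full 14-field layout. The function names (parse_diskstats, io_delta, watch) are made up for illustration. It only shows whether reads or writes reached the block layer, not which process issued them.

```python
import time


def parse_diskstats(text):
    """Map device name -> (reads_completed, writes_completed)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Full-format lines carry major, minor, name and at least 11 counters.
        # Shorter lines (partition entries on some older kernels) are skipped.
        if len(fields) < 14:
            continue
        stats[fields[2]] = (int(fields[3]), int(fields[7]))
    return stats


def io_delta(before, after):
    """Return {device: (new_reads, new_writes)} for devices that saw any I/O."""
    delta = {}
    for name, (reads_after, writes_after) in after.items():
        reads_before, writes_before = before.get(name, (reads_after, writes_after))
        change = (reads_after - reads_before, writes_after - writes_before)
        if change != (0, 0):
            delta[name] = change
    return delta


def watch(seconds=5, path="/proc/diskstats"):
    """Sample the counters twice, `seconds` apart, and report the difference."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return io_delta(before, after)
```

Calling watch() while the drive is clicking should return an entry for hda if the kernel really issued requests during that interval. An empty result would suggest the clicks are not caused by block-layer I/O.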
I have been a Linux desktop (and laptop) user since 2001, and have noticed this problem ONLY since mid-to-late 2006. I have observed it in Ubuntu Dapper, Edgy, and Feisty. I have also observed it in Fedora (again, only with later kernels, such as those found in FC6 or F7).
Other reports I have read on the web also observe this only in kernels since a certain release (2.6.x, as I recall).
I have lost AT LEAST THREE drives (different brands: Hitachi, IBM, and Fujitsu) since 2006, and the drive in my current laptop is now failing (see below).
My company is an Ubuntu partner, and we are supposed to know these things, yet I have not been able to make progress on this.
Note that this is NOT the problem described in either #104535 or #59695, because:
- The Load_Cycle_Count is steady at 4073 and does not change.
- The hdparm -B 255 /dev/hda and hdparm -M 128 /dev/hda commands have NO EFFECT on the clicking.
- The drive light comes on for about 3/4 second with each click, which seems to indicate that it is the OS hitting the drive.
Following are several smartctl reports taken in rapid succession, showing Load_Cycle_Count constant while other counters increase:
- - - - -
root@joe-
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 046 Pre-fail Always - 179858
2 Throughput_
3 Spin_Up_Time 0x0003 100 100 025 Pre-fail Always - 1
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 616
5 Reallocated_
7 Seek_Error_Rate 0x000f 100 100 047 Pre-fail Always - 711
8 Seek_Time_
9 Power_On_Hours 0x0032 091 091 000 Old_age Always - 17743618
10 Spin_Retry_Count 0x0013 100 100 020 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 590
192 Power-Off_
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 4073
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 47 (Lifetime Min/Max 18/60)
195 Hardware_
196 Reallocated_
197 Current_
198 Offline_
199 UDMA_CRC_
200 Multi_Zone_
203 Run_Out_Cancel 0x0002 100 100 000 Old_age Always - 3732294007280
root@joe-
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 046 Pre-fail Always - 179858
2 Throughput_
3 Spin_Up_Time 0x0003 100 100 025 Pre-fail Always - 1
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 616
5 Reallocated_
7 Seek_Error_Rate 0x000f 100 100 047 Pre-fail Always - 715
8 Seek_Time_
9 Power_On_Hours 0x0032 091 091 000 Old_age Always - 17743625
10 Spin_Retry_Count 0x0013 100 100 020 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 590
192 Power-Off_
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 4073
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 47 (Lifetime Min/Max 18/60)
195 Hardware_
196 Reallocated_
197 Current_
198 Offline_
199 UDMA_CRC_
200 Multi_Zone_
203 Run_Out_Cancel 0x0002 100 100 000 Old_age Always - 3732301281760
root@joe-
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 046 Pre-fail Always - 182257
2 Throughput_
3 Spin_Up_Time 0x0003 100 100 025 Pre-fail Always - 1
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 616
5 Reallocated_
7 Seek_Error_Rate 0x000f 100 100 047 Pre-fail Always - 931
8 Seek_Time_
9 Power_On_Hours 0x0032 091 091 000 Old_age Always - 17743669
10 Spin_Retry_Count 0x0013 100 100 020 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 590
192 Power-Off_
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 4073
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 47 (Lifetime Min/Max 18/60)
195 Hardware_
196 Reallocated_
197 Current_
198 Offline_
199 UDMA_CRC_
200 Multi_Zone_
203 Run_Out_Cancel 0x0002 100 100 000 Old_age Always - 3732298463925
root@joe-
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 046 Pre-fail Always - 89181
2 Throughput_
3 Spin_Up_Time 0x0003 100 100 025 Pre-fail Always - 1
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 616
5 Reallocated_
7 Seek_Error_Rate 0x000f 100 100 047 Pre-fail Always - 2076
8 Seek_Time_
9 Power_On_Hours 0x0032 091 091 000 Old_age Always - 17744511
10 Spin_Retry_Count 0x0013 100 100 020 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 590
192 Power-Off_
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 4073
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 46 (Lifetime Min/Max 18/60)
195 Hardware_
196 Reallocated_
197 Current_
198 Offline_
199 UDMA_CRC_
200 Multi_Zone_
203 Run_Out_Cancel 0x0002 100 100 000 Old_age Always - 3732295973444
root@joe-
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 046 Pre-fail Always - 104954
2 Throughput_
3 Spin_Up_Time 0x0003 100 100 025 Pre-fail Always - 1
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 616
5 Reallocated_
7 Seek_Error_Rate 0x000f 100 100 047 Pre-fail Always - 2848
8 Seek_Time_
9 Power_On_Hours 0x0032 091 091 000 Old_age Always - 17745379
10 Spin_Retry_Count 0x0013 100 100 020 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 590
192 Power-Off_
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 4073
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 46 (Lifetime Min/Max 18/60)
195 Hardware_
196 Reallocated_
197 Current_
198 Offline_
199 UDMA_CRC_
200 Multi_Zone_
203 Run_Out_Cancel 0x0002 100 100 000 Old_age Always - 3732306590247
root@joe-
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://
=== START OF INFORMATION SECTION ===
Device Model: FUJITSU MHV2100AH
Serial Number: NT60T6B28E2T
Firmware Version: 000000A0
User Capacity: 100,030,242,816 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 4a
Local Time is: Fri Nov 23 08:41:58 2007 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
Self-test execution status: ( 0) The previous self-test routine completed
Total time to complete Offline
data collection: ( 544) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
SMART capabilities: (0x0003) Saves SMART data before entering
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 69) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 046 Pre-fail Always - 104954
2 Throughput_
3 Spin_Up_Time 0x0003 100 100 025 Pre-fail Always - 1
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 616
5 Reallocated_
7 Seek_Error_Rate 0x000f 100 100 047 Pre-fail Always - 2850
8 Seek_Time_
9 Power_On_Hours 0x0032 091 091 000 Old_age Always - 17745385
10 Spin_Retry_Count 0x0013 100 100 020 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 590
192 Power-Off_
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 4073
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 46 (Lifetime Min/Max 18/60)
195 Hardware_
196 Reallocated_
197 Current_
198 Offline_
199 UDMA_CRC_
200 Multi_Zone_
203 Run_Out_Cancel 0x0002 100 100 000 Old_age Always - 3732287650371
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
root@joe-
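The comparison across the reports above can be automated. This is a rough sketch, not part of smartmontools: it parses the attribute table from two saved smartctl outputs and lists the attributes whose RAW_VALUE changed. The function names are hypothetical. Truncated rows and rows without an integer raw value are simply skipped.

```python
def parse_attributes(text):
    """Map attribute name -> integer raw value from a smartctl attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # A complete row has: ID, name, flag, value, worst, thresh, type,
        # updated, when_failed, raw value (possibly followed by extra text).
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if not fields[2].startswith("0x"):
            continue
        try:
            attrs[fields[1]] = int(fields[9])
        except ValueError:
            continue
    return attrs


def changed_attributes(before, after):
    """Return {name: (old_raw, new_raw)} for attributes present in both reports."""
    return {
        name: (before[name], after[name])
        for name in after
        if name in before and before[name] != after[name]
    }
```

Fed the first two reports above, this would flag Seek_Error_Rate (711 to 715), Power_On_Hours and Run_Out_Cancel, while Load_Cycle_Count stays at 4073. Raw values are vendor-defined, so a changing number is a hint to look closer rather than proof of a fault.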
Just in case, have you tried hdparm -B 254 /dev/hda instead of 255, on the chance that the reported load cycle count of 4073 is not a correct representation? I think the last figure in each line is left up to the drive manufacturer to define (e.g. one of my drives reports a "Temperature_Celsius" raw value of 906887219 in smartctl).
Buried deep in bug 59695, someone pointed out that 255 isn't an officially defined setting. 255 doesn't make any difference to the load cycle frequency on my drive, but 254 certainly does.
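To make the 254 versus 255 distinction concrete, here is a small sketch of how the hdparm man page describes the -B ranges, as far as I understand it. The function name is made up, older hdparm versions may treat 255 differently, and how a given drive actually reacts is up to its firmware.

```python
def describe_apm_level(level):
    """Describe an hdparm -B value according to the ranges in the man page."""
    if not 1 <= level <= 255:
        raise ValueError("hdparm -B expects a value from 1 to 255")
    if level == 255:
        # Asks the drive to turn APM off entirely; not every drive supports it.
        return "disable APM"
    if level >= 128:
        # 254 is the highest-performance setting with APM still enabled.
        return "APM enabled, spin-down not permitted"
    return "APM enabled, spin-down permitted"
```

So 254 is the least aggressive level that is still a regular APM setting, which may be why some drives respond to it but ignore 255.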