[Disk tests/storage_device_sdx] Test cannot be completed.

Bug #1215778 reported by Gabriel Zhi Chen
This bug affects 1 person
Affects: Checkbox
Status: Fix Released
Importance: High
Assigned to: Daniel Manrique

Bug Description

I tried to run 'Disk tests/storage_device_sdx' on the target machine (Lenovo V4400 Coors-2-SIT) three times, and the test case never completed. Each time, I started the case at night and checked the result the next morning, and the case was still running. Normally the case should complete within 2 hours.

The issue interferes with my normal testing: to complete the remaining tests, I had to cancel the 'Disk tests/storage_device_sdx' test.

About two hours into the 'Disk tests/storage_device_sdx' test execution, I ran 'vmstat 1' and confirmed that the CPU was idle. I suspect the test's actual work had finished, but something was preventing the test case from exiting and moving on to the next case.
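An idle CPU alone does not say whether the test is waiting or working. One way to check, sketched here under the assumption of a Linux system, is to read each related process's scheduler state from /proc. `proc_state` is a hypothetical helper for illustration, not part of checkbox.

```shell
#!/bin/sh
# Hypothetical helper (not part of checkbox): print the one-letter scheduler
# state of a process, read from /proc/<pid>/stat (Linux only).
#   R = running, S = sleeping (e.g. waiting for input),
#   D = uninterruptible sleep (usually waiting on I/O)
proc_state() {
    local pid=$1 stat
    [ -r "/proc/$pid/stat" ] || return 1
    read -r stat < "/proc/$pid/stat" || return 1
    # Field 2 is the command name in parentheses and may contain spaces,
    # so strip everything up to the last ") " before taking field 3.
    stat=${stat##*") "}
    echo "${stat%% *}"
}
```

For example, `for pid in $(pgrep -f storage_test); do echo "$pid $(proc_state "$pid")"; done`, repeated for the child processes listed by `ps -o pid= --ppid <pid>`. A process stuck waiting for an answer would be expected to stay in `S` while using no CPU, which would be consistent with an idle `vmstat 1`.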

System hardware information, system logs, and a vmstat log are attached; please refer to them.


Gabriel Zhi Chen (gabrielzchen) wrote :
Gabriel Zhi Chen (gabrielzchen) wrote :
Gabriel Zhi Chen (gabrielzchen) wrote :
Daniel Manrique (roadmr)
Changed in checkbox:
status: New → Incomplete
Daniel Manrique (roadmr) wrote :

Hi, I notice that checkbox is creating two job definitions for you: one for the hard disk and another for the SSD (the hard disk is sda, the SSD is sdb).

The definitions are as follows:

plugin: shell
name: disk/storage_device_sda
user: root
requires:
 device.path == "/devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0"
 block_device.sda_state != 'removable'
description: Disk I/O stress test for ST500LM012 HN-M500MBB
command: storage_test sda

plugin: shell
name: disk/storage_device_sdb
user: root
requires:
 device.path == "/devices/pci0000:00/0000:00:1f.2/ata6/host5/target5:0:0/5:0:0:0"
 block_device.sdb_state != 'removable'
description: Disk I/O stress test for LITEONIT LSS-24L6G
command: storage_test sdb

To see which one is stalling, could you please run them manually? Please run:

sudo /usr/share/checkbox/scripts/storage_test sda
sudo /usr/share/checkbox/scripts/storage_test sdb

and post the output of each command separately here.
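If these runs need to be repeated without watching them overnight, a wrapper along these lines runs each disk's test on its own, with a separate log and a time limit. This is only a sketch: the 2-hour default limit (taken from the expected run time in the description), the log file names, the `run_disks.sh` name, and the `STORAGE_TEST`/`LIMIT` variables are assumptions for illustration, not checkbox features. It relies on GNU coreutils `timeout`, which exits with status 124 when the limit is hit.

```shell
#!/bin/sh
# Sketch: run storage_test once per disk, in isolation, each with a time limit
# so that one stalled disk cannot block the whole run. Run as root.
# STORAGE_TEST, LIMIT and the log names are illustrative assumptions.
STORAGE_TEST=${STORAGE_TEST:-/usr/share/checkbox/scripts/storage_test}
LIMIT=${LIMIT:-2h}

run_one_disk() {
    local disk=$1
    local log="storage_test_$disk.log"
    local rc=0
    # stdin from /dev/null: a prompt that reads stdin gets end-of-file
    # instead of waiting on the terminal.
    timeout "$LIMIT" "$STORAGE_TEST" "$disk" </dev/null >"$log" 2>&1 || rc=$?
    if [ "$rc" -eq 124 ]; then
        # 124 is what timeout returns when it had to stop the command
        echo "$disk: timed out after $LIMIT (see $log)"
    else
        echo "$disk: exited with status $rc (see $log)"
    fi
    return "$rc"
}

# Usage (hypothetical script name): sudo ./run_disks.sh sda sdb
for disk in "$@"; do
    run_one_disk "$disk"
done
```

Keeping one log per disk makes it easy to post each command's output separately, as requested above.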

This is an example of what you should see for each disk (the run took 6 minutes on a 2011-era Dell Latitude 6220 with a 320 GB SATA hard disk, so I wouldn't expect it to take hours on your hardware):

$ time sudo /usr/share/checkbox/scripts/storage_test sda
/dev/sda is a block device
/dev/sda reports a size of 320GB.
Running bonnie++ on /dev/sda...
Putting scratch disk at /
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
201101-7178   7664M   891  96 97508   5 43649   4  5912  92 124238   8 198.0   6
Latency               12637us     199ms     696ms    9345us     122ms     510ms
Version  1.96       ------Sequential Create------ --------Random Create--------
201101-7178         -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency                 319us     287us     736us     338us      13us     429us
1.96,1.96,201101-7178,1,1377292573,7664M,,891,96,97508,5,43649,4,5912,92,124238,8,198.0,6,16,,,,,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,12637us,199ms,696ms,9345us,122ms,510ms,319us,287us,736us,338us,13us,429us

real 6m13.394s
user 0m1.384s
sys 0m22.385s
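The long comma-separated line near the end of that output is bonnie++'s machine-readable summary. To compare runs across disks, a small awk filter can pull out the headline numbers. The field positions below were worked out by matching the example CSV line against the human-readable table printed just before it; they are an assumption about the bonnie++ 1.96 format, not taken from bonnie++ documentation, and `bonnie_summary` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: summarise a bonnie++ 1.96 run from its CSV result line.
# Field numbers are assumed from matching the example CSV line against the
# table: 6=size, 10=block write, 12=rewrite, 16=block read (all K/sec),
# 18=random seeks per second.
bonnie_summary() {
    # Only the CSV line has this many comma-separated fields and starts with
    # a version number; every other line of output is skipped.
    awk -F, 'NF >= 19 && $1 ~ /^[0-9.]+$/ {
        printf "size=%s block_write=%sK/s rewrite=%sK/s block_read=%sK/s seeks=%s/s\n",
               $6, $10, $12, $16, $18
    }'
}
```

For instance, `sudo /usr/share/checkbox/scripts/storage_test sda | bonnie_summary` applied to the run above would print `size=7664M block_write=97508K/s rewrite=43649K/s block_read=124238K/s seeks=198.0/s`.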

Gabriel Zhi Chen (gabrielzchen) wrote :

Following Jeffrey's advice, I opened the back cover of the Coors-2 and unplugged the SSD. Then I reran the 'Disk tests/storage_device_sdx' test case, and the issue was gone.

Daniel Manrique (roadmr) wrote :

Hi Gabriel!
OK, so we know the problem is most likely caused by the SSD. However, the test should still work with it: as SSDs become more common, we want to make sure they are covered by our testing.

The SSD is "sdb", so if you have some time I would still appreciate it if you could run the test on that drive only and post the results.

The command to use would be:

sudo /usr/share/checkbox/scripts/storage_test sdb

Thanks!

Gabriel Zhi Chen (gabrielzchen) wrote :

@Daniel,

As you suggested, I ran the command:

$ sudo /usr/share/checkbox/scripts/storage_test sdb
/dev/sdb is a block device

The script started and then blocked. After waiting about 30 minutes, nothing more had been printed; it is still blocked at this point.

I checked the script 'storage_test':

------------------------------------------------------------------
  1 #!/bin/bash
  2
  3 # take the path of the storage device and test is it a block device.
  4
  5 function run_bonnie() {
  6     echo "Running bonnie++ on $1..."
  7
  8     # Determine where to put the scratchdisk
  9     mount_point=$(df -h | grep $1 | awk '{print $6}')
 10     echo "Putting scratch disk at $mount_point"
 11     mkdir -p "$mount_point/tmp/scratchdir"
 12     bonnie++ -d "$mount_point/tmp/scratchdir" -u root
 13 }
 14
 15 disk=/dev/$1
 16
 17 if [ -b $disk ]
 18 then
 19     echo "$disk is a block device"
 20     size=`parted -l | grep $disk | awk '{print $3}'`
 21
 22     if [ -n "$size" ]
 23     then
 24         echo "$disk reports a size of $size."
 25         # Have to account for the end of the size descriptor
 26         size_range=${size:(-2)}
 27
 28         if [ $size_range == "KB" ]
 29         then
 30             echo "$disk is too small to be functioning."
 31             exit 1
 32         elif [ $size_range == "MB" ]
 33         then
 34             size_int=${size::${#size}-2}
 35
 36             if [ $size_int -gt 10 ]
 37             then
 38                 run_bonnie $disk
 39             else
 40                 echo "$disk is too small to be functioning."
 41                 exit 1
 42             fi
 43         else
 44             run_bonnie $disk
 45         fi
 46     else
 47         echo "$disk doesn't report a size."
 48         exit 1
 49     fi
 50 else
 51     echo "$disk is not listed as a block device."
 52     exit 1
 53 fi
-----------------------------------------------------------------------

I tried running the pipeline from line 20 of the script by hand:

$ sudo parted -l | grep sdb | awk '{print $3}'

and the system did not return anything.

Then I ran:

************************************
$ sudo parted -l sdb
Model: ATA ST500LM012 HN-M5 (scsi)
Disk /dev/sda: 500GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start   End     Size    File system     Name                  Flags
 1      17.4kB  50.0MB  50.0MB  fat32           EFI System Partition  boot
 2      50.0MB  1602MB  1552MB  fat32           Recovery Partition
 3      1602MB  21.6GB  20.0GB  ext4
 4      21.6GB  31.0GB  9403MB  linux-swap(v1)  primary
 5      31.0GB  500GB   469GB   ext4            primary

Warning: /dev/sdb contains GPT signatures, indicating that it has a GPT table.
However, it does not have a valid fake msdos partition table, as it should.
Perhaps it was corrupted -- possibly by a program that doesn't understand GPT
partition tables. Or perhaps you deleted the GPT table, and are now using an
msdos partition table. Is this a GPT partition table?
Yes/No?
*************************************

So parted drops into an interactive prompt and cannot continue with the rest of the steps unless t...

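The silent pipeline is consistent with a common shell pitfall: when a program prints a question on stdout and then reads the answer from stdin, piping its stdout into grep hides the question, while the program keeps waiting for input from the terminal. The sketch below uses a hypothetical `ask_yes_no` function to mimic that pattern; it is not parted, and it is only an assumption that parted writes its prompt to stdout in this situation.

```shell
#!/bin/sh
# Hypothetical stand-in for an interactive tool (NOT parted): it prints a
# question on stdout, then blocks until it can read an answer from stdin.
ask_yes_no() {
    local answer
    echo "Is this a GPT partition table?"
    echo "Yes/No?"
    if read -r answer; then
        echo "answer: $answer"
    else
        # read fails at end-of-file, e.g. when stdin is /dev/null
        echo "no answer (stdin closed)"
    fi
}
```

Typed at an interactive shell, `ask_yes_no | grep sdb` appears to hang with no output: the question goes into the pipe and is filtered out, while `read` waits on the terminal. `ask_yes_no </dev/null | grep sdb` returns immediately, and `echo Yes | ask_yes_no` takes the answer path. A non-interactive mode in the tool itself, such as parted's -s (script) switch, is the cleaner fix.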

Daniel Manrique (roadmr) wrote :

Hi Gabriel,

Looks like you did all the triaging for me, thanks :)

Could you please modify your storage_test script, and change line 20 so that instead of just "parted -l", it reads "parted -l -s"? (don't change anything else in that line or the script).

Then try running the script again on sdb, and let me know if it works this time.

The -s switch to parted eliminates all user interaction so it will not stop to ask anything.

If this works, I will propose a merge request with a fix. But again, you did all the work so the credit really goes to you.

Thanks!

- Daniel

Daniel Manrique (roadmr)
Changed in checkbox:
importance: Undecided → High
tags: added: scripts
Gabriel Zhi Chen (gabrielzchen) wrote :

@Daniel,

It was my pleasure to help with the triage, and thank you for your advice.

I changed the script as follows:
 **********************************************
  1 #!/bin/bash
  2
  3 # take the path of the storage device and test is it a block device.
  4
  5 function run_bonnie() {
  6     echo "Running bonnie++ on $1..."
  7
  8     # Determine where to put the scratchdisk
  9     mount_point=$(df -h | grep $1 | awk '{print $6}')
 10     echo "Putting scratch disk at $mount_point"
 11     mkdir -p "$mount_point/tmp/scratchdir"
 12     bonnie++ -d "$mount_point/tmp/scratchdir" -u root
 13 }
 14
 15 disk=/dev/$1
 16
 17 if [ -b $disk ]
 18 then
 19     echo "$disk is a block device"
 20     # size=`parted -l | grep $disk | awk '{print $3}'`
 21     size=`parted -l -s | grep $disk | awk '{print $3}'`
 22
 23     if [ -n "$size" ]
 24     then
 25         echo "$disk reports a size of $size."
 26         # Have to account for the end of the size descriptor
 27         size_range=${size:(-2)}
 28
 29         if [ $size_range == "KB" ]
 30         then
 31             echo "$disk is too small to be functioning."
 32             exit 1
 33         elif [ $size_range == "MB" ]
 34         then
 35             size_int=${size::${#size}-2}
 36
 37             if [ $size_int -gt 10 ]
 38             then
 39                 run_bonnie $disk
 40             else
 41                 echo "$disk is too small to be functioning."
 42                 exit 1
 43             fi
 44         else
 45             run_bonnie $disk
 46         fi
 47     else
 48         echo "$disk doesn't report a size."
 49         exit 1
 50     fi
 51 else
 52     echo "$disk is not listed as a block device."
 53     exit 1
 54 fi
************************************************

Then I ran the script:
Lenovo:~$ sudo /usr/share/checkbox/scripts/storage_test sdb
/dev/sdb is a block device
/dev/sdb reports a size of contains
24.0GB.
Running bonnie++ on /dev/sdb...
Putting scratch disk at
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...Can't write block.: No space left on device
Can't write block 894219.
Lenovo:~$

Based on this output, I think "parted -l -s" works this time.
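The output above also shows a second parsing issue in the size lookup: `grep $disk` matches every line that mentions /dev/sdb, including the "Warning: /dev/sdb contains GPT signatures..." line, so `awk '{print $3}'` returns "contains" as well as "24.0GB". A stricter parser could read only the `Disk /dev/sdX: SIZE` header line, the format seen in the parted output earlier in this report. This is a sketch; the function name and the choice to read parted's output from stdin (so it can be tested without root or real disks) are assumptions.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the size of one disk from `parted -l -s`
# output on stdin, matching only its "Disk /dev/sdX: SIZE" header line so that
# warnings which merely mention the device are ignored.
disk_size_from_parted() {
    # $1 is the device path, e.g. /dev/sdb; the header's 2nd field is "/dev/sdb:"
    awk -v hdr="$1:" '$1 == "Disk" && $2 == hdr { print $3; exit }'
}
```

In the script, the size line would then read `size=$(parted -l -s | disk_size_from_parted "$disk")`.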

The 'No space left on device' error is tracked by a separate bug: <https://bugs.launchpad.net/checkbox/+bug/1217268>
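The empty "Putting scratch disk at" line in the output also suggests that `df -h | grep /dev/sdb` found no mounted filesystem on sdb, so `$mount_point` was empty and bonnie++ wrote to /tmp/scratchdir on whichever filesystem holds /tmp rather than on sdb. Whether that explains the "No space left on device" error is a question for the separate bug. A minimal guard, assuming the disk under test is expected to have a mounted partition, could refuse to run instead of silently falling back. `mount_point_for` and `scratch_dir_for` are hypothetical helpers, not checkbox code.

```shell
#!/bin/sh
# Sketch (hypothetical helpers, not part of checkbox): find where a disk or
# one of its partitions is mounted, and refuse to pick a scratch directory
# when nothing is mounted instead of falling back to "/tmp/scratchdir".
mount_point_for() {
    local disk=$1
    # df -P prints one unwrapped line per filesystem: $1 is the source
    # device, $6 the mount point. Prefix match, so /dev/sdb also matches
    # /dev/sdb1 (like the original grep, it would also match /dev/sdba).
    df -P | awk -v d="$disk" 'NR > 1 && index($1, d) == 1 { print $6; exit }'
}

scratch_dir_for() {
    local disk=$1 mp
    mp=$(mount_point_for "$disk")
    if [ -z "$mp" ]; then
        echo "$disk has no mounted filesystem; not running bonnie++" >&2
        return 1
    fi
    # Strip a trailing "/" so a root mount gives /tmp/scratchdir, not //tmp/...
    echo "${mp%/}/tmp/scratchdir"
}
```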

Daniel Manrique (roadmr)
Changed in checkbox:
status: Incomplete → In Progress
assignee: nobody → Daniel Manrique (roadmr)
Ara Pulido (ara)
Changed in checkbox:
status: In Progress → Fix Committed
Daniel Manrique (roadmr)
Changed in checkbox:
milestone: none → 2013-sep-13
Daniel Manrique (roadmr)
Changed in checkbox:
status: Fix Committed → Fix Released