libvirt fails to start correctly because LVM is not ready

Bug #893450 reported by Horacio
This bug affects 1 person
Affects: libvirt (Ubuntu) | Status: Expired | Importance: High | Assigned to: Unassigned
Affects: udev (Ubuntu) | Status: Expired | Importance: Undecided | Assigned to: Unassigned

Bug Description

Sometimes one of the KVM guests fails to start. I have 3 guests; two started OK and one failed (the same one failed both times). After rebooting the host twice, the KVM guest started OK, but this is a server and I find this behavior too risky. The host server is running "Ubuntu 11.04 Server 64bits" and libvirt 0.8.8-1ubuntu6.5.

Both times it failed, I found this in syslog:

error : virSecurityDACSetOwnership:125 : unable to set user and group to '105:115' on '/dev/vg_default/lv_robot-pv0': No such file or directory
kernel: [ 200.354543] type=1400 audit(1321932141.068:11): apparmor="DENIED" operation="open" parent=2539 profile="/usr/lib/libvirt/virt-aa-helper" name="/dev/dm-7" pid=2638 comm="virt-aa-helper" requested_mask="r" denied_mask="r" fsuid=0 ouid=105
kernel: [ 200.692255] type=1400 audit(1321932141.408:12): apparmor="STATUS" operation="profile_load" name="libvirt-7262201f-566b-d0b1-16ec-0b404ccd5336" pid=2639 comm="apparmor_parser"
libvirtd: 00:22:21.424: 2539: error : virSecurityDACRestoreSecurityFileLabel:143 : cannot resolve symlink /dev/vg_default/lv_robot-pv0: No such file or directory
libvirtd: 00:22:21.707: 2539: error : qemuAutostartDomain:275 : Failed to autostart VM 'robot': unable to set user and group to '105:115' on '/dev/vg_default/lv_robot-pv0': No such file or directory

The LVM device is on a software RAID. Maybe this is taking too long to come up?

FYI, running "aa-status" (after the reboots) gives me:
apparmor module is loaded.
9 profiles are loaded.
9 profiles are in enforce mode.
   /sbin/dhclient
   /usr/lib/NetworkManager/nm-dhcp-client.action
   /usr/lib/connman/scripts/dhclient-script
   /usr/lib/libvirt/virt-aa-helper
   /usr/sbin/libvirtd
   /usr/sbin/tcpdump
   libvirt-7262201f-566b-d0b1-16ec-0b404ccd5336
   libvirt-c032ea0a-8c62-7730-fb4d-e1bf60c15a31
   libvirt-f06ad419-f312-f002-444f-3e51f40d2291
0 profiles are in complain mode.
4 processes have profiles defined.
4 processes are in enforce mode :
   /usr/sbin/libvirtd (2459)
   libvirt-7262201f-566b-d0b1-16ec-0b404ccd5336 (2521)
   libvirt-c032ea0a-8c62-7730-fb4d-e1bf60c15a31 (2551)
   libvirt-f06ad419-f312-f002-444f-3e51f40d2291 (2582)
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.

Revision history for this message
Horacio (hgdeoro) wrote :
Changed in libvirt (Ubuntu):
importance: Undecided → High
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Thanks for submitting this bug. I'm marking it high priority as it fits 'severe impact on a small number of users'.

The problem appears to be that the logical volume is slow in coming up, such that libvirt starts before udev creates its device node. Working around this in your particular case should be pretty easy, but a general fix will be harder.

To work around it in your case, please try adding the following to your /etc/init/libvirt-bin.conf file, at the end of the 'pre-start script' section:

   while [ ! -e /dev/vg_default/lv_robot-pv0 ]; do
      sleep 1s
   done

So, for instance, the pre-start script section would probably become:

pre-start script
 [ -r /etc/default/libvirt-bin ] && . /etc/default/libvirt-bin
 [ ! "x$start_libvirtd" = "xyes" ] && { stop; exit 0; }
 mkdir -p /var/run/libvirt
 # Clean up a pidfile that might be left around
 rm -f /var/run/libvirtd.pid
 date >> /var/log/libvirt-upstart
 echo $$ >> /var/log/libvirt-upstart
 cat /proc/self/cmdline >> /var/log/libvirt-upstart
 cat /proc/self/cgroup >> /var/log/libvirt-upstart
 while [ ! -e /dev/vg_default/lv_robot-pv0 ]; do
  sleep 1s
 done
end script

Please let us know if that does not work around your problem for now.

Changed in libvirt (Ubuntu):
status: New → Incomplete
Revision history for this message
Horacio (hgdeoro) wrote :

Thank you very much for the response! I was able to do some more tests, and something strange happened.
I made the changes to /etc/init/libvirt-bin.conf and added some sort of "timeout":

# LVM_DEVS_TO_CHECK holds the list of device paths to wait for
ls -lh /dev/vg_default | logger -t libvirt-bin-start-script
for a_device in $LVM_DEVS_TO_CHECK ; do
    echo "Device $a_device" | logger -t libvirt-bin-start-script
    iter=0
    while [ ! -e $a_device -a $iter -lt 5 ]; do
        echo " + Not found: $a_device" | logger -t libvirt-bin-start-script
        ls -lh /dev/vg_default | logger -t libvirt-bin-start-script
        sleep 1s
        iter=$(( $iter + 1 ))
    done
done

One of the KVM guests didn't start. Searching the syslog, I found:

Nov 23 01:18:36 web4 kernel: [ 33.760027] br1: port 1(eth1) entering forwarding state
Nov 23 01:18:38 web4 kernel: [ 36.120089] br0: port 1(eth0) entering forwarding state
Nov 23 01:18:49 web4 ntpdate[1847]: no server suitable for synchronization found
Nov 23 01:21:20 web4 init: udevtrigger post-stop process (404) terminated with status 1
Nov 23 01:21:20 web4 libvirt-bin-start-script: .................. \
Nov 23 01:21:20 web4 libvirt-bin-start-script: .................. | output of "logger", with the contents of /dev/vg_default
Nov 23 01:21:20 web4 libvirt-bin-start-script: .................. /
Nov 23 01:21:21 web4 init: udev-fallback-graphics main process (2531) terminated with status 1
Nov 23 01:21:21 web4 init: plymouth-splash main process (2563) terminated with status 1
Nov 23 01:21:21 web4 udevd[366]: worker [393] unexpectedly returned with status 0x0100
Nov 23 01:21:21 web4 udevd[366]: worker [393] failed while handling '/devices/virtual/block/dm-2'
Nov 23 01:21:21 web4 udevd[366]: worker [388] unexpectedly returned with status 0x0100
Nov 23 01:21:21 web4 udevd[366]: worker [388] failed while handling '/devices/virtual/block/md1'
Nov 23 01:21:21 web4 libvirt-bin-start-script: ......................
Nov 23 01:21:24 web4 libvirt-bin-start-script: last message repeated 2 times
Nov 23 01:21:24 web4 udevd[366]: worker [376] unexpectedly returned with status 0x0100
Nov 23 01:21:24 web4 udevd[366]: worker [376] failed while handling '/devices/virtual/block/dm-8'
Nov 23 01:21:25 web4 libvirt-bin-start-script: ......................
Nov 23 01:21:26 web4 libvirt-bin-start-script: ......................
Nov 23 01:21:26 web4 libvirt-bin-start-script: ......................

The output of "logger" show me that 4 logical volumes were missing, and after waiting for 20 seconds, the scripts continued, started libvirt and one of the guest didn't started up.
Those devices were missing on /dev/mapper/ an /dev/vg_default. But doing a "lvs" worked OK and the missing devices where shown. So, the devices where found and recognized by LVM, but the links are missing.

So, I changed the script to wait 120 seconds for each device and rebooted. This time I had more time to see what was going on. Checking the syslog I got the same "udevtrigger" and "udevd" errors. 4 devices were missing again, so it took 8 minutes and then stopped checking.

During these 8 minutes the device links never appeared. "lvs" listed the devices OK, but no links were in /dev. I know this happened because of all the 'ls -lh /dev/vg_default' output...


Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Actually based on your syslog excerpts, this sounds to me like you might have bad storage. In particular:

Nov 23 01:21:24 web4 udevd[366]: worker [376] unexpectedly returned with status 0x0100
Nov 23 01:21:24 web4 udevd[366]: worker [376] failed while handling '/devices/virtual/block/dm-8'

(and same for dm-1)

Can you describe your storage setup? What are you using for the LVM volumes? Multipath?

Also, to get us a bit more information, could you do

apport-collect 893450

?

Changed in libvirt (Ubuntu):
status: Incomplete → New
status: New → Incomplete
Revision history for this message
Horacio (hgdeoro) wrote :

apport-collect didn't work (the server has no direct internet connection).

-------------------------
lspci
-------------------------

00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
05:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 02)

-------------------------
some of the dmesg output
-------------------------

[ 1.152651] Fusion MPT base driver 3.04.18
[ 1.152707] Copyright (c) 1999-2008 LSI Corporation
[ 1.157884] async_tx: api initialized (async)
[ 1.170812] Fusion MPT SAS Host driver 3.04.18
[ 1.170911] mptsas 0000:05:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 1.171196] mptbase: ioc0: Initiating bringup
[ 1.330050] raid6: int64x1 1469 MB/s
[ 1.390018] Refined TSC clocksource calibration: 1866.732 MHz.
[ 1.390078] Switching to clocksource tsc
[ 1.500025] raid6: int64x2 1967 MB/s
[ 1.670042] raid6: int64x4 1418 MB/s
[ 1.840049] raid6: int64x8 1323 MB/s
[ 2.010011] raid6: sse2x1 3152 MB/s
[ 2.180022] raid6: sse2x2 3804 MB/s
[ 2.350016] raid6: sse2x4 5600 MB/s
[ 2.350070] raid6: using algorithm sse2x4 (5600 MB/s)
[ 2.351541] xor: automatically using best checksumming function: generic_sse
[ 2.400007] generic_sse: 6836.000 MB/sec
[ 2.400062] xor: using function: generic_sse (6836.000 MB/sec)
[ 2.405934] md: raid6 personality registered for level 6
[ 2.405994] md: raid5 personality registered for level 5
[ 2.406051] md: raid4 personality registered for level 4
[ 2.413276] md: raid10 personality registered for level 10
[ 3.670031] ioc0: LSISAS1064E B1: Capabilities={Initiator}
[ 3.670248] mptsas 0000:05:00.0: setting latency timer to 64
[ 9.833435] scsi2 : ioc0: LSISAS1064E B1, FwRev=01100000h, Ports=1, MaxQ=511, IRQ=16
[ 9.857778] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 0, phy 0, sas_addr 0x455d3742e2bbac8f
[ 9.866008] scsi 2:0:0:0: Direct-Access ATA WDC WD20EARS-00M AB51 PQ: 0 ANSI: 5
[ 9.867787] sd 2:0:0:0: Attached scsi generic sg1 type 0
[ 9.868136] sd 2:0:0:0: [sda] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[ 9.869384] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 1, phy 1, sas_addr 0x455d393cddc2ad95
[ 9.877777] scsi 2:0:1:0: Direct-Access ATA WDC WD20EARS-00M AB51 PQ: 0 ANSI: 5
[ 9.879545] sd 2:0:1:0: Attached scsi generic sg2 type 0
[ 9.879901] sd 2:0:1:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[ 9.886711] sd 2:0:0:0: [sda] Write Protect is off
[ 9.886771] sd 2:0:0:0: [sda] Mode Sense: 73 00 00 08
[ 9.894839] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 9.897900] sd 2:0:1:0: [sdb] Write Protect is off
[ 9.897959] sd 2:0:1:0: [sdb] Mode Sense: 73 00 00 08
[ 9.905889] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 9.976991] sda: sda1 sda2 sda3 sda4
[ 9.988416] sdb: sdb1 sdb2 sdb3 sdb4
[ 10.003907] sd 2:0:0:0: [sda] Attached SCSI disk
[ 10.015535] sd 2:0:1:0: [sdb] Attached SCSI disk
[ 10.511656] md: bind<sda4>
[ 10.52...


summary: - KVM guest fails to autostart sometimes with
- virSecurityDACRestoreSecurityFileLabel error
+ libvirt fails to start correctly because LVM is not ready
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Hi,

to find out more, we need the udev debug output. Could you please edit /usr/share/initramfs-tools/scripts/init-top/udev so that the line starting udevd reads:

/sbin/udevd --daemon --debug --resolve-names=never > /dev/.initramfs/udev.out 2>&1

then run

  sudo update-initramfs -k all -u

After a reboot, you'll find the debug info in /dev/.initramfs/udev.out. Please attach that here.

Please also give us the output of the commands:

  sudo pvscan
  sudo vgscan
  sudo lvscan

Changed in udev (Ubuntu):
status: New → Incomplete
MR_JOC (joc-jpg)
Changed in udev (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Please don't change the state from incomplete until the requested information has been provided.

Changed in udev (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Horacio (hgdeoro) wrote :

This is the output of the lvm scan commands. I'll ask permission to reboot the server tonight to post the debug output of udev.

web4:~# sudo pvscan
  PV /dev/md1 VG vg_default lvm2 [1.81 TiB / 1.33 TiB free]
  Total: 1 [1.81 TiB] / in use: 1 [1.81 TiB] / in no VG: 0 [0 ]

web4:~# sudo vgscan
  Reading all physical volumes. This may take a while...
  Found volume group "vg_default" using metadata type lvm2

web4:~# sudo lvscan
  ACTIVE '/dev/vg_default/lv_desa_web_2-root' [5.00 GiB] inherit
  ACTIVE '/dev/vg_default/lv_desa_web_2-swap' [1.00 GiB] inherit
  ACTIVE '/dev/vg_default/lv_desa_web_2-pv0' [150.00 GiB] inherit
  ACTIVE '/dev/vg_default/lv_rsync' [200.00 GiB] inherit
  ACTIVE '/dev/vg_default/lv_cms2-pv0' [10.00 GiB] inherit
  ACTIVE '/dev/vg_default/lv_cms2-root' [4.00 GiB] inherit
  ACTIVE '/dev/vg_default/lv_jaime-root' [5.00 GiB] inherit
  ACTIVE '/dev/vg_default/lv_jaime-pv0' [10.00 GiB] inherit
  ACTIVE '/dev/vg_default/lv_monitor-swap' [1.00 GiB] inherit
  ACTIVE '/dev/vg_default/lv_monitor-root' [5.00 GiB] inherit
  ACTIVE '/dev/vg_default/lv_monitor-pv0' [50.00 GiB] inherit
  ACTIVE '/dev/vg_default/lv_monitor-var' [47.00 GiB] inherit

Revision history for this message
Horacio (hgdeoro) wrote :

I've attached /dev/.initramfs/udev.out. I don't know if this is relevant, but the write performance on the disks is horrible; I think this could be caused by mis-alignment of the partitions ("dd if=/dev/zero of=zeros bs=64b oflag=dsync" gives me around 400 kB/s).

Changed in libvirt (Ubuntu):
status: Incomplete → New
Changed in udev (Ubuntu):
status: Incomplete → New
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Sure enough, lv_robot-pv0 (the backing LV for the VM) doesn't show up in either the lvscan output or the udev debug output.

Certainly an LVM failure here.

Revision history for this message
Peter Petrakis (peter-petrakis) wrote :

In re-examining the evidence and chatting with Serge a few things come to mind.

1) There's a disparity between the conclusion formed in comment #3 and the latest
udev logs. It could easily be that the udev logging simply ran long enough
for that last LV to be activated. It does appear that if we wait long enough
(approx. 10 mins) all LVs will come online.

2) The crux: is udev being starved of events, or is it being stalled by some other action?

3) Comment #9 compelled me to dig around concerning the backing stores of the
software RAID 1 set. The WD20EARS drives appear to use 4K sectors, which, if true,
would lend credence to *slowness* being the factor (udev is starved) that explains
why it takes almost 10 minutes to discover all the LVs.

Follow up:
1) verify sector size of backing stores by installing sg3-utils and posting the result of
 # sg_readcap /dev/sda
and
 # cat /sys/block/sda/queue/physical_block_size

2) the basic dd test is fine; please re-run it with a sweep of block sizes (512, 4k, 8k, 1024k)
with "count=100" so it doesn't run forever (see the sketch after this list)

3) an additional I/O benchmark, like iozone or iometer, would be informative;
whichever is your preference.
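
For item 2, a minimal sketch of such a sweep (the output file "zeros" is just an example; run it on the filesystem under test):

   for bs in 512 4096 8192 1048576; do
       echo "=== bs=$bs bytes ==="
       dd if=/dev/zero of=zeros bs=$bs count=100 oflag=dsync
   done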

Conclusion:
If this is an alignment issue then we've got a lot of work to do. You might be able
to resize/grow the MD array to change the stripe size to be a multiple of the sector
size. Though you would need to carry the exercise all the way through, including LVM
metadata offsets. It would be optimal if you had a duplicate set of drives to reproduce
this issue on a separate system. Thanks.

Changed in libvirt (Ubuntu):
status: New → Incomplete
Changed in udev (Ubuntu):
status: New → Incomplete
Revision history for this message
Horacio (hgdeoro) wrote :

Ok, I've added Seagate disks, manually created partitions (aligned to 1 MiB), added the new disks to the RAID, and removed the WD disks. The RAID is now over the two Seagate disks, and the performance is still crap.

# hdparm -i /dev/sda
 Model=ST2000DL003-9VT166, FwRev=CC32, SerialNo=6YD0RYED
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=3907029168

I've tried to make the partitions aligned to 1 MiB; maybe I made the partitions wrong. These are the partitions on /dev/sda:

# sfdisk -s -d /dev/sda
1953514584
# partition table of /dev/sda
unit: sectors

/dev/sda1 : start= 2048, size= 19531776, Id=fd
/dev/sda2 : start= 19533824, size= 9762816, Id=83
/dev/sda3 : start= 29296640, size=3877730304, Id=fd
/dev/sda4 : start= 0, size= 0, Id= 0

Please tell me if the partitions aren't aligned. I think I did this right: the start sectors are multiples of 2048 (1 MiB).
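
A quick sanity check over the sfdisk dump above (a sketch; it only verifies that each start sector is a multiple of 2048):

# sfdisk -d /dev/sda | awk -F'[ =,]+' '/start=/ { print $1, ($4 % 2048 == 0 ? "1 MiB aligned" : "NOT aligned") }'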

If the partitions are right, the problem isn't the disks, nor RAID, LVM, or the filesystem, since I've tested with dd directly over /dev/sda2 (plain partition: no RAID, no LVM, no filesystem):

# dd if=/dev/zero of=/dev/sda2 count=1000 oflag=dsync bs=16384
16384000 bytes (16 MB) copied, 12.5848 s, 1.3 MB/s

# dd if=/dev/zero of=/dev/sda2 count=1000 oflag=dsync bs=16384
16384000 bytes (16 MB) copied, 14.4387 s, 1.1 MB/s

With raid, lvm and ext4, the performance is:

# dd if=/dev/zero of=zeros count=1000 oflag=dsync bs=512
512000 bytes (512 kB) copied, 19.4532 s, 26.3 kB/s

# dd if=/dev/zero of=zeros count=1000 oflag=dsync bs=1024
1024000 bytes (1.0 MB) copied, 22.373 s, 45.8 kB/s

# dd if=/dev/zero of=zeros count=1000 oflag=dsync bs=2048
2048000 bytes (2.0 MB) copied, 28.7906 s, 71.1 kB/s

# dd if=/dev/zero of=zeros count=1000 oflag=dsync bs=4096
4096000 bytes (4.1 MB) copied, 68.0617 s, 60.2 kB/s

# dd if=/dev/zero of=zeros count=1000 oflag=dsync bs=8192
8192000 bytes (8.2 MB) copied, 61.7608 s, 133 kB/s

# dd if=/dev/zero of=zeros count=1000 oflag=dsync bs=16384
16384000 bytes (16 MB) copied, 67.1279 s, 244 kB/s

Revision history for this message
Peter Petrakis (peter-petrakis) wrote :

Worked with the user on irc and ruled out screaming interrupts
and the LSI card itself (he removed it). User is upgrading to 3.x
kernel and will apprise us of his progress here.

Revision history for this message
Horacio (hgdeoro) wrote :

I've run more tests with the guidance of Peter Petrakis: I physically removed the LSI card and tried a newer kernel (3.0.0-15), without luck... the performance is the same (reads OK, but slow writes).

Revision history for this message
Horacio (hgdeoro) wrote :

Some comments about performance:
- reading works well with big and small block sizes
- writing with big block size works ok
- the problem arises when writing with a small block size

----------

Reading works ok with small and large block size:

reading blocks of 1KiB from sda2

# /usr/bin/time dd if=/dev/sda2 of=/dev/null bs=1024 count=$(( 1024*32 ))
32768+0 records in
32768+0 records out
33554432 bytes (34 MB) copied, 0.48956 s, 68.5 MB/s
0.00user 0.07system 0:00.49elapsed 14%CPU (0avgtext+0avgdata 3552maxresident)k
66016inputs+0outputs (0major+268minor)pagefaults 0swaps

# /usr/bin/time dd if=/dev/sda2 of=/dev/null bs=1024 count=$(( 1024*1024 ))
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 8.74106 s, 123 MB/s
0.08user 2.23system 0:08.74elapsed 26%CPU (0avgtext+0avgdata 3536maxresident)k
2097632inputs+0outputs (0major+268minor)pagefaults 0swaps

reading blocks of 1MiB from sdb2 (to ensure that nothing is cached)

# /usr/bin/time dd if=/dev/sdb2 of=/dev/null bs=$((1024*1024)) count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 8.76866 s, 122 MB/s
0.00user 1.82system 0:08.77elapsed 20%CPU (0avgtext+0avgdata 7616maxresident)k
2097408inputs+0outputs (0major+522minor)pagefaults 0swaps

Now, writing with blocks of 16MiB works ok (64.5 MB/s!!!)

# /usr/bin/time dd if=/dev/zero of=/mnt/rsync/zeros bs=$((1024*1024*16)) count=128 oflag=dsync
128+0 records in
128+0 records out
2147483648 bytes (2.1 GB) copied, 33.2825 s, 64.5 MB/s
0.00user 6.69system 0:33.37elapsed 20%CPU (0avgtext+0avgdata 69072maxresident)k
1464inputs+4195328outputs (0major+4363minor)pagefaults 0swaps

With 1MiB the performance is bad (12.1 MB/s)

# /usr/bin/time dd if=/dev/zero of=/mnt/rsync/zeros bs=$((1024*1024)) count=256 oflag=dsync
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 22.1709 s, 12.1 MB/s
0.00user 0.97system 0:22.17elapsed 4%CPU (0avgtext+0avgdata 7616maxresident)k
40inputs+524288outputs (1major+521minor)pagefaults 0swaps

With 1KiB I got.. 33.3 kB/s!!! (I pressed Ctrl+C after some time)

# /usr/bin/time dd if=/dev/zero of=/mnt/rsync/zeros bs=1024 count=$(( 1024*32 )) oflag=dsync
(pressed Ctrl+C)
^C9069+0 records in
9069+0 records out
9286656 bytes (9.3 MB) copied, 279.276 s, 33.3 kB/s
Command terminated by signal 2
0.00user 1.30system 4:39.90elapsed 0%CPU (0avgtext+0avgdata 3504maxresident)k
0inputs+72552outputs (0major+265minor)pagefaults 0swaps

Revision history for this message
Horacio (hgdeoro) wrote :

And some iozone tests, with 2 child processes, O_DIRECT and SYNC mode (full log on pastebin: http://pastebin.com/amTbjfdr).

All with:
        O_DIRECT feature enabled
        SYNC Mode.
        Output is in Kbytes/sec
        Min process = 2
        Max process = 2
        Throughput test with 2 processes
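
The exact iozone invocation isn't shown here; something along these lines (an assumption reconstructed from the header above, with placeholder -r/-s values) would produce this kind of throughput output:

        iozone -I -o -t 2 -r 1k -s 64m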

----- Record size 1k

        Children see throughput for 2 initial writers = 11.50 KB/sec
        Avg throughput per process = 5.75 KB/sec

        Children see throughput for 2 rewriters = 12.45 KB/sec
        Avg throughput per process = 6.22 KB/sec

----- Record size 8k

        Children see throughput for 2 initial writers = 100.18 KB/sec
        Avg throughput per process = 50.09 KB/sec

        Children see throughput for 2 rewriters = 75.27 KB/sec
        Avg throughput per process = 37.64 KB/sec

----- Record size 16k

        Children see throughput for 2 initial writers = 277.43 KB/sec
        Avg throughput per process = 138.71 KB/sec

        Children see throughput for 2 rewriters = 224.98 KB/sec
        Avg throughput per process = 112.49 KB/sec

----- Record size 32k

        Children see throughput for 2 initial writers = 398.72 KB/sec
        Avg throughput per process = 199.36 KB/sec

        Children see throughput for 2 rewriters = 204.58 KB/sec
        Avg throughput per process = 102.29 KB/sec

----- Record size 64k

        Children see throughput for 2 initial writers = 406.26 KB/sec
        Avg throughput per process = 203.13 KB/sec

        Children see throughput for 2 rewriters = 447.78 KB/sec
        Avg throughput per process = 223.89 KB/sec

----- Record size 128k

        Children see throughput for 2 initial writers = 1711.68 KB/sec
        Avg throughput per process = 855.84 KB/sec

        Children see throughput for 2 rewriters = 1563.11 KB/sec
        Avg throughput per process = 781.56 KB/sec

----- Record size 256k

        Children see throughput for 2 initial writers = 3400.94 KB/sec
        Avg throughput per process = 1700.47 KB/sec

        Children see throughput for 2 rewriters = 2532.04 KB/sec
        Avg throughput per process = 1266.02 KB/sec

Revision history for this message
Horacio (hgdeoro) wrote :

The same tests, now on an ext4 filesystem over a plain partition (no RAID):

----- Record size 1k

        Children see throughput for 2 initial writers = 34.09 KB/sec
        Avg throughput per process = 17.04 KB/sec

        Children see throughput for 2 rewriters = 25.68 KB/sec
        Avg throughput per process = 12.84 KB/sec

----- Record size 8k

        Children see throughput for 2 initial writers = 195.30 KB/sec
        Avg throughput per process = 97.65 KB/sec

        Children see throughput for 2 rewriters = 199.05 KB/sec
        Avg throughput per process = 99.52 KB/sec

----- Record size 128k

        Children see throughput for 2 initial writers = 3248.37 KB/sec
        Avg throughput per process = 1624.19 KB/sec

        Children see throughput for 2 rewriters = 3197.46 KB/sec
        Avg throughput per process = 1598.73 KB/sec

Revision history for this message
Peter Petrakis (peter-petrakis) wrote :

Are these ext4 test results from the bare metal level or from inside a guest? As for the
guests themselves, are they using the raw volume for the image, or do they exist as an
image file on the filesystem? If they're image files, what fs type do they live on?
Thanks.

Revision history for this message
Horacio (hgdeoro) wrote :

All the tests were run at the bare metal level.
The results in comment 16 were run on ext4 over LVM over RAID 1.
The results in comment 17 were run on ext4 over a plain partition.

I don't know how bad those numbers are... Maybe it's normal that working with 1k blocks kills the disk performance?

Revision history for this message
Peter Petrakis (peter-petrakis) wrote :

The disk sector size and how it's managed really determine how well it will
perform under various access sizes. In your case remember, the disk sector
size is 4k, so we would hope that Linux is "doing the right thing" in coalescing
these 1k accesses into chunks which get the most out of your hard disks.

That's not our primary problem here, though; the primary problem is LVM
having trouble bringing all LVs online. These performance issues are
in addition to that. The LVM problem appears to have been addressed
by adding a root delay, so that's that.

We have yet to pin down where any of them have come from. That this
problem appeared after an update is suspicious, yet IIRC you've shown
that rolling kernels back and forth has no net impact on the symptoms.

If an update were really the root of all of this, you should be able to
install natty to a separate set of disks, re-run these tests, and see dramatically
different numbers for the 1K test runs.

We are now left with the actual configuration of your entire stack, the
loads generated by your VMs, and how they might be affecting your
IO subsystem.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for libvirt (Ubuntu) because there has been no activity for 60 days.]

Changed in libvirt (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for udev (Ubuntu) because there has been no activity for 60 days.]

Changed in udev (Ubuntu):
status: Incomplete → Expired