write errors on virtual disc during install

Bug #511620 reported by Dave Gilbert
This bug affects 2 people
Affects            Status        Importance  Assigned to      Milestone
qemu-kvm (Ubuntu)  Fix Released  High        Dustin Kirkland
Lucid              Fix Released  High        Dustin Kirkland

Bug Description

Binary package hint: qemu-kvm

(This may be Bug #420423 but we're running a much newer kernel on lucid).

Host: Lucid 64bit, up to date as of 23rd January - i7-860 - 2.6.32-11-generic #15-Ubuntu
qemu-kvm: Version: 0.12.2-0ubuntu1

Guest: lucid alpha 2 64bit iso (having pressed update-installer on it on 23rd Jan).

Symptom: During install, at a non-repeatable point, the root filesystem of the guest becomes read-only
and the guest's dmesg reports write errors on vda1. The block numbers
differ between installs.

There are no errors in the host's dmesg. The disc is backed by an LVM LV, and I've tried writing to it with dd if=/dev/zero of=/dev/main/fiddle2disk bs=1024k; the whole disc writes OK.

Dave

Tags: patch

Revision history for this message
Dave Gilbert (ubuntu-treblig) wrote :

Two more data points:

1) This still happens with the host running today's 2.6.33 daily (2.6.33-999-generic #201001221151).
2) It works if I give the guest an emulated SCSI disk rather than a virtio disk.

Revision history for this message
Dustin Kirkland  (kirkland) wrote : Re: [Bug 511620] Re: write errors on virtual disc during install

Hi Dave-

Does this look like your bug:
 * https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2941282&group_id=180599

I saw this reported upstream recently...

:-Dustin

Changed in qemu-kvm (Ubuntu):
status: New → Incomplete
importance: Undecided → Medium
Changed in qemu-kvm (Ubuntu Lucid):
milestone: none → ubuntu-10.04-beta-1
Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Can you also attach your dmesg with the errors?

Revision history for this message
Dave Gilbert (ubuntu-treblig) wrote :

I agree, that bug you reference looks like the same one.

I'm including a dmesg from the guest. On the host, /var/log/libvirt/fiddle2.log ends with:

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin XAUTHORITY=/home/dg/.Xauthority DISPLAY=:0.0 /usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 2048 -smp 2 -name fiddle2 -uuid fd06659e-3354-cb8e-71d9-cfeeff86e60f -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/fiddle2.monitor,server,nowait -monitor chardev:monitor -boot d -drive file=/media/more/isos/ubuntu-lucid-alpha2-desktop-64.iso,if=ide,media=cdrom,index=2 -drive file=/dev/main/fiddle2disk,if=virtio,index=0 -net nic,macaddr=54:52:00:32:5a:59,vlan=0,model=virtio,name=virtio.0 -net tap,fd=46,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 -parallel none -usb -vga cirrus -soundhw es1370
char device redirected to /dev/pts/4
dg@major:/var/log/libvirt/qemu$

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

I have some more interesting things to report.

I'm using karmic libvirt, and if I use a virtual drive the lucid daily live installs fine, but at reboot I get input/output errors on the virtual drive.

If I emulate the hard disk as an sda drive, no input/output errors are present.

If you need further information, I've kept all the VMs and the ISO needed to reproduce the errors I get.

Maybe it's a problem with lucid, not qemu-kvm?

Changed in qemu-kvm (Ubuntu Lucid):
status: Incomplete → New
Thierry Carrez (ttx)
Changed in qemu-kvm (Ubuntu Lucid):
status: New → Confirmed
Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Hi Dave-

I just uploaded qemu-kvm 0.12.3 to Lucid, which includes a number of storage fixes, according to the changelog. It would be great if you could give that a test run and report back!

description: updated
Changed in qemu-kvm (Ubuntu Lucid):
status: Confirmed → Incomplete
Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Also, what's your backing disk format? Raw? QCOW2?

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Also, I understand this happens at random... Does it happen *every* time? Do you ever get an install to succeed completely?

Thierry Carrez (ttx)
Changed in qemu-kvm (Ubuntu Lucid):
milestone: ubuntu-10.04-beta-1 → none
Revision history for this message
Dave Gilbert (ubuntu-treblig) wrote :

Hi Dustin,
  Apologies for the delay (I mostly do stuff at the weekend).

I've just tried current Lucid with qemu-kvm 0.12.3-0ubuntu7 and kernel 2.6.32-15-generic #22-Ubuntu and the
lucid-alpha3 desktop amd64 iso for the guest, and the problem is the same.

The backing disk is a raw lvm2 logical volume (/dev/mapper/main-fiddle2disk), it's in a normal volume group with
a single physical volume of /dev/sda3 which is a partition on the SATA disk.

The failure seems to be 100% repeatable, although the block number it fails at is not. I have tried this about 5 times now (once with this latest version, a few more times with previous versions) and have not had a successful installation. I have managed an installation using an emulated SCSI disk rather than virtio.

One thing I saw today, which I don't believe I had seen previously, is some audit entries in the host dmesg:

[10772.370241] type=1503 audit(1267889473.222:28): operation="capable" pid=5766 parent=1 profile="libvirt-fd06659e-3354-cb8e-71d9-cfeeff86e60f" name="sys_ptrace"
[10772.370528] type=1503 audit(1267889473.222:29): operation="capable" pid=5789 parent=1 profile="libvirt-fd06659e-3354-cb8e-71d9-cfeeff86e60f" name="sys_ptrace"
[10772.379636] type=1503 audit(1267889473.232:30): operation="capable" pid=5792 parent=1 profile="libvirt-fd06659e-3354-cb8e-71d9-cfeeff86e60f" name="sys_ptrace"
[10788.216465] type=1503 audit(1267889489.092:31): operation="capable" pid=5767 parent=1 profile="libvirt-fd06659e-3354-cb8e-71d9-cfeeff86e60f" name="sys_rawio"
[10788.247112] type=1503 audit(1267889489.122:32): operation="capable" pid=5766 parent=1 profile="libvirt-fd06659e-3354-cb8e-71d9-cfeeff86e60f" name="sys_rawio"
[10791.174155] type=1503 audit(1267889492.052:33): operation="capable" pid=5766 parent=1 profile="libvirt-fd06659e-3354-cb8e-71d9-cfeeff86e60f" name="sys_rawio"
[10791.429616] type=1503 audit(1267889492.312:34): operation="capable" pid=5767 parent=1 profile="libvirt-fd06659e-3354-cb8e-71d9-cfeeff86e60f" name="sys_rawio"
[10820.832664] type=1503 audit(1267889521.762:35): operation="capable" pid=5766 parent=1 profile="libvirt-fd06659e-3354-cb8e-71d9-cfeeff86e60f" name="sys_rawio"

but it's not clear to me that this is related, and you would have thought it would have failed earlier if it couldn't do any IO?

Today's failure happened about 5% through the install and the errors in the guest's dmesg were:

end_request: I/O error, dev vda, sector 25702959
Buffer I/O error on device vda1, logical block 3212862
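
As a cross-check on the two numbers above, here is a minimal sketch relating the whole-disk sector to the partition-relative logical block. It assumes 512-byte sectors and a 4096-byte buffer/block size in the guest; neither is stated in the report, and the function name is made up for illustration. Under those assumptions the pair is consistent with vda1 starting somewhere in sectors 56..63, and 63 is the classic DOS partition offset, so the two messages most likely describe the same failed write.

```python
SECTOR_SIZE = 512   # bytes; unit of the "sector" in the end_request message
BLOCK_SIZE = 4096   # bytes; ASSUMED buffer/block size behind "logical block"


def partition_start_range(disk_sector, logical_block, block_size=BLOCK_SIZE):
    """Return (lowest, highest) partition start sector consistent with both messages.

    disk_sector is relative to the whole disk (vda); logical_block is
    relative to the partition (vda1), counted in block_size units.
    """
    sectors_per_block = block_size // SECTOR_SIZE
    first_sector_of_block = logical_block * sectors_per_block
    last_sector_of_block = first_sector_of_block + sectors_per_block - 1
    # The failing sector may be any sector inside the failing block.
    return disk_sector - last_sector_of_block, disk_sector - first_sector_of_block


low, high = partition_start_range(25702959, 3212862)
print(low, high)  # 56 63
```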

SMART on the host disk still looks good ('No Errors logged').

Dave

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Jamie-

Can you confirm that these libvirt audit messages are benign?

Changed in qemu-kvm (Ubuntu Lucid):
status: Incomplete → Confirmed
Revision history for this message
Dustin Kirkland  (kirkland) wrote :

"The backing disk is a raw lvm2 logical volume" ...

Interesting...

Out of curiosity, would you try with a qcow2 backing disk image? Just as another data point...

Revision history for this message
Dave Gilbert (ubuntu-treblig) wrote :

Have you got a pointer to something on how to ask it to do qcow2? I've been using the libvirt manager and I have a choice of raw devices or file backed.

Dave

Revision history for this message
Bernhard Schmidt (berni) wrote :

Applying http://article.gmane.org/gmane.comp.emulators.kvm.devel/45983 (also applied upstream, see http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=e2a305fb13ff0f5cf6ff805555aaa90a5ed5954c) on top of 0.12.3-0ubuntu11 fixes this problem for me.

This patch is also referenced in the above mentioned sourceforge.net bug (https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2941282&group_id=180599)

Revision history for this message
Bernhard Schmidt (berni) wrote :

upstream patch

tags: added: patch
Changed in qemu-kvm (Ubuntu Lucid):
status: Confirmed → Triaged
importance: Medium → High
assignee: nobody → Dustin Kirkland (kirkland)
Revision history for this message
Dustin Kirkland  (kirkland) wrote :

I just sent a build to my PPA.

If someone could test and confirm qemu-kvm_0.12.3-0ubuntu15~ppa1 when it's built at:
 * https://edge.launchpad.net/~kirkland/+archive/ppa
that would really help!

Thanks!

Changed in qemu-kvm (Ubuntu Lucid):
status: Triaged → In Progress
Revision history for this message
Dave Gilbert (ubuntu-treblig) wrote :

Hi Dustin,
  That seems to work - thanks Bernhard, Dustin; I've just done a lucid3 install in it (64bit) and it worked fine!

(Note: I realised that for the test I did today my host was running 2.6.33-999-generic #201002211003 from the daily kernel ppa rather than the standard one. I'll try to remember to redo this test using the standard kernel tomorrow, but I doubt this kernel helped.)

Thanks for the great work!

Dave

root@major:~/tmp/qemu# dpkg -l |grep -i kvm
ii kvm 1:84+dfsg-0ubuntu16+0.12.3+0ubuntu15~ppa1 dummy transitional pacakge from kvm to qemu-kvm
ii qemu-kvm 0.12.3-0ubuntu15~ppa1 Full virtualization on i386 and amd64 hardware
root@major:~/tmp/qemu# dpkg -l |grep -i qemu
ii kvm 1:84+dfsg-0ubuntu16+0.12.3+0ubuntu15~ppa1 dummy transitional pacakge from kvm to qemu-kvm
ii qemu-common 0.12.3-0ubuntu15~ppa1 qemu bios roms
ii qemu-kvm 0.12.3-0ubuntu15~ppa1 Full virtualization on i386 and amd64 hardware

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Excellent, thanks for the testing.

I just uploaded to Lucid. However, it's awaiting approval in the
queue. Hopefully your confirmation here will help nudge that approval
along.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu-kvm - 0.12.3-0ubuntu15

---------------
qemu-kvm (0.12.3-0ubuntu15) lucid; urgency=low

  * debian/patches/block_avoid_creating_too_large_iovecs_in_multiwrite_merge.patch:
    - block: avoid creating too large iovecs in multiwrite_merge,
      fixes LP: #511620, cherry pick from upstream git
 -- Dustin Kirkland <email address hidden> Fri, 12 Mar 2010 13:30:30 -0600
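
For readers wondering what "avoid creating too large iovecs in multiwrite_merge" means in practice, the following is a simplified Python model, not the actual QEMU code. It assumes that sequential write requests are merged into one larger request, and that the merged request must not carry more scatter/gather segments than the host allows per vectored write (IOV_MAX, assumed here to be 1024 as is typical on Linux). The names WriteReq and merge_writes are hypothetical.

```python
from dataclasses import dataclass

IOV_MAX = 1024  # ASSUMED per-request segment limit (typical value on Linux)


@dataclass
class WriteReq:
    sector: int      # first sector written
    nb_sectors: int  # length of the write in sectors
    niov: int        # number of scatter/gather segments in the request


def merge_writes(reqs, iov_max=IOV_MAX):
    """Merge sequential writes, refusing any merge that would exceed iov_max segments."""
    merged = []
    for req in sorted(reqs, key=lambda r: r.sector):
        last = merged[-1] if merged else None
        sequential = last is not None and last.sector + last.nb_sectors == req.sector
        fits = last is not None and last.niov + req.niov <= iov_max
        if sequential and fits:
            # Safe to combine: extend the previous request in place.
            merged[-1] = WriteReq(last.sector,
                                  last.nb_sectors + req.nb_sectors,
                                  last.niov + req.niov)
        else:
            # Not adjacent, or the merged iovec would be too large: keep separate.
            merged.append(WriteReq(req.sector, req.nb_sectors, req.niov))
    return merged


requests = [WriteReq(0, 8, 600), WriteReq(8, 8, 600), WriteReq(16, 8, 100)]
print(merge_writes(requests))
```

Without the segment check (for example with a very large iov_max), all three requests collapse into one request with 1300 segments. A request like that would be expected to be rejected by the host, which would match the random guest write errors reported here.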

Changed in qemu-kvm (Ubuntu Lucid):
status: In Progress → Fix Released
Revision history for this message
Dave Gilbert (ubuntu-treblig) wrote :

Just tested with the 0.12.3-0ubuntu15 on the 2.6.32-16-generic #25-Ubuntu host kernel. All looks good.

Thanks,

Dave

Revision history for this message
Robert Sander (gurubert) wrote :

Hi,

Where do I get qemu-kvm (0.12.3-0ubuntu15)? I have a lucid amd64 KVM host with similar symptoms but the available qemu-kvm version is only 0.12.3+noroms-0ubuntu9.

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

0.12.3+noroms-0ubuntu9 is much newer than 0.12.3-0ubuntu15
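
The ordering is not obvious from the trailing numbers (9 versus 15), because the upstream part ("0.12.3+noroms" versus "0.12.3") is compared before the Ubuntu revision. Below is a rough sketch of the Debian version ordering rules as I understand them from Debian Policy. The helper names are made up for illustration, and on a real system `dpkg --compare-versions` is the authoritative check.

```python
import re
from itertools import zip_longest


def _order(ch):
    """Sort weight of a non-digit character: '~' first, then letters, then the rest."""
    if ch == "~":
        return -1
    if ch.isalpha():
        return ord(ch)
    return ord(ch) + 256


def _compare_part(a, b):
    """Compare two upstream-version or revision strings; return -1, 0 or 1."""
    # Split each string into alternating (non-digit run, digit run) pairs.
    runs_a = re.findall(r"(\D*)(\d*)", a)
    runs_b = re.findall(r"(\D*)(\d*)", b)
    for (text_a, num_a), (text_b, num_b) in zip_longest(runs_a, runs_b, fillvalue=("", "")):
        for ca, cb in zip_longest(text_a, text_b, fillvalue=""):
            oa = _order(ca) if ca else 0  # end of run weighs 0
            ob = _order(cb) if cb else 0
            if oa != ob:
                return -1 if oa < ob else 1
        na, nb = int(num_a or 0), int(num_b or 0)  # empty digit run counts as 0
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _split(version):
    """Split 'epoch:upstream-revision' into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Return -1 if a is older than b, 0 if equal, 1 if a is newer."""
    epoch_a, up_a, rev_a = _split(a)
    epoch_b, up_b, rev_b = _split(b)
    if epoch_a != epoch_b:
        return -1 if epoch_a < epoch_b else 1
    return _compare_part(up_a, up_b) or _compare_part(rev_a, rev_b)


print(compare_versions("0.12.3+noroms-0ubuntu9", "0.12.3-0ubuntu15"))  # 1
print(compare_versions("0.12.3-0ubuntu15~ppa1", "0.12.3-0ubuntu15"))   # -1
```

The second call shows why the PPA test build (0.12.3-0ubuntu15~ppa1) is superseded by the archive upload: a tilde sorts before everything, including the end of the string.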
