'BUG: soft lockup' after lvm or lvm+encrypt install when using 'kvm -drive'

Bug #221032 reported by Jamie Strandboge
Affects: kvm (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Tim Gardner

Bug Description

Using the Ubuntu Server ISO for amd64 [20080423.2] from http://iso.qa.ubuntu.com/qatracker/info/1612, I get the attached screenshot after installing with lvm+encrypt. After entering the LUKS passphrase, I get:

[ 0.000000] BUG: soft lockup - CPU#0 stuck for 11s [lvm:2228]
[ 0.000000] BUG: soft lockup - CPU#1 stuck for 11s [udevd:2230]

over and over again. I used a kvm guest created with virt-manager with 128M RAM, a 1G non-preallocated disk, Ubuntu/Hardy, and 2 vcpus.

This also affects a regular 'lvm' install when using the above configuration.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

Reproduced again with the above configuration. I also tried with 2 vcpus, 256MB, and a 4G disk: same error. For completeness, my LUKS password was 'foo' and I answered 'yes' to using weak encryption.

I use virt-manager, and the resulting kvm invocations end up being:
/usr/bin/kvm -M pc -m 256 -smp 2 -monitor pty -no-reboot -drive file=/srv/vms/isos/hardy/hardy-server-amd64.iso,if=ide,media=cdrom,boot=on -drive file=/home/jamie/hardy-crypt1.img,if=ide -net nic,macaddr=00:16:3e:4f:d9:81,vlan=0,model=virtio -net tap,fd=12,script=,vlan=0 -usb -vnc 127.0.0.1:1

/usr/bin/kvm -M pc -m 128 -smp 2 -monitor pty -no-reboot -drive file=/srv/vms/isos/hardy/hardy-server-amd64.iso,if=ide,media=cdrom,boot=on -drive file=/home/jamie/hardy-crypt2.img,if=ide -net nic,macaddr=00:16:3e:3b:54:6f,vlan=0,model=virtio -net tap,fd=17,script=,vlan=0 -usb -vnc 127.0.0.1:2

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

Not sure if this is helpful, but I tried again and found this in kern.log when launching the vm:

Apr 23 12:16:43 severus kernel: [16801.095974] device vnet2 entered promiscuous mode
Apr 23 12:16:43 severus kernel: [16801.095987] audit(1208967403.109:56): dev=vnet2 prom=256 old_prom=0 auid=4294967295
Apr 23 12:16:43 severus kernel: [16801.100399] vnet0: port 2(vnet2) entering listening state
Apr 23 12:16:43 severus kernel: [16801.694888] SIPI to vcpu 1 vector 0x10
Apr 23 12:16:49 severus kernel: [16807.109080] Ignoring de-assert INIT to vcpu 1
Apr 23 12:16:49 severus kernel: [16807.109905] SIPI to vcpu 1 vector 0x06
Apr 23 12:16:49 severus kernel: [16807.155992] SIPI to vcpu 1 vector 0x06
Apr 23 12:16:53 severus kernel: [16811.880659] vnet2: no IPv6 routers present
Apr 23 12:16:58 severus kernel: [16816.072131] vnet0: port 2(vnet2) entering learning state
Apr 23 12:17:13 severus kernel: [16831.041663] vnet0: topology change detected, propagating
Apr 23 12:17:13 severus kernel: [16831.041671] vnet0: port 2(vnet2) entering forwarding state
Apr 23 12:17:23 severus kernel: [16841.045164] heci: schedule work the heci_bh_handler failed error=0

The 'heci' line keeps repeating.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

In my testing, I did see a 2-vcpu vm a) reboot successfully once and b) hang at the LUKS password prompt. The hang might be bug #221059.

My host system has a dual-core 64-bit 'Genuine Intel(R) CPU 3.00GHz' processor.

Revision history for this message
Chuck Short (zulcss) wrote :

I was not able to reproduce this on a Core 2 Duo i386 with the same ISO on real hardware.

Thanks
chuck

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

It works fine if I install the machine with:
$ qemu-img create ./hardy-crypt.img 1G
$ kvm -hda ./hardy-crypt.img -cdrom /srv/vms/isos/hardy/hardy-server-amd64.iso -m 128 -vnc localhost:10 -smp 2 -boot d
$ kvm -hda ./hardy-crypt.img -m 128 -vnc localhost:10 -smp 2

Note that reboots may hit bug #221059. This bug may be related to bug #220463.

description: updated
Revision history for this message
Jamie Strandboge (jdstrand) wrote :

For kicks I removed 'acpi' from the libvirt guest and got the same thing. Possibly the difference is between specifying '-hda' vs. '-drive'?

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

All testing was initially done with a '-generic' host kernel and a '-server' guest kernel. This configuration resulted in the soft lockups. If I use the '-server' host kernel and the same '-server' guest kernel, then the machine boots with no soft lockups.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

UPDATE:

The -server kernel gives far fewer lockups with kvm62 on Hardy, but it still locked up twice for me. Here is my testing so far (all tests use a -server guest kernel):

kvm66 with -generic host kernel: 11 FAIL, 1 PASS
kvm66 with -server host kernel: 5 FAIL, 0 PASS
kvm62 (hardy) with -generic host kernel: 13 FAIL, 0 PASS
kvm62 (hardy) with -server host kernel: 2 FAIL, 7 PASS

'FAIL' means a soft lockup was encountered and 'PASS' means a successful boot to the login prompt. Note that for each test I start with a blank slate by stopping all vms, killing all kvm processes, and then reloading the kvm driver.
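The "blank slate" reset between test runs can be sketched as a small script. This is my own sketch, not from the report: it assumes an Intel host (so the module pair is kvm/kvm_intel; substitute kvm_amd on AMD) and that it is acceptable to kill every running kvm process. With DRY_RUN=1 it only prints the commands it would run:

```shell
#!/bin/sh
# Reset the host's kvm state between test runs (sketch).
# DRY_RUN=1 prints each command instead of executing it.
reset_kvm() {
    run() {
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$@"          # dry run: show the command only
        else
            "$@"
        fi
    }
    run killall kvm            # stop any leftover guest processes
    run rmmod kvm_intel        # unload the arch-specific module first
    run rmmod kvm              # then the core kvm module
    run modprobe kvm_intel     # reload; modprobe pulls kvm back in
}

DRY_RUN=1
reset_kvm                      # preview what a reset would do
```

Run without DRY_RUN (as root) to actually perform the reset before each measurement.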

Revision history for this message
Soren Hansen (soren) wrote :

I figured I'd put my data in here as well in case anyone else feels like helping out. For some odd reason, I need quite a few vcpus before I run into problems:

kvm62, generic kernel, -smp 2: 17 PASS 0 FAIL
kvm62, generic kernel, -smp 12: 6 PASS 1 FAIL
kvm62, generic kernel, -smp 16: 9 PASS 1 FAIL

Revision history for this message
Tim Gardner (timg-tpi) wrote :
Changed in kvm:
assignee: nobody → timg-tpi
importance: Undecided → High
milestone: none → ubuntu-8.04.1
status: New → Fix Committed
Revision history for this message
danyj028 (danyj028-gmail) wrote :

Hello

I have had issues with KVM (AMD) on Hardy with the smp option, with or without acpi; it makes no difference.
Recompiling, installing, and using kvm from the latest source did not make any difference either.

It freezes on bootup or in X; sometimes the keyboard still works but the mouse no longer does. The faults are not reliably reproducible, and the tap networking option sometimes fails as well. When X freezes, the only option is to kill the host X (Ctrl-Alt-Backspace; I can't even get back to the host).

Note that the same VM and kvm invocation worked OK with Feisty. (I don't know about Gutsy; I usually only use every second release.)

However, everything seems to work perfectly once the smp option is removed; I am using it now to type this email.

Revision history for this message
Tim Gardner (timg-tpi) wrote :

SRU Justification:

Impact: Host kernel hangs when starting KVM guests.
Fix Description: Don't pre-fetch guest pages until after they have been mapped into physical memory.
Patch: http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=commit;h=a7c28c405ae69a9493b04d3b6209c0bd06472afb

TEST CASE: Install the Hardy server ISO into a kvm guest, selecting lvm+encrypt. The host kernel randomly hangs.

Revision history for this message
Soren Hansen (soren) wrote :

Correction:

SRU justification:

Impact: Guests hang during boot if they're using lvm and smp.
Fix description: Mark pages as write protected *before* they're prefetched, so that the host can handle page faults in the guest.
Patch: http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=commit;h=a7c28c405ae69a9493b04d3b6209c0bd06472afb

TEST CASE: Boot a hardy guest that uses root on lvm (create it with the guided lvm partitioning in the installer if you don't already have one), like so:

    kvm -smp 8 -drive file=/path/to/your/disk.img,boot=on -m 256

Depending on your host hardware, this will hang anywhere between 10% and 99.9% of the time without the patch applied. With the patch applied, the problem should go away completely.
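To measure that hang rate, the test case above can be wrapped in a small harness that tallies PASS/FAIL over repeated boots. This is my sketch, not part of the report: KVM_CMD, TIMEOUT, and RUNS are assumed knobs, the disk path is a placeholder, and a run that outlives TIMEOUT is treated as a hang, which relies on the guest exiting on its own after a clean boot and shutdown:

```shell
#!/bin/sh
# Repeat the kvm boot test and count PASS/FAIL (sketch).
# A run that neither exits cleanly nor finishes before TIMEOUT
# seconds is counted as a FAIL (i.e. a suspected soft lockup).
run_batch() {
    kvm_cmd=${KVM_CMD:-"kvm -smp 8 -drive file=/path/to/your/disk.img,boot=on -m 256 -nographic"}
    limit=${TIMEOUT:-600}
    runs=${RUNS:-10}
    pass=0 fail=0 i=1
    while [ "$i" -le "$runs" ]; do
        # kvm_cmd is intentionally unquoted so it word-splits
        # into a full command line.
        if timeout "$limit" $kvm_cmd >/dev/null 2>&1; then
            pass=$((pass + 1))
        else
            fail=$((fail + 1))
        fi
        i=$((i + 1))
    done
    echo "PASS=$pass FAIL=$fail"
}

run_batch
```

Collecting the PASS/FAIL tallies before and after applying the patched kernel gives a rough before/after comparison like the tables earlier in this report.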

Revision history for this message
Colin Watson (cjwatson) wrote :

Accepted into hardy-proposed.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

I am in the 99.9% hang group Soren mentioned, and updating to linux 2.6.24-17.31 fixes the problem for me.

Revision history for this message
Martin Pitt (pitti) wrote :

linux 2.6.24-17.31 copied to hardy-updates.

Changed in kvm:
status: Fix Committed → Fix Released