kvm hangs booting windows XP Pro SP2 or later, since at least 2.6.28-15

Bug #445456 reported by LaMont Jones
This bug affects 4 people
Affects: linux (Ubuntu)
Status: Won't Fix
Importance: High
Assigned to: Unassigned

Bug Description

After upgrading from 2.6.28-11 to 2.6.28-15, kvm started hanging while booting a Windows XP Pro image. Figuring it was Windows being mad, I worked on creating a new image, and discovered that it no longer boots once SP2 is installed. A current and fully patched XP Pro install does not boot unless the kernel is downgraded back to 2.6.28-11.

If needed, I can create a test image and make it available to someone on the kernel team.

Stefan Bader (smb) wrote :

Yes, a test image would be good indeed. Could you also add a bit more information about the host system:
- sudo dmidecode
- sudo lspci -vvvnn
- cat /proc/cpuinfo
- cat /proc/interrupts
- cat /proc/version_signature (from the working -11 kernel)
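
The commands listed above could be collected in one go. The following is only a minimal Python sketch, assuming the commands are installed and that `sudo` can prompt for a password; the helper names `run_command` and `collect_host_info` are made up for illustration, and the `+ command` header before each output is just a convention chosen here.

```python
import subprocess

# The commands requested above, as argument lists.
HOST_INFO_COMMANDS = [
    ["sudo", "dmidecode"],
    ["sudo", "lspci", "-vvvnn"],
    ["cat", "/proc/cpuinfo"],
    ["cat", "/proc/interrupts"],
    ["cat", "/proc/version_signature"],
]


def run_command(argv):
    """Run one command and return its combined stdout/stderr as text."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def collect_host_info(runner=run_command):
    """Return one report string with a '+ command' header before each output."""
    sections = []
    for argv in HOST_INFO_COMMANDS:
        sections.append("+ " + " ".join(argv) + "\n" + runner(argv))
    return "\n".join(sections)
```

Calling `collect_host_info()` with no arguments runs the real commands; the `runner` parameter exists only so the function can be tried out without root access. Writing the returned text to a file for attaching to the bug is left to the caller.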

Changed in linux (Ubuntu):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
importance: Undecided → High

LaMont Jones (lamont) wrote :

Here's the data. The test image may take a bit, though.

FWIW, running a jaunty kernel with karmic user-space? not so good from a responsiveness perspective, and kinda fatal to the thumbpad mouse. :-(

lamont

+ sudo dmidecode
# dmidecode 2.9
SMBIOS 2.4 present.
45 structures occupying 1979 bytes.
Table at 0x000F7180.

Handle 0xDA00, DMI type 218, 251 bytes
OEM-specific Type
 Header and Data:
  DA FB 00 DA B2 00 0D 5F 1F 37 40 7D 00 00 00 00
  00 7E 00 02 00 00 00 40 00 04 00 01 00 41 00 04
  00 00 00 65 00 05 00 00 00 66 00 05 00 01 00 5E
  00 06 00 01 00 5F 00 06 00 00 00 89 01 07 00 00
  00 8A 01 07 00 01 00 42 00 08 00 01 00 43 00 08
  00 00 00 55 00 09 00 00 00 6D 00 09 00 01 00 2D
  00 0A 00 02 00 6E 00 0A 00 01 00 2E 00 0A 00 00
  00 11 01 0B 00 00 00 10 01 0B 00 01 00 F0 00 0C
  00 01 00 ED 00 0C 00 00 00 41 01 0D 00 01 00 40
  01 0D 00 00 00 47 01 0E 00 01 00 46 01 0E 00 00
  00 4A 01 0F 00 00 00 4B 01 0F 00 01 00 52 01 10
  00 01 00 53 01 10 00 00 00 80 01 11 00 01 00 7F
  01 11 00 00 00 7C 01 12 00 01 00 7B 01 12 00 00
  00 7E 01 13 00 01 00 7D 01 13 00 00 00 92 01 14
  00 00 00 91 01 14 00 01 00 94 01 15 00 00 00 93
  01 15 00 01 00 FF FF 00 00 00 00

Handle 0xDA01, DMI type 218, 251 bytes
OEM-specific Type
 Header and Data:
  DA FB 01 DA B2 00 0D 5F 1F 37 40 86 01 16 00 01
  00 85 01 16 00 00 00 82 01 17 00 01 00 81 01 17
  00 00 00 84 01 18 00 01 00 83 01 18 00 00 00 9B
  01 19 00 00 00 9C 01 19 00 01 00 9D 01 19 00 02
  00 9E 01 19 00 03 00 8B 01 1A 00 00 00 8C 01 1A
  00 01 00 EA 00 1B 00 00 00 EB 00 1B 00 01 00 EC
  00 1B 00 02 00 28 00 1C 00 00 00 29 00 1C 00 01
  00 2A 00 1C 00 02 00 2B 00 1D 00 00 00 2C 00 1E
  00 00 00 E7 00 1F 00 01 00 E6 00 1F 00 00 00 0E
  01 20 00 01 00 0F 01 20 00 00 00 9B 00 21 00 01
  00 9C 00 21 00 00 00 4D 01 22 00 01 00 4C 01 22
  00 00 00 01 01 23 00 00 00 02 01 23 00 01 00 04
  01 23 00 02 00 37 01 24 00 00 00 38 01 24 00 01
  00 D9 01 25 00 01 00 D8 01 25 00 00 00 EA 01 26
  00 00 00 EB 01 26 00 01 00 EC 01 27 00 00 00 ED
  01 27 00 01 00 FF FF 00 00 00 00

Handle 0xDA02, DMI type 218, 53 bytes
OEM-specific Type
 Header and Data:
  DA 35 02 DA B2 00 0D 5F 1F 37 40 76 01 76 01 01
  00 75 01 75 01 01 00 01 F0 01 F0 00 00 02 F0 02
  F0 00 00 03 F0 03 F0 00 00 04 F0 04 F0 00 00 FF
  FF 00 00 00 00

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
 Vendor: Dell Inc.
 Version: A04
 Release Date: 11/05/2007
 Address: 0xF0000
 Runtime Size: 64 kB
 ROM Size: 1024 kB
 Characteristics:
  ISA is supported
  PCI is supported
  PC Card (PCMCIA) is supported
  PNP is supported
  BIOS is upgradeable
  BIOS shadowing is allowed
  Boot from CD is supported
  Selectable boot is supported
  3.5"/720 KB floppy services are supported (int 13h)
  Print screen service is supported (int 5h)
  8042 keyboard services are supported (int 9h)
  Serial services are supported (int 14h)
  Printer services are supported (int 17h)
  CGA/mono video services are supported (int 10h)
  ACPI is supported
  USB legacy is supported
  AGP is supported
  Smart battery is supported
  BIOS boot specification is supported
  Function key-initiated network boot is supported
  Targeted content distribution is supported
 BIOS Revision: 0...

LaMont Jones (lamont) wrote :

image information sent privately.

LaMont Jones (lamont) wrote :

After git-bisect, the winner is:

7088c3756a151abaadea5b1d4810c86e2651292e is the first bad commit
commit 7088c3756a151abaadea5b1d4810c86e2651292e
Author: Avi Kivity <email address hidden>
Date: Mon Mar 23 22:13:44 2009 +0200

    KVM: VMX: Don't allow uninhibited access to EFER on i386

    CVE-2009-1242

    commit 16175a796d061833aacfbd9672235f2d2725df65 upstream

    vmx_set_msr() does not allow i386 guests to touch EFER, but they can still
    do so through the default: label in the switch. If they set EFER_LME, they
    can oops the host.

    Fix by having EFER access through the normal channel (which will check for
    EFER_LME) even on i386.

    Reported-and-tested-by: Benjamin Gilbert <email address hidden>
    Cc: <email address hidden>
    Signed-off-by: Avi Kivity <email address hidden>
    Signed-off-by: Stefan Bader <email address hidden>

:040000 040000 067e338cc1db74e085e06c1bf598e10231cb7cba c57336b2f6e4e86d2e85096aaaad5f62a9c62f51 M arch

Changed in linux (Ubuntu):
status: New → Triaged

LaMont Jones (lamont) wrote :

It's also possible that this was fixed somewhere in the karmic release process, as windows now gets a hard failure in a dll with the current 2.6.31-14.48 kernel.

LaMont Jones (lamont) wrote :

Specifically:

STOP: c0000221 Unknown Hard Error
 \SystemRoot\System32\ntdll.dll

Stefan Bader (smb) wrote :

Thanks a lot for the tedious bisect. So it seems this comes down to something around 64bit mode. If I read the code correctly, in the old version an i386 guest calling vmx_set_msr() to modify the extended feature enable register (EFER) went into the default case, which directly modified the msr data and then called kvm_set_msr_common(), while the code that hangs does not do that. I will do a debug version which traces the codepath better and post a link to it here later.
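
As a rough illustration of the two code paths discussed here, the following is a toy Python model, not the kernel source. It is based only on the commit message quoted earlier and on the fact that the fix makes EFER writes take the checked path even on i386. All function and class names are hypothetical stand-ins, and the checked path is simplified to a single test for EFER_LME.

```python
MSR_EFER = 0xC0000080  # architectural index of the EFER MSR
EFER_LME = 1 << 8      # EFER.LME, the "long mode enable" bit


class ToyVcpu:
    """Holds the guest MSR values that this toy model tracks."""

    def __init__(self):
        self.msrs = {MSR_EFER: 0}


def set_msr_common(vcpu, index, data, host_is_64bit):
    """Checked path: on a 32-bit host, refuse an EFER write that sets LME."""
    if index == MSR_EFER and (data & EFER_LME) and not host_is_64bit:
        return 1  # nonzero means the write was rejected
    vcpu.msrs[index] = data
    return 0


def set_msr_default(vcpu, index, data, host_is_64bit):
    """Simplified default: label, a tracked MSR is overwritten unchecked."""
    if index in vcpu.msrs:
        vcpu.msrs[index] = data
        return 0
    return set_msr_common(vcpu, index, data, host_is_64bit)


def set_msr_before_fix(vcpu, index, data, host_is_64bit):
    """Before the fix: EFER only has its own case on a 64-bit host."""
    if index == MSR_EFER and host_is_64bit:
        return set_msr_common(vcpu, index, data, host_is_64bit)
    return set_msr_default(vcpu, index, data, host_is_64bit)


def set_msr_after_fix(vcpu, index, data, host_is_64bit):
    """After the fix: EFER always takes the checked path, even on i386."""
    if index == MSR_EFER:
        return set_msr_common(vcpu, index, data, host_is_64bit)
    return set_msr_default(vcpu, index, data, host_is_64bit)
```

In this model, a guest write of EFER_LME on a 32-bit host is stored unchecked by `set_msr_before_fix` and rejected by `set_msr_after_fix`. The model says nothing about why the checked path leaves the XP guest hanging; that is the open question in this report.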

Stefan Bader (smb) wrote :

It unfortunately took a bit, but you seemed to be successful using a 64bit host (which does confirm the problem is somewhere in the 32bit vs. 64bit area). I created a test kernel for Jaunty which adds print statements around the code that was changed by the patch that caused the regression. I test booted it and at least it does not explode immediately. Could you install that as the host kernel, then try the XP boot again and attach the dmesg from that attempt here? Thanks a lot. Eh, almost forgot: the kernel can be found here: http://people.canonical.com/~smb/bug445456/.

Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Triaged a while ago but has not had any updated comments for quite some time. Please let us know if this issue remains in the current Ubuntu release, http://www.ubuntu.com/getubuntu/download . If the issue remains, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

attila123 (vangel-attila) wrote :

I have the same problem: my virtual Windows XP 32-bit SP3 will not boot with kvm. It freezes quite early during boot, with high CPU usage on one of the two cores (according to System Monitor). It works well with qemu. It can boot in "Safe Mode" even with kvm, if that matters, but in that case (with kvm) services.exe eats up all the guest CPU with very high HDD activity (I had to close QEMU; I couldn't wait for Windows to shut down). I first experienced this during the install, after the reboot.

I am not too experienced in Linux, just trying to switch to it, although I used it a bit earlier. I just installed Ubuntu 10.10 32-bit desktop edition with the latest updates. I installed the test kernel from the above link with "dpkg -i linux-image-2.6.28-16-generic_2.6.28-16.57+bug445456v1_i386.deb" as root, and rebooted. At boot the system froze. All the dots were already red under the "ubuntu" label. I couldn't switch consoles with Ctrl + Alt + F1, so I guess the kernel froze. I powered my laptop off and back on, and fortunately it changed back to my original kernel, which is:

$ uname -a
Linux Latitude-D620 2.6.35-23-generic-pae #41-Ubuntu SMP Wed Nov 24 10:35:46 UTC 2010 i686 GNU/Linux

I would send the dmesg output if you build a test kernel that works with this version of Ubuntu.

Changed in linux (Ubuntu):
status: Triaged → New

attila123 (vangel-attila) wrote :

Better yet, I can share the image file with whoever would like to debug the problem. It was taken right after the installation, and the installed size is reduced considerably thanks to nLite. It's
- (option a) 444 MB as compressed qcow2 or
- (option b) 338 MB compressed with lzma, which then expands to 924 MB of uncompressed qcow2 (lzma can't compress the compressed qcow2).

I can share it via Ubuntu One, Dropbox, or even via a public file hosting service like hyperfileshare.com (I won't publish the link), or whatever. Just let me know what you prefer and whether option a or b.

Stefan Bader (smb) wrote :

Just to summarize: the fix for the referenced CVE broke XP boots for 2.6.28 kernels. As much as I don't like that fact, by now this has been rotting so long that Jaunty has gone out of support. Do we know whether Lucid has this problem, or does it work there?

@Attila, thanks for trying, but a Jaunty kernel in Maverick is unlikely to ever work, as there have been too many big changes on the X front (and kernel mode setting). Given how long I failed to get back to the bug, I would concentrate on Lucid and later, and close the report if it is working there.

Remigiusz Urbaniak (remy-t) wrote :

I had the same error while trying to install a new WinXP guest.
I decided to go for a new HDD and try on Debian, but first wanted to install WinXP on the first primary partition. During installation of WinXP I got "STOP: c0000221 Unknown Hard Error" again! This time without any virtualization or Linux involved.

After swapping DVD drives all went fine.
So I ripped a new ISO image of the WinXP installation disc using the new DVD drive and tried again with WinXP on Ubuntu 11.04 - all went fine!

I don't know if it is a purely hardware or a hardware+kernel problem - it works with the new DVD drive and I don't have time right now (wasted too much on this issue already :/) to investigate.

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 445456

and then change the status of the bug back to 'New'. If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

Stefan Bader (smb) wrote :

Just randomly stumbled over this report again and I don't think that initial problem was in Lucid (10.04). I would have expected more feedback otherwise. Lacking a better close code I'll set this to won't fix.

Changed in linux (Ubuntu):
assignee: Stefan Bader (stefan-bader-canonical) → nobody
status: Incomplete → Won't Fix