System hangs at GRUB loading screen every warm boot since 2.6.37-3.11 seemingly due to nx-emu patch

Bug #686705 reported by Robert Hooker
This bug affects 1 person
Affects        | Status       | Importance | Assigned to  | Milestone
grub2 (Ubuntu) | Fix Released | High       | Colin Watson |
Natty          | Fix Released | High       | Colin Watson |

Bug Description

Ever since the 2.6.37-3.11 kernel I have been unable to reboot the machine without a physical power off because the system will hang early in the boot process on the GRUB loading screen before the kernel selection menu comes up. Booting 2.6.37-2.10 and then rebooting works fine, but every kernel tested after that will hang. This problem does not happen in any of the 2.6.37 mainline kernels or in the maverick kernels.

Booting with noexec=off doesn't change the behavior.

For each step in this bisect I booted into the new kernel and then rebooted to verify the behavior. The problem happens 100% of the time after a reboot on this machine on an affected kernel.
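
The search that git bisect performs over these boot-and-reboot tests can be sketched as a binary search. This is only an illustration under simplifying assumptions: a linear history, a first commit known to be good, a last commit known to be bad, and every commit after the first bad one also bad. The commit names and the `hangs_on_warm_boot` check are hypothetical stand-ins for the manual test described above.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    history is linear with a single good-to-bad transition.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Hypothetical history in which "c4" introduces the warm-boot hang.
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6"]
tested = []


def hangs_on_warm_boot(commit: str) -> bool:
    # Stand-in for "boot this kernel, reboot, see whether GRUB hangs".
    tested.append(commit)
    return history.index(commit) >= 4


print(first_bad(history, hangs_on_warm_boot))
print(tested)
```

Each call to the predicate corresponds to one build, boot and reboot cycle, which is why halving the range at every step matters on real hardware.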

---
Bisect log:
git bisect start
# bad: [ec3990d4ab8c3b6fbde6799c000ab0e1e80f3d66] UBUNTU: Ubuntu-2.6.37-3.11
git bisect bad ec3990d4ab8c3b6fbde6799c000ab0e1e80f3d66
# bad: [ec3990d4ab8c3b6fbde6799c000ab0e1e80f3d66] UBUNTU: Ubuntu-2.6.37-3.11
git bisect bad ec3990d4ab8c3b6fbde6799c000ab0e1e80f3d66
# good: [f814d1e47c8e427aafa304a15b0f47f69c4d7f88] UBUNTU: Ubuntu-2.6.37-2.10
git bisect good f814d1e47c8e427aafa304a15b0f47f69c4d7f88
# good: [f814d1e47c8e427aafa304a15b0f47f69c4d7f88] UBUNTU: Ubuntu-2.6.37-2.10
git bisect good f814d1e47c8e427aafa304a15b0f47f69c4d7f88
# good: [f814d1e47c8e427aafa304a15b0f47f69c4d7f88] UBUNTU: Ubuntu-2.6.37-2.10
git bisect good f814d1e47c8e427aafa304a15b0f47f69c4d7f88
# bad: [7aa61c989b468cfaeab7461273b1c07b708b377f] UBUNTU: [Config] enforcer -- ensure CONFIG_IPV6=y
git bisect bad 7aa61c989b468cfaeab7461273b1c07b708b377f
# bad: [e4fb20387f91c24d6c07d55c27ee3bf02522fba5] mmap randomization for executable mappings on 32-bit
git bisect bad e4fb20387f91c24d6c07d55c27ee3bf02522fba5
# good: [f911aa4eba9b8fadf948e25063e27801096b02ed] UBUNTU: Bump ABI
git bisect good f911aa4eba9b8fadf948e25063e27801096b02ed
# bad: [810fec489f2620a2bff8fa4b8fd39bdf32acf288] nx-emu: drop exec-shield sysctl, merge with disable_nx
git bisect bad 810fec489f2620a2bff8fa4b8fd39bdf32acf288
sarvatt@tangerine:~/ubuntu-natty$ git bisect bad
ed6f363a412661f45a1db9c3456db2f9d5057612 is the first bad commit
commit ed6f363a412661f45a1db9c3456db2f9d5057612
Author: Roland McGrath <email address hidden>
Date: Wed Jul 14 00:50:02 2010 -0700

    i386: NX emulation

    This is old code with some cruft, all originally by Ingo with much
    later rebasing by Fedora folks and at least one arcane fix by Roland
    a few years ago.

    Signed-off-by: Roland McGrath <email address hidden>
    Signed-off-by: Kees Cook <email address hidden>
    Signed-off-by: Andy Whitcroft <email address hidden>

:040000 040000 17e4b20dfd007b55d44c0013e1a47e2c23602217 8d77a604d04225927a16b851078285b9d6329bec M arch
:040000 040000 82c83fb772dc0758762eeac99acaa6d2f53d7877 3b77ce5c8e685e714debdd745fda5011634224c4 M fs
:040000 040000 b4ee2451899647274251944afaf4ba7136b9f336 c8ed8ccfccda0a17c60d5bffbef42163a2f74abf M include
:040000 040000 074de35017a1fe9c51156db6d5c3b6f1b5d71844 f9e2866d6302a92fca4e2620a0c68ca1cd7a607b M kernel
:040000 040000 26606ab7d024864b8ab822a3974e7920c7ed49e9 afa81af2d32b495cf0942c4e079908c270e0f871 M mm
---

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: linux-image-2.6.37-8-generic 2.6.37-8.21
Regression: Yes
ProcVersionSignature: Ubuntu 2.6.37-8.21-generic 2.6.37-rc4
Uname: Linux 2.6.37-8-generic i686
Architecture: i386
Date: Tue Dec 7 13:58:41 2010
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux

Robert Hooker (sarvatt) wrote :

If there is any other info that would be useful, please let me know; apport left a lot to be desired here.

Kees Cook (kees) wrote :

The nx-emu patches have hardly changed at all since they were first added several releases ago. I just compared natty to maverick, and natty's only difference is the removal of the exec_shield cmdline variable in favor of using !disable_nx.

The changes to hardware NX BIOS masking didn't happen until -5.13, so that's not a variable in this. It seems that this system has the NX bit available in the CPU, but booting 32-bit -generic should just ignore it, resulting in the nx-emu kicking in. I'm at a bit of a loss to explain what could be going wrong, especially since the nx-emu changes are mostly related to setting up userspace limits.

Robert Hooker (sarvatt) wrote :

2.6.37-3.11 -generic-pae works; it looks like this only affects -generic.

Robert Hooker (sarvatt) wrote :

2.6.37-8-generic-pae also works properly

Kees Cook (kees) wrote :

I would be curious to see what happens when you use -3.11 -generic, and -3.11 -generic-pae. I would be curious to see the dmesg for -generic-pae too.

If you can, please run this test:
http://bazaar.launchpad.net/~ubuntu-bugcontrol/qa-regression-testing/master/files/head%3A/scripts/kernel-security/nx/

Run it as "./nx-test stack" and report what "dmesg" shows for it; that would help verify the kernel behaviors too. For hardware NX (-generic-pae) I would expect a "segfault" to be reported:
[69673.123305] nx-test[28050]: segfault at 7fff9aaaa96c ip 00007fff9aaaa96c sp 00007fff9aaaa928 error 15

For NX-emu (-generic), I would expect "general protection fault" being reported instead.

Again, this all just verifies the behavior of the NX-emu patch. I still don't understand what is causing the hangs.
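
The distinction above can be checked mechanically. The following is a minimal sketch, not part of the nx-test suite: `classify_nx_fault` is a hypothetical helper that labels a dmesg line by the kind of fault it records. It assumes the "error" field of a segfault line is the x86 page-fault error code printed in hexadecimal, where bit 4 (0x10) marks an instruction fetch. The "general protection" sample is the line reported later in this thread.

```python
import re

# Matches the kernel's "segfault at ADDR ip IP sp SP error CODE" message.
SEGFAULT_RE = re.compile(
    r"segfault at [0-9a-f]+ ip [0-9a-f]+ sp [0-9a-f]+ error ([0-9a-f]+)"
)

INSTRUCTION_FETCH = 0x10  # assumed meaning of bit 4 in the page-fault error code


def classify_nx_fault(line: str) -> str:
    """Label a dmesg line as 'nx-emu', 'hardware-nx' or 'unknown'."""
    if "general protection" in line:
        # Segment-limit violation: what the NX emulation is expected to raise.
        return "nx-emu"
    match = SEGFAULT_RE.search(line)
    if match and int(match.group(1), 16) & INSTRUCTION_FETCH:
        # Page fault caused by fetching an instruction from a protected page.
        return "hardware-nx"
    return "unknown"


hw_line = ("[69673.123305] nx-test[28050]: segfault at 7fff9aaaa96c "
           "ip 00007fff9aaaa96c sp 00007fff9aaaa928 error 15")
emu_line = ("[ 48.955549] nx-test[1780] general protection ip:8048ada "
            "sp:bfdde680 error:0 in nx-test[8048000+1000]")

print(classify_nx_fault(hw_line))
print(classify_nx_fault(emu_line))
```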

Robert Hooker (sarvatt) wrote :

Linux asuka 2.6.37-3-generic #11-Ubuntu SMP Fri Nov 12 02:09:53 UTC 2010 i686 GNU/Linux

data: 0x804a050
bss: 0x805a080
brk: 0x8af7008
rw: 0xb7771000
rwx: 0x230000
stack: 0xbfdde6b4
Dump of /proc/self/maps:
00230000-00231000 rwxp 00000000 00:00 0
00952000-00aa9000 r-xp 00000000 08:01 122638 /lib/libc-2.12.1.so
00aa9000-00aaa000 ---p 00157000 08:01 122638 /lib/libc-2.12.1.so
00aaa000-00aac000 r--p 00157000 08:01 122638 /lib/libc-2.12.1.so
00aac000-00aad000 rw-p 00159000 08:01 122638 /lib/libc-2.12.1.so
00aad000-00ab0000 rw-p 00000000 00:00 0
00e3a000-00e56000 r-xp 00000000 08:01 122517 /lib/ld-2.12.1.so
00e56000-00e57000 r--p 0001b000 08:01 122517 /lib/ld-2.12.1.so
00e57000-00e58000 rw-p 0001c000 08:01 122517 /lib/ld-2.12.1.so
08048000-08049000 r-xp 00000000 08:01 17196 /home/robert/nx/nx-test
08049000-0804a000 r--p 00000000 08:01 17196 /home/robert/nx/nx-test
0804a000-0804b000 rw-p 00001000 08:01 17196 /home/robert/nx/nx-test
0804b000-0805b000 rw-p 00000000 00:00 0
08af7000-08b19000 rw-p 00000000 00:00 0 [heap]
b7751000-b7752000 rw-p 00000000 00:00 0
b7770000-b7775000 rw-p 00000000 00:00 0
bfdc0000-bfde1000 rw-p 00000000 00:00 0 [stack]
Attempting to execute function at 0xbfdde6b8
If this program seg-faults, the region was enforced as non-executable...
Segmentation fault (core dumped)

[ 48.955549] nx-test[1780] general protection ip:8048ada sp:bfdde680 error:0 in nx-test[8048000+1000]

----

Linux asuka 2.6.37-3-generic-pae #11-Ubuntu SMP Fri Nov 12 02:27:02 UTC 2010 i686 GNU/Linux

data: 0x804a050
bss: 0x805a080
brk: 0x9db7008
rw: 0xb771f000
rwx: 0xb771e000
stack: 0xbf9ccaf4
Dump of /proc/self/maps:
08048000-08049000 r-xp 00000000 08:01 17196 /home/robert/nx/nx-test
08049000-0804a000 r--p 00000000 08:01 17196 /home/robert/nx/nx-test
0804a000-0804b000 rw-p 00001000 08:01 17196 /home/robert/nx/nx-test
0804b000-0805b000 rw-p 00000000 00:00 0
09db7000-09dd9000 rw-p 00000000 00:00 0 [heap]
b75a1000-b75a2000 rw-p 00000000 00:00 0
b75a2000-b76f9000 r-xp 00000000 08:01 122638 /lib/libc-2.12.1.so
b76f9000-b76fa000 ---p 00157000 08:01 122638 /lib/libc-2.12.1.so
b76fa000-b76fc000 r--p 00157000 08:01 122638 /lib/libc-2.12.1.so
b76fc000-b76fd000 rw-p 00159000 08:01 122638 /lib/libc-2.12.1.so
b76fd000-b7700000 rw-p 00000000 00:00 0
b771d000-b771e000 rw-p 00000000 00:00 0
b771e000-b771f000 rwxp 00000000 00:00 0
b771f000-b7723000 rw-p 00000000 00:00 0
b7723000-b773f000 r-xp 00000000 08:01 122517 /lib/ld-2.12.1.so
b773f000-b7740000 r--p 0001b000 08:01 122517 /lib/ld-2.12.1.so
b7740000-b7741000 rw-p 0001c000 08:01 122517 /lib/ld-2.12.1.so
bf9ad000-bf9ce000 rw-p 00000000 00:00 0 [stack]
Attempting to execute function at 0xbf9ccaf8
If this program seg-faults, the region was enforced as non-executable...
Segmentation fault (core dumped)

[ 54.181324] nx-test[1856]: segfault at bf9ccaf8 ip bf9ccaf8 sp bf9ccabc error 15

Kees Cook (kees) wrote :

Thanks for the details. This confirms that nx-emu vs hardware nx are working as expected between -generic and -generic-pae. This verifies that nothing else strange is going on, but I'm still at a loss about what could be causing the warm-boot failure.
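
One way to see the difference in the two /proc/self/maps dumps above is to look at where the executable mappings end. The sketch below rests on my understanding of the segment-limit approach (an assumption, not something stated in this thread): the emulation can only treat addresses above the highest executable mapping as non-executable, so the -generic kernel keeps executable mappings low. `highest_exec_end` is a hypothetical helper, and the two inputs are excerpts of the dumps, not the full listings.

```python
def highest_exec_end(maps_text: str) -> int:
    """Return the highest end address among mappings with the x permission."""
    top = 0
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or "-" not in fields[0]:
            continue  # skip anything that is not a "start-end perms ..." line
        _start, end = (int(part, 16) for part in fields[0].split("-"))
        if "x" in fields[1]:
            top = max(top, end)
    return top


# Excerpt from the 2.6.37-3-generic (nx-emu) dump.
generic_maps = """\
00230000-00231000 rwxp 00000000 00:00 0
00952000-00aa9000 r-xp 00000000 08:01 122638 /lib/libc-2.12.1.so
00e3a000-00e56000 r-xp 00000000 08:01 122517 /lib/ld-2.12.1.so
08048000-08049000 r-xp 00000000 08:01 17196 /home/robert/nx/nx-test
bfdc0000-bfde1000 rw-p 00000000 00:00 0 [stack]
"""

# Excerpt from the 2.6.37-3-generic-pae (hardware NX) dump.
pae_maps = """\
08048000-08049000 r-xp 00000000 08:01 17196 /home/robert/nx/nx-test
b75a2000-b76f9000 r-xp 00000000 08:01 122638 /lib/libc-2.12.1.so
b771e000-b771f000 rwxp 00000000 00:00 0
b7723000-b773f000 r-xp 00000000 08:01 122517 /lib/ld-2.12.1.so
bf9ad000-bf9ce000 rw-p 00000000 00:00 0 [stack]
"""

print(hex(highest_exec_end(generic_maps)))
print(hex(highest_exec_end(pae_maps)))
```

In the -generic excerpt every executable mapping ends at or below 0x8049000, well under the stack address that nx-test tried to execute, while in the -generic-pae excerpt executable mappings sit just below the stack and the protection has to come from per-page permissions.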

Kees Cook (kees)
summary: System hangs at GRUB loading screen every warm boot since 2.6.37-3.11
+ seemingly due to nx-emu patch
Kees Cook (kees) wrote :

For reference, the nx-emu patch series can be seen with:

maverick: git diff d70f1a337f774617558ef73dc6986663c80c71ea 1c383c3a61860f857542ef1b30eb7910aed94ae4
natty: git diff 76e8e02240b121000737c91702ada3c761bb3bf2 579c4170d4fe55b68404e7fc27e4fb28d762b12e

(Though note that natty gained a reporting fixup that isn't in the above contiguous commits: 3e8ec050d688dfb8a3a9017e0bb3afbc5aea52f0)

Tim Gardner (timg-tpi) wrote :

Robert - can you get any info by adding 'earlyprintk=vga' to the kernel boot command line?

Kees Cook (kees) wrote :

apw has suggested that this might be related to grub changes in natty. Can you try to boot with the grub command:

set gfxpayload=text

though I'm not clear when/where to use that, possibly via /etc/default/grub:

GRUB_TERMINAL=console

And then "sudo update-grub"

Robert Hooker (sarvatt) wrote :

earlyprintk=vga only shows output after grub is done, and the hang happens before the grub kernel selection menu even comes up, so I didn't have any luck there unfortunately. I should have mentioned that I am using gfxpayload=text in these logs.

Alex Spurling (alexspurling) wrote :

Robert, can you post more details as to the symptoms of this problem? At which point does the system hang? Can you post a screenshot?

Tim Gardner (timg-tpi) wrote :

Robert - you're not even making it to the grub menu? That sure sounds like a BIOS issue. The last warm boot from the O/S ought not be able to leave the HW in a state from which the BIOS cannot recover.

Tim Gardner (timg-tpi) wrote :

On second thought, perhaps it's a grub problem.

tags: added: kernel-series-unknown
Tobin Davis (gruemaster) wrote :

I have been unable to reproduce this on my Acer Aspire One 532h (similar hardware) running Natty Alpha 1.

Robert Hooker (sarvatt) wrote :

Bad news: I just reflashed my BIOS and lost all the custom stuff I did to it, and I'm still experiencing it. Now I get this on boot with 2.6.37-9.23-generic:

[ 0.000000] NX (Execute Disable) protection: approximated by x86 segment limits

instead of

[ 0.000000] Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!

But after booting that kernel and then rebooting, grub still hangs at "GRUB loading" before I can select a kernel.
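
For anyone comparing boot logs across kernels, the NX line can be classified with a small script. This is a rough sketch: `nx_mode` is a hypothetical helper, the first two message strings are the ones quoted above, and the "protection: active" string for hardware NX is an assumption that does not appear in this report.

```python
def nx_mode(dmesg_line: str) -> str:
    """Classify the kernel's NX boot message."""
    if "NX (Execute Disable) protection" not in dmesg_line:
        return "unknown"
    if "approximated by x86 segment limits" in dmesg_line:
        return "emulated"  # nx-emu in use
    if "cannot be enabled" in dmesg_line:
        return "disabled"  # e.g. non-PAE kernel on NX-capable hardware
    if "protection: active" in dmesg_line:
        return "hardware"  # assumed wording, not quoted in this report
    return "unknown"


after_reflash = ("[ 0.000000] NX (Execute Disable) protection: "
                 "approximated by x86 segment limits")
before_reflash = ("[ 0.000000] Notice: NX (Execute Disable) protection "
                  "cannot be enabled: non-PAE kernel!")

print(nx_mode(after_reflash))
print(nx_mode(before_reflash))
```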

Andy Whitcroft (apw)
Changed in linux (Ubuntu):
status: New → Incomplete
status: Incomplete → Triaged
status: Triaged → Confirmed
importance: Undecided → High
assignee: nobody → Andy Whitcroft (apw)
Andy Whitcroft (apw) wrote :

I have a machine which is also showing this issue on warm boot: grub hangs before it presents its menu. Confirmed that backing out the NX emulation patches fixes this.

Colin Watson (cjwatson)
affects: linux (Ubuntu Natty) → grub2 (Ubuntu Natty)
Changed in grub2 (Ubuntu Natty):
assignee: Andy Whitcroft (apw) → Colin Watson (cjwatson)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 1.99~20101221-1ubuntu1

---------------
grub2 (1.99~20101221-1ubuntu1) natty; urgency=low

  * Resynchronise with Debian experimental. Remaining changes:
    - Adjust for default Ubuntu boot options ("quiet splash").
    - Default to hiding the menu; holding down Shift at boot will show it.
    - Set a monochromatic theme and an aubergine background for Ubuntu.
    - Apply Ubuntu GRUB Legacy changes to legacy update-grub script: title,
      recovery mode, quiet option, tweak how memtest86+ is displayed, and
      use UUIDs where appropriate.
    - Fix backslash-escaping in merge_debconf_into_conf.
    - Remove "GNU/Linux" from default distributor string.
    - Add crashkernel= options if kdump and makedumpfile are available.
    - If other operating systems are installed, then automatically unhide
      the menu. Otherwise, if GRUB_HIDDEN_TIMEOUT is 0, then use keystatus
      if available to check whether Shift is pressed. If it is, show the
      menu, otherwise boot immediately. If keystatus is not available, then
      fall back to a short delay interruptible with Escape.
    - Allow Shift to interrupt 'sleep --interruptible'.
    - Don't display introductory message about line editing unless we're
      actually offering a shell prompt. Don't clear the screen just before
      booting if we never drew the menu in the first place.
    - Remove some verbose messages printed before reading the configuration
      file.
    - Suppress progress messages as the kernel and initrd load for
      non-recovery kernel menu entries.
    - Change prepare_grub_to_access_device to handle filesystems
      loop-mounted on file images.
    - Ignore devices loop-mounted from files in 10_linux.
    - Show the boot menu if the previous boot failed, that is if it failed
      to get to the end of one of the normal runlevels.
    - Don't generate /boot/grub/device.map during grub-install or
      grub-mkconfig by default.
    - Adjust upgrade version checks for Ubuntu.
    - Don't display "GRUB loading" unless Shift is held down.
    - Adjust versions of grub-doc and grub-legacy-doc conflicts to tolerate
      our backport of the grub-doc split.
    - Fix LVM/RAID probing in the absence of /boot/grub/device.map.
    - Look for .mo files in /usr/share/locale-langpack as well, in
      preference.
    - Make sure GRUB_TIMEOUT isn't quoted unnecessarily.
    - Probe all devices in 'grub-probe --target=drive' if
      /boot/grub/device.map is missing.
    - Build-depend on qemu-kvm rather than qemu-system for grub-pc tests.
    - Use qemu rather than qemu-system-i386.
    - Program vesafb on BIOS systems rather than efifb.
    - Add a grub-rescue-efi-amd64 package containing a rescue CD-ROM image
      for EFI-AMD64.
    - On Wubi, don't ask for an install device, but just update wubildr
      using the diverted grub-install.
    - When embedding the core image in a post-MBR gap, check for and avoid
      sectors matching any of a list of known signatures.
    - Disable video_bochs and video_cirrus on PC BIOS systems, as probing
      PCI space seems to break on some systems.
    - Downgrade "ACPI shutdown failed" e...


Changed in grub2 (Ubuntu Natty):
status: Confirmed → Fix Released