grub2 recordfail logic prevents headless system from rebooting after power outage

Bug #872244 reported by heckheck
This bug affects 19 people
Affects         Status   Importance  Assigned to  Milestone
grub2 (Ubuntu)  Triaged  Medium      Unassigned

Bug Description

With the move to grub2 I recently discovered that my headless Natty server running as a NAS device will not reboot following a power failure. I was able to track this down to the behavior of the 'recordfail' logic in grub2. This logic prevents grub from booting following an event such as a power failure. The system boots to the grub2 menu and waits with no timeout (-1).
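A quick way to confirm that this is the state a machine is in is to look for `recordfail=1` in the GRUB environment block. The helper below is only a sketch: the function name `recordfail_is_set` is made up for illustration, and it assumes the usual `name=value` lines printed by `grub-editenv list`.

```shell
#!/bin/sh
# recordfail_is_set (hypothetical helper): reads `grub-editenv list` style
# output (one name=value pair per line) on stdin and succeeds only if the
# line recordfail=1 is present.
recordfail_is_set ()
{
    grep -qx 'recordfail=1'
}

# Typical use on a live system (assumes the default environment block):
#   grub-editenv list | recordfail_is_set && echo "next boot will wait at the menu"
```

If the flag is set, the next boot takes the `recordfail` branch of the timeout logic and waits at the menu.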

While this feature may be completely appropriate for an attended desktop system, there should be an optional override for this behavior in the '/etc/default/grub' defaults file so that systems (such as headless ones) that need to boot following such a failure can do so without intervention. I was able to work around the problem by commenting out the following lines in /etc/grub.d/00_header:

#if [ ${recordfail} = 1 ]; then
#  set timeout=-1
#else
  set timeout=10
#fi

An optional grub2 default parameter that emulates this logic in the defaults would be a nice addition so people who want the "always boot" behavior don't have to hack the grub scripts by hand.

Additional info:
jheck@twilightzone:/etc/grub.d$ lsb_release -rd
Description: Ubuntu 11.04
Release: 11.04

jheck@twilightzone:/etc/grub.d$ apt-cache policy grub2
grub2:
  Installed: (none)
  Candidate: 1.99~rc1-13ubuntu3
  Version table:
     1.99~rc1-13ubuntu3 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty/universe amd64 Packages

Related bugs:
 * bug 669481: Timeout should not be -1 if $recordfail

heckheck (jinfo) wrote :

Here is a better diff of the change I made to /etc/grub.d/00_header to work around the problem

jheck@twilightzone:/etc/grub.d$ diff -Naur 00_header.orig 00_header
--- 00_header.orig 2011-10-10 19:23:44.000000000 -0400
+++ 00_header 2011-10-10 19:24:43.000000000 -0400
@@ -229,11 +229,11 @@
 make_timeout ()
 {
     cat << EOF
-if [ "\${recordfail}" = 1 ]; then
- set timeout=-1
-else
+#if [ "\${recordfail}" = 1 ]; then
+# set timeout=-1
+#else
   set timeout=${2}
-fi
+#fi
 EOF
 }

Changed in grub2 (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Nikolaj Løbner Sheller (nikolaj-l) wrote :

This resolves my problem on 12.04.
Thank you heckheck.

Richard Laager (rlaager)
description: updated
memartin (memartin) wrote :

FYI: The Ubuntu community proposes using a grub2 error script to deactivate the recordfail logic, which prevents problems in RAID/LVM/btrfs environments. I find that a somewhat "cleaner" approach than editing 00_header directly, since changes to that file might be overwritten by upgrades. The error script replaces the corresponding functions with empty stubs.

Here's the link to the workaround/solution from the German ubuntuusers wiki:
http://wiki.ubuntuusers.de/GRUB_2/Skripte#Fehlerskript-LVM-RAID

Put this as an executable script under /etc/grub.d/02_kill-save_env, do an update-grub, and you should be set up. At least in my case it seems to have worked (can't say exactly, because I used the fix on a headless system and cannot debug completely _why_ it works now :-)

Cheerz, Martin

PS: I strongly vote for a server version/mode of grub2 in Ubuntu. We (I) don't need recordfail, recovery, os-prober or a pretty GUI there, and it's cumbersome to strip down the grub2 functionality monster to a sweet server-compliant puppy. I really like the monster on desktops, but IMO it bears too much potential to break servers "by default".

Gaute Lund (gaute-idrift) wrote :

Agree: A more stripped-down grub2-package is appropriate for servers.

Also, for either servers or desktops: There should be a setting in /etc/default/grub, called GRUB_RECORDFAIL_TIMEOUT or GRUB_TIMEOUT_AFTER_BOOTFAIL or something. Today this is effectively hard-coded to -1 (forever), and it is this hard-coded value that many users fiddle with. Conclusion: go from a hard-coded setting to a configurable option.

I propose that the bootfail-timeout is set to 30 seconds by default for servers, perhaps also for clients.

This is not unlike how it is on a different OS we all know: after a failed boot, you're halted for 30 seconds at the boot menu, but then the default entry boots.
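To make the proposal concrete, here is a rough sketch of how the generator function in 00_header could read the post-failure timeout from a variable instead of hard-coding -1. It is a simplified illustration, not the packaged code: this `make_timeout` takes a single argument (the normal timeout), the variable name `GRUB_RECORDFAIL_TIMEOUT` is just the first name suggested above, and the fallback of -1 is an assumption chosen to keep the current behavior when the variable is unset.

```shell
#!/bin/sh
# Simplified, hypothetical variant of make_timeout from /etc/grub.d/00_header.
# $1 is the normal menu timeout. The timeout used after a recorded boot
# failure comes from GRUB_RECORDFAIL_TIMEOUT (assumed name) and falls back
# to -1 (wait forever) when that variable is unset or empty.
make_timeout ()
{
    normal_timeout="$1"
    recordfail_timeout="${GRUB_RECORDFAIL_TIMEOUT:--1}"
    # \${recordfail} is escaped so that it is evaluated by GRUB at boot time,
    # while the two timeout values are expanded now, at generation time.
    cat << EOF
if [ "\${recordfail}" = 1 ]; then
  set timeout=${recordfail_timeout}
else
  set timeout=${normal_timeout}
fi
EOF
}

# Example: emit the fragment with a 30 second post-failure timeout.
#   GRUB_RECORDFAIL_TIMEOUT=30; make_timeout 10
```

With this shape, a server can choose a finite value and still boot unattended, while a desktop can keep -1.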

Scott Moser (smoser) wrote :

Gaute,
 Note that this setting *is* now available in /etc/default/grub. See the merge at
https://code.launchpad.net/~utlemming/ubuntu/quantal/grub2/param-recordfail-timeout/+merge/107243 .

Also, note that this is fixed in quantal cloud images.

Scott Moser (smoser)
description: updated
nh2 (nh2) wrote :

#5: This setting is *not* available in my 14.04 /etc/default/grub.

jox (joxonox) wrote :

@nh2: It might not be in there by default, but it will/should be honored once you add it. E.g. you can add the following to disable the boot menu after a failure:

  GRUB_RECORDFAIL_TIMEOUT=0
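For anyone scripting this across several machines, the sketch below adds or updates a key in a defaults file without creating duplicate lines. The helper name `set_grub_default` is invented for this example, and it assumes GNU sed (for `-i`) and values that do not contain `|`.

```shell
#!/bin/sh
# set_grub_default FILE KEY VALUE (hypothetical helper, not part of grub2).
# Replaces an existing uncommented KEY=... line in FILE, or appends one.
# Assumes GNU sed for in-place editing and a VALUE without '|' characters.
set_grub_default ()
{
    file="$1"
    key="$2"
    value="$3"
    if grep -q "^${key}=" "$file"; then
        # Key already present: rewrite that line in place.
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        # Key missing: append it at the end of the file.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Typical use as root, followed by regenerating grub.cfg:
#   set_grub_default /etc/default/grub GRUB_RECORDFAIL_TIMEOUT 0
#   update-grub
```

The change only takes effect after update-grub has regenerated the configuration.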
