Boot process hang because 'mountall' fails

Bug #1096307 reported by Assaf Hoffman
This bug affects 5 people
Affects: plymouth (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

We suspect a bug in Ubuntu 12.04 where the boot process hangs because ‘mountall’ fails due to an HDD failure.
The issue can be easily reproduced through either of the following scenarios on Ubuntu 12.04:

1. Mount a non-existent disk device node.
2. Cause a disk/partition error. You can simulate such an error by corrupting a working partition with the dd command.

In either case, the system gets stuck at the mount procedure and never reaches the login prompt.
I have tried options such as “timeout=30” and “optional”, and updating mountall to v2.46, but none of them works.

Thanks

Steve Langasek (vorlon) wrote :

Please elaborate on what you mean when you say "mount a non-existing device node". Do you mean you've added an entry to /etc/fstab? If so, please show the exact fstab that triggers this issue.

A missing system filesystem caused by an incorrect entry in /etc/fstab, or a failed disk, is indistinguishable from a device being slow to appear. It is therefore incorrect for mountall to silently ignore a missing disk and boot without it, as this would cause services to fail to start correctly; the only correct course of action is to halt the boot, notify the user of the missing filesystem, and give them the option to intervene. This is exactly what mountall does, presenting users via plymouth with the option to skip the disk.

So from your description, I don't think there's a bug here at all.

Changed in mountall (Ubuntu):
status: New → Incomplete

Ed Huang (kvin2097) wrote :

Hi Steve, this issue can be easily reproduced through either of the following scenarios on Ubuntu 12.04.
Assume Ubuntu 12.04 is installed on sda2 and two HDDs are connected as sda and sdb; sdb4 is formatted as ext4.

1. Add an entry to /etc/fstab that tries to mount a non-existent disk device node, to simulate a failed partition:
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/sdc4 /test ext2 defaults 0 0

2. Intentionally corrupt a working partition with the dd command to simulate a disk/partition error:
    dd if=/dev/zero of=/dev/sdb4 bs=1M count=100
    Then add an fstab entry for the corrupted partition:
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/sdb4 /disk1 ext2 defaults 0 0

In either case, the system got stuck during the mount procedure, without any message indicating which partition or disk had the problem, and never reached the login prompt.
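The first scenario above (an fstab entry pointing at a device node that does not exist) can be checked for before rebooting. The following is only a minimal sketch: the function names `parse_fstab` and `missing_devices` are hypothetical, the sample text is adapted from the entries quoted in this comment, and entries identified by `UUID=` or `LABEL=` are deliberately ignored.

```python
import os


def parse_fstab(text):
    """Return (device, mount_point, fstype, options) tuples from fstab text."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed line; ignored in this sketch
        device, mount_point, fstype = fields[:3]
        options = fields[3].split(",") if len(fields) > 3 else ["defaults"]
        entries.append((device, mount_point, fstype, options))
    return entries


def missing_devices(entries, exists=os.path.exists):
    """Return the entries whose /dev/... node is absent at the time of the call.

    Only plain /dev paths are checked; pseudo filesystems such as proc and
    UUID=/LABEL= entries are left alone.
    """
    return [e for e in entries if e[0].startswith("/dev/") and not exists(e[0])]


# Sample adapted from the entries in this comment (not a complete fstab).
SAMPLE = """\
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/sda2 / ext4 errors=remount-ro 0 1
/dev/sdc4 /test ext2 defaults 0 0
"""

for device, mount_point, _fstype, _options in missing_devices(parse_fstab(SAMPLE)):
    print("no device node for %s (mount point %s)" % (device, mount_point))
```

A check like this only covers the missing-node case; it cannot detect the second scenario, where the node exists but the filesystem on it has been overwritten.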

Steve Langasek (vorlon) wrote :

Ed, which version of mountall do you have installed here? By design, mountall should notify the user of this mount failure via plymouth, but as your sample fstab entries show mount points that are not required for the OS to boot, they should not block the rest of the system from being started. As there have been updates to mountall in 12.04 related to ordering of filesystem events, it's important to know exactly which version of the package you reproduced this with.

Steve Langasek (vorlon) wrote :

I've tried to reproduce this with both mountall 2.36 and mountall 2.36.3, and cannot. When booting the system, with or without plymouth splash enabled, I'm shown a prompt as expected:

 The disk drive for /test is not ready yet or not present.
 Continue to wait, or Press S to skip mounting or M for manual recovery

With the corrupted disk (basically, I just tried to claim my swap partition was ext2), I get the other expected prompt:

 An error occurred while mounting /disk1.
 Press S to skip mounting or M for manual recovery

So I'm still not seeing any bugs here.

Ed Huang (kvin2097) wrote :

Steve, I forgot to mention one thing.
The platform we use is ARM, not x86. Our Ubuntu 12.04 was installed following the guide for the Marvell ArmadaXP demo board.
On our device, the only console for message output is /dev/ttyS0.

Is it possible that there is some package missing or a misconfiguration within our installation?

Below are our /etc/fstab and an excerpt of the console output for reference.
/dev/sda5 was deliberately misconfigured as ext2 instead of swap, as you did.
/dev/sdc4 is a non-existent disk partition.

# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc nodev,noexec,nosuid 0 0
# / was on /dev/sda2 during installation
/dev/sda2 / ext4 errors=remount-ro 0 1
# /boot was on /dev/sda1 during installation
/dev/sda1 /boot ext2 defaults 0 2
# swap was on /dev/sda5 during installation
/dev/sda5 none ext2 sw 0 0
/dev/sdc4 /test ext2 defaults 0 0

cpuidle: using governor ladder
cpuidle: using governor menu
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
oprofile: hardware counters not available
oprofile: using timer interrupt.
TCP cubic registered
NET: Registered protocol family 10
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
8021q: 802.1Q VLAN Support v1.8
VFP support v0.3: implementor 56 architecture 2 part 20 variant 9 rev 6
rtc-mv rtc-mv: setting system clock to 2013-01-31 06:07:28 UTC (1359612448)
pool #2: pkt_size=1536, buf_size=1632 - 2048 of 2048 buffers added
eth0: link up
eth0: started
IP-Config: Gateway not on directly connected network.
Freeing init memory: 200K
Loading, please wait...
udevd[501]: starting version 175
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
FATAL: Could not load /lib/modules/3.2.34/modules.dep: No such file or directory
kjournald starting. Commit interval 5 seconds
EXT3-fs (sda2): warning: maximal mount count reached, running e2fsck is recommended
EXT3-fs (sda2): using internal journal
EXT3-fs (sda2): mounted filesystem with writeback data mode
Begin: Running /scripts/local-bottom ... done.
done.
Begin: Running /scripts/init-bottom ... done.
init: ureadahead main process (589) terminated with status 5
EXT2-fs (sda5): error: can't find an ext2 filesystem on dev sda5.
init: console-setup main process (661) terminated with status 1

(No further messages appear after the line above.)

Steve Langasek (vorlon) wrote :

Ok, if the messages are not showing up, that would seem to be a bug in plymouth rather than in mountall. And it seems to be specific to use of a serial console. Could you post the full contents of /proc/cmdline on the affected system? (Even better would be to run 'apport-collect 1096307' and let apport auto-collect the system data relevant to the plymouth package.)

> Is it possible that there is some package missing or a misconfiguration
> within our installation?

The plymouth package is a required component and a dependency of mountall; and the 'details' plugin used as the fallback for displaying messages is part of the core plymouth package. So this is unlikely to be a configuration problem.
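Since the request above is for the contents of /proc/cmdline on a serial-console system, the `console=` parameters are the part most likely to matter. The sketch below extracts them from a kernel command line string. The function name `console_devices` and the example command line are hypothetical; the reporter's actual command line is not shown in this report.

```python
def console_devices(cmdline):
    """Return the device names given by console= parameters, in order."""
    consoles = []
    for token in cmdline.split():
        if token.startswith("console="):
            value = token[len("console="):]
            # Drop trailing options such as the ",115200n8" in "ttyS0,115200n8".
            consoles.append(value.split(",", 1)[0])
    return consoles


# Hypothetical command line for a board with a single serial console.
EXAMPLE = "console=ttyS0,115200 root=/dev/sda2 rw"

print(console_devices(EXAMPLE))
```

On a live system the input would come from reading /proc/cmdline. When several `console=` parameters are present, the kernel uses the last one for /dev/console, so the order of the returned list is significant.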

affects: mountall (Ubuntu) → plymouth (Ubuntu)
Changed in plymouth (Ubuntu):
status: Incomplete → New

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in plymouth (Ubuntu):
status: New → Confirmed

Zentai Andras (andras-zentai) wrote :

I experienced the same boot hang on 64-bit Ubuntu 12.04 after plymouth was updated to version 0.8.2-2ubuntu31.

I had a missing swap partition in my /etc/fstab file, which previously triggered only a mountall warning.
It seems the new plymouth takes this error seriously, and the system will not start properly until I remove the non-existent swap partition from the fstab file.

Steve Langasek (vorlon) wrote :

Zentai, I've run this test here with exactly that version of plymouth. Please file a separate bug report for your issue and describe precisely how I can reproduce the issue.

Zentai Andras (andras-zentai) wrote :

Hi Steve, I tried to reproduce this bug by uncommenting the non-existent swap partition in my fstab file, but it no longer hangs my system. Sorry for wasting your time.

Nathan Groupp (nathangroupp) wrote :

I have reproduced this on the Ubuntu 12.04 LTS AWS AMI. Resolving an error is particularly nasty when using cloud services, as a second, running instance is required to debug the cause of boot failures. On AWS, this involves removing the EBS system volume from the instance, then mounting it temporarily on another. (Rackspace has a recovery mode that does this automagically.) Any expectation of user input from the console is just inappropriate for cloud environments. It would be useful, in the event of mount failures of non-system volumes, to boot into some sort of ILOM or recovery mode with SSH running. This would allow the operator to manually repair the instance.

Victor Mendonça (victorbrca) wrote :

I have the same issue in 13.04, however I don't get any messages. To be able to continue I need to edit my grub options by removing "quiet splash" and adding "nomodeset". Then I get the prompt for "Press S to skip mounting or M for manual recovery".

James Burns (jfburns) wrote :

I'm having the same issue as Nathan. There should be a way to totally disable prompting in mountall if this is to be deployed in the cloud.

Steve Langasek (vorlon) wrote : Re: [Bug 1096307] Re: Boot process hang because 'mountall' fails

On Tue, Dec 10, 2013 at 09:26:47PM -0000, James Burns wrote:
> I'm having the same issue as Nathan. There should be a way to totally
> disable prompting in mountall if this is to be deployed in the cloud.

And do what instead of prompting? The system blocks on mountall because
it's waiting for the filesystem to become available. You have no console,
and you have a misconfigured /etc/fstab. What are you expecting the system
to do here? Try to bring up sshd when half of the filesystem may still be
missing?

If you deploy an image to the cloud with a wrong /etc/fstab, you get to keep
both pieces.

If you have a specific case where mountall is blocking the boot waiting for
some mount point that it *shouldn't* wait for, you'll need to provide more
specific information.

Nathan Groupp (nathangroupp) wrote :

@Steve Langasek

Actually, yes, that sounds like a great idea. What, exactly, is the point of prompting for console input when the operator is 5,000 miles away and there is no console?

Steve Langasek (vorlon) wrote :

On Wed, Dec 11, 2013 at 08:01:07PM -0000, Nathan Groupp wrote:
> @Steve Langasek

> Actually, yes, that sounds like a great idea. What, exactly, is the
> point of prompting for console input when the operator is 5,000 miles
> away and there is no console?

There's no point in prompting. I asked you what you wanted to happen
*instead* of prompting. Because "booting the system with half its
filesystem missing" isn't going to be it.

To repeat myself:

  If you have a specific case where mountall is blocking the boot waiting for
  some mount point that it *shouldn't* wait for, you'll need to provide more
  specific information.
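For anyone trying to identify a mount point that boot should not wait for, one possible starting point is to list the non-system fstab entries that could hold up the boot. The sketch below rests on an assumption that is not stated anywhere in this report: that, as the Ubuntu fstab(5) manual page of this era describes, mountall recognises a `nobootwait` option telling it not to hold up the boot for a local filesystem. The function name, the set of system mount points, and the sample fstab are all hypothetical.

```python
# Assumption: mount points the image cannot boot without; adjust per image.
SYSTEM_MOUNT_POINTS = {"/", "/boot", "/usr", "/var"}


def entries_that_may_block_boot(fstab_text):
    """Return mount points of non-system /dev entries lacking nobootwait/noauto."""
    flagged = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed line; ignored in this sketch
        device, mount_point, _fstype, options = fields[:4]
        if not device.startswith("/dev/"):
            continue  # pseudo filesystems and UUID=/LABEL= entries are skipped
        if mount_point in SYSTEM_MOUNT_POINTS:
            continue  # waiting for these is the intended behaviour
        opts = options.split(",")
        if "nobootwait" in opts or "noauto" in opts:
            continue
        flagged.append(mount_point)
    return flagged


# Hypothetical fstab for illustration only.
SAMPLE = """\
/dev/sda2 /     ext4 errors=remount-ro   0 1
/dev/sdc4 /test ext2 defaults            0 0
/dev/sdb1 /data ext4 defaults,nobootwait 0 2
"""

print(entries_that_may_block_boot(SAMPLE))
```

Whether skipping a given filesystem is safe remains a judgment about the services that depend on it; the sketch only lists candidates for review.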
