Reboot hangs because /etc/rc6.d/S40umountfs chokes on non-existent mounts

Bug #988394 reported by agenkin
This bug affects 3 people
Affects: autofs5 (Ubuntu)
Status: Won't Fix
Importance: High
Assigned to: Canonical Server

Bug Description

All our machines were hanging indefinitely when asked to reboot. We traced it down to the /etc/rc6.d/S40umountfs script hanging. The problem in our case is that autofs leaves phantom entries in /proc/mounts after it's stopped (I'm reporting this as a separate bug #988397). The entries autofs creates are not real mount points: they are the directories monitored by the autofs daemon, which mounts file systems as subdirectories of those directories.

The expected behaviour for the umountfs script would be to skip over any bogus mount points, left over by autofs or anything else.

For instance, we found that by the time /etc/rc6.d/S40umountfs runs, the autofs daemon is stopped (as expected) and all directories that it had mounted are already unmounted (also as expected: in our case it mounts NFS shares, which are unmounted by /etc/rc6.d/S31umountnfs.sh). However, /proc/mounts still contained the following lines:

/etc/auto.nfs_h /h autofs rw,relatime,fd=6,pgrp=1004,timeout=300,minproto=5,maxproto=5,indirect 0 0
/etc/auto.nfs_s /s autofs rw,relatime,fd=12,pgrp=1004,timeout=300,minproto=5,maxproto=5,indirect 0 0
/etc/auto.nfs_cdf /cdf autofs rw,relatime,fd=18,pgrp=1004,timeout=300,minproto=5,maxproto=5,direct 0 0

which confused the umountfs script. The variable REG_MTPTS, among the proper file systems, contained the following: "/h /s /cdf". Subsequently, when the umountfs script invoked "fstab-decode umount ..." it hung trying to unmount these non-existent file systems.
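
To illustrate how such entries can end up in the list, here is a minimal sketch of a loop that classifies /proc/mounts lines by FSTYPE. It is not the real umountfs script: the helper name collect_reg_mtpts, the list of skipped virtual file systems, and the sample proc line are assumptions made for illustration. Only the three autofs lines come from this report.

```shell
# Simplified sketch only; this is NOT the real /etc/init.d/umountfs.
# collect_reg_mtpts is a hypothetical helper: it reads /proc/mounts-style
# lines on stdin and prints the mount points it would treat as "regular".
collect_reg_mtpts() {
    REG_MTPTS=""
    while read -r DEV MTPT FSTYPE REST
    do
        case "$FSTYPE" in
            proc|sysfs|devpts|tmpfs)
                # Assumed list of virtual file systems to leave alone.
                continue
                ;;
            *)
                # No special case for 'autofs', so it is collected too.
                # (The ordering used by the real script is not modeled.)
                REG_MTPTS="${REG_MTPTS:+$REG_MTPTS }$MTPT"
                ;;
        esac
    done
    echo "$REG_MTPTS"
}

# The proc line is illustrative; the autofs lines are the ones quoted above.
SAMPLE_MOUNTS='proc /proc proc rw,noexec,nosuid,nodev 0 0
/etc/auto.nfs_h /h autofs rw,relatime,fd=6,pgrp=1004,timeout=300,minproto=5,maxproto=5,indirect 0 0
/etc/auto.nfs_s /s autofs rw,relatime,fd=12,pgrp=1004,timeout=300,minproto=5,maxproto=5,indirect 0 0
/etc/auto.nfs_cdf /cdf autofs rw,relatime,fd=18,pgrp=1004,timeout=300,minproto=5,maxproto=5,direct 0 0'

RESULT=$(printf '%s\n' "$SAMPLE_MOUNTS" | collect_reg_mtpts)
echo "$RESULT"
```

With this sample input the sketch prints "/h /s /cdf", i.e. the autofs trigger directories are treated like ordinary mounted file systems.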

In summary, I think that either the umountfs script should be made smarter to not pass bogus mount points to fstab-decode, or fstab-decode should be more robust and not hang when given a mount point that does not exist. Perhaps both.

Thanks!

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: initscripts 2.88dsf-13.10ubuntu11
ProcVersionSignature: Ubuntu 3.2.0-23.36-generic-pae 3.2.14
Uname: Linux 3.2.0-23-generic-pae i686
ApportVersion: 2.0.1-0ubuntu6
Architecture: i386
Date: Wed Apr 25 11:33:07 2012
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 SHELL=/local/bin/bash
SourcePackage: sysvinit
UpgradeStatus: No upgrade log present (probably fresh install)

agenkin (agenkin-p) wrote :
agenkin (agenkin-p)
description: updated
agenkin (agenkin-p) wrote :

I modified the umountfs script to run fstab-decode under strace. I'm attaching the strace output.

agenkin (agenkin-p) wrote :

I've poked around at this further, and it seems that the fstab-decode command only hangs trying to unmount *direct* autofs entries in /proc/mounts, but does not hang on *indirect* ones. An example of the direct entry in /proc/mounts:

/etc/auto.nfs_cdf /cdf autofs rw,relatime,fd=18,pgrp=865,timeout=300,minproto=5,maxproto=5,direct 0 0

Note once again that this is *not* an actual mounted file system. The file system mounted by this autofs rule (if mounted) appears further down in /proc/mounts like this (I've obfuscated the IP addresses to 1.2.3.4):

homesrv:/cdf/ /cdf nfs ro,nosuid,nodev,noexec,noatime,vers=3,rsize=8192,wsize=8192,namlen=255,hard,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=1.2.3.4,mountvers=3,mountport=52150,mountproto=udp,local_lock=none,addr=1.2.3.4 0 0
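
To check which autofs entries on a machine are direct and which are indirect, something like the following sketch could be used. The helper name list_autofs_entries is hypothetical, and the sketch assumes the type appears as a plain "direct" or "indirect" word in the options field, as in the lines quoted in this report.

```shell
# Hypothetical helper: reads /proc/mounts-style lines on stdin and prints
# "<mount point> <type>" for every entry whose FSTYPE is 'autofs'.
list_autofs_entries() {
    while read -r DEV MTPT FSTYPE OPTS REST
    do
        [ "$FSTYPE" = "autofs" ] || continue
        # Wrap the options in commas so that 'indirect' cannot be
        # mistaken for 'direct' by a substring match.
        case ",$OPTS," in
            *,direct,*)   echo "$MTPT direct" ;;
            *,indirect,*) echo "$MTPT indirect" ;;
            *)            echo "$MTPT unknown" ;;
        esac
    done
}

# Sample input: two autofs lines and the NFS line quoted in this report.
SAMPLE_MOUNTS='/etc/auto.nfs_h /h autofs rw,relatime,fd=6,pgrp=1004,timeout=300,minproto=5,maxproto=5,indirect 0 0
/etc/auto.nfs_cdf /cdf autofs rw,relatime,fd=18,pgrp=865,timeout=300,minproto=5,maxproto=5,direct 0 0
homesrv:/cdf/ /cdf nfs ro,nosuid,nodev,noexec,noatime,vers=3,rsize=8192,wsize=8192,namlen=255,hard,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=1.2.3.4,mountvers=3,mountport=52150,mountproto=udp,local_lock=none,addr=1.2.3.4 0 0'

AUTOFS_ENTRIES=$(printf '%s\n' "$SAMPLE_MOUNTS" | list_autofs_entries)
echo "$AUTOFS_ENTRIES"
```

On a live system the same helper could be fed from the real table with "list_autofs_entries < /proc/mounts". The NFS line is ignored because its FSTYPE is 'nfs', not 'autofs'.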

agenkin (agenkin-p) wrote :

I'm attaching a one-line modification to the /etc/init.d/umountfs script that fixes the problem for us. It causes umountfs to ignore /proc/mounts entries where FSTYPE is 'autofs'.

I believe that this is a correct solution because these 'autofs' entries are not the actual file systems mounted by autofs, but the mount points or directories monitored by the automount daemon. The actual autofs-mounted file systems (if any) will not be skipped over because they will appear as a separate entry in /proc/mounts and their FSTYPE is not going to be 'autofs', but whatever it is in reality ('nfs' in our case, for example).
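
The attached diff is not reproduced here, but the idea can be sketched roughly as follows. This is an illustration under assumptions, not the actual patch: the helper name filter_unmountable is hypothetical, and the real script's loop handles more cases than shown.

```shell
# Hypothetical helper: reads /proc/mounts-style lines on stdin and prints
# "<mount point> <fstype>" for every entry that should still be unmounted.
filter_unmountable() {
    while read -r DEV MTPT FSTYPE REST
    do
        case "$FSTYPE" in
            autofs)
                # Trigger directory watched by the automounter,
                # not a mounted file system: skip it.
                continue
                ;;
        esac
        echo "$MTPT $FSTYPE"
    done
}

# Sample input: the direct autofs entry and the NFS mount on the same path,
# both taken from the lines quoted in this report.
SAMPLE_MOUNTS='/etc/auto.nfs_cdf /cdf autofs rw,relatime,fd=18,pgrp=865,timeout=300,minproto=5,maxproto=5,direct 0 0
homesrv:/cdf/ /cdf nfs ro,nosuid,nodev,noexec,noatime,vers=3,rsize=8192,wsize=8192,namlen=255,hard,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=1.2.3.4,mountvers=3,mountport=52150,mountproto=udp,local_lock=none,addr=1.2.3.4 0 0'

KEPT=$(printf '%s\n' "$SAMPLE_MOUNTS" | filter_unmountable)
echo "$KEPT"
```

For this input the sketch prints only "/cdf nfs": the autofs entry for /cdf is dropped, while the NFS file system mounted at the same path is still passed on for unmounting.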

Changed in sysvinit (Ubuntu):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Steve Langasek (vorlon) wrote :

I don't think this analysis is correct. At the time the umountfs script finishes, *all* filesystems are supposed to be unmounted. This includes any mount points watched by autofs. So why is autofs still running at this point in the shutdown? The autofs daemon should be shut down *prior* to /etc/rc6.d/S35networking, since at that point there's no network left and autofs should no longer be trying to mount new filesystems anyway!

And if I look at the autofs5 package, it has a wrong upstart job that does 'stop on runlevel [!2345]', which does not properly serialize the shutdown to ensure the service is stopped before we reach the unmounting phase. Thus there's a race between /etc/init/rc.conf and /etc/init/autofs.conf.

Reassigning to the autofs5 package.

affects: sysvinit (Ubuntu) → autofs5 (Ubuntu)
Changed in autofs5 (Ubuntu):
importance: Undecided → High
assignee: Canonical Foundations Team (canonical-foundations) → Canonical Server Team (canonical-server)
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "umountfs.diff" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch, you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are a member of the ubuntu-reviewers team, please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch
agenkin (agenkin-p) wrote :

Steve, I think that you are both right and wrong. You are right that autofs.conf does not work properly and that the autofs-related stuff should be stopped before umountfs script runs. However, there are two other things:

1. The entries in /proc/mounts with 'autofs' FSTYPE *are not file systems that need unmounting*, hence it is both safe and valid to ignore them in the umountfs script. The real file systems that have been mounted by the automounter would show up as separate entries in /proc/mounts with the proper FSTYPE ('ext4', 'nfs' or whatever).

2. The fstab-decode command should not hang on any input. It should be more robust and handle the inconsistencies like this more gracefully. Especially it should not hang when given a mount point at which no file system is mounted, which is the case here.

In other words, I think that my patch to the umountfs script is correct inasmuch as it accounts for the imperfect world in which the automounter was not shut down properly and fstab-decode can hang.

The patch is also completely safe because ignoring the 'autofs' entries from /proc/mounts is completely harmless since they are not mounted file systems.

Clint Byrum (clint-fewbar) wrote :

Looks like autofs needs to do 'stop on deconfiguring-networking'. That will ensure that the main process is completely stopped before the network is shut down. I wonder if we should also consider raising the 'kill timeout' above 5 seconds, as it may take longer than that to unmount a lot of mounts. That could also explain the phantom mounts left over, if the daemon was SIGKILL'd before it was done. Is it conceivable that these phantom mounts will be there even after autofs is completely shut down in a graceful way?

Clint Byrum (clint-fewbar) wrote :

I'm changing this status to Confirmed. There's clearly a problem with autofs's shutdown, and it needs to be addressed.

Changed in autofs5 (Ubuntu):
status: New → Confirmed
Changed in autofs5 (Ubuntu):
status: Confirmed → Won't Fix