idmapd upstart job ends in an inconsistent state if /usr is a separate partition

Bug #811823 reported by Steve Langasek
18
This bug affects 2 people
Affects Status Importance Assigned to Milestone
nfs-utils (Ubuntu)
Fix Released
Medium
Steve Langasek
Lucid
Fix Released
Medium
Steve Langasek
Maverick
Fix Released
Medium
Steve Langasek
Natty
Fix Released
Medium
Steve Langasek

Bug Description

Because the idmapd upstart job uses a script line instead of directly exec'ing the server, in the case where /usr is a separate partition and not yet mounted at the time the job is started (e.g., because an NFS mount is attempted in parallel to an fsck of /usr), the job will fail to start and get wedged in a 'start/running' state with no associated PID (upstart bug #545673). If we exec the daemon directly, upstart will detect the exec() failure and respawn as needed.

This partially addresses the problems described in bug #643289, and is worth SRUing in its own right even though it doesn't provide a complete solution for all users.

SRU justification: race conditions in the nfs-utils upstart jobs cause NFS client startup to be unreliable in certain configurations in lucid and above, a regression vs. pre-upstart releases of Ubuntu.

Although this upload includes the removal of the rpc_pipefs job (moving its functionality into the gssd and idmapd jobs for simplicity), the real risk of regression here is very small. There are no changes to the start/stop conditions to the gssd or idmapd jobs, only a change to the script rule for the idmapd job. This should have *no effect* except in the case where /usr/sbin/rpc.idmapd is missing at the time the job runs - i.e., in the case when /usr has not yet been mounted.

TEST CASE:
1. configure a system with /usr on a separate partition.
2. install nfs-common from the release (or -updates) pocket.
3. configure an NFSv4 mount in /etc/fstab to trigger idmapd to start before /usr is mounted. (The nfsv4 mount does not have to actually be mountable. Here is an example entry:
borges:/ /home/devel nfs4 sec=krb5i,proto=tcp 0 0
)
4. boot and verify with 'service idmapd' that the idmapd service is left in state 'start/running', with no associated process.
5. install nfs-common from the -proposed pocket and reboot.
6. verify with 'service idmapd' that the idmapd service has been successfully started, and is in state 'start/running' with an associated pid.

Steve Langasek (vorlon)
Changed in nfs-utils (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Steve Langasek (vorlon)
Steve Langasek (vorlon)
Changed in nfs-utils (Ubuntu Lucid):
status: New → Triaged
Changed in nfs-utils (Ubuntu Maverick):
status: New → Triaged
Changed in nfs-utils (Ubuntu Natty):
status: New → Triaged
Changed in nfs-utils (Ubuntu Lucid):
importance: Undecided → Medium
Changed in nfs-utils (Ubuntu Maverick):
importance: Undecided → Medium
Changed in nfs-utils (Ubuntu Natty):
importance: Undecided → Medium
Changed in nfs-utils (Ubuntu Lucid):
assignee: nobody → Steve Langasek (vorlon)
Changed in nfs-utils (Ubuntu Maverick):
assignee: nobody → Steve Langasek (vorlon)
Changed in nfs-utils (Ubuntu Natty):
assignee: nobody → Steve Langasek (vorlon)
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nfs-utils - 1:1.2.2-4ubuntu8

---------------
nfs-utils (1:1.2.2-4ubuntu8) oneiric; urgency=low

  * debian/nfs-common.idmapd.upstart: don't use a script unnecessarily for
    our job when we can exec directly - making the job more resilient in
    the face of races with /usr being mounted. LP: #811823.
  * Drop rpc_pipefs.conf; this has gotten far more complicated than it
    should be, just do the mount in-line in each of the gssd and idmapd
    jobs.
 -- Steve Langasek <email address hidden> Sun, 17 Jul 2011 02:23:01 -0700

Changed in nfs-utils (Ubuntu):
status: Triaged → Fix Released
Steve Langasek (vorlon)
Changed in nfs-utils (Ubuntu Natty):
status: Triaged → In Progress
Steve Langasek (vorlon)
description: updated
Steve Langasek (vorlon)
Changed in nfs-utils (Ubuntu Maverick):
status: Triaged → In Progress
Changed in nfs-utils (Ubuntu Lucid):
status: Triaged → In Progress
Revision history for this message
Clint Byrum (clint-fewbar) wrote : Please test proposed package

Hello Steve, or anyone else affected,

Accepted nfs-utils into natty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in nfs-utils (Ubuntu Natty):
status: In Progress → Fix Committed
tags: added: verification-needed
Changed in nfs-utils (Ubuntu Maverick):
status: In Progress → Fix Committed
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Hello Steve, or anyone else affected,

Accepted nfs-utils into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in nfs-utils (Ubuntu Lucid):
status: In Progress → Fix Committed
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Hello Steve, or anyone else affected,

Accepted nfs-utils into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

maxjos (magne-fortytwo)
Changed in nfs-utils (Ubuntu Natty):
status: Fix Committed → Fix Released
Steve Langasek (vorlon)
Changed in nfs-utils (Ubuntu Natty):
status: Fix Released → Fix Committed
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Marking verification-done-lucid per this comment:

https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/643289/comments/27

tags: added: verification-done-lucid
Revision history for this message
Steve Atwell (satwell) wrote :

Steve, I'm looking at your change to gssd.conf and idmapd.conf, and it looks like you've got a race condition. You add the following to the pre-start script of both:

        if ! mountpoint -q "$PIPEFS_MOUNTPOINT"
        then
                mount -t rpc_pipefs rpc_pipefs "$PIPEFS_MOUNTPOINT"
        fi

The problem is that the filesystem could get mounted between when you check with mountpoint and when you call mount. And because mount is the last command in the pre-start script, if it fails then the pre-start will fail, preventing the job from starting. I haven't actually seen this issue, but since two jobs are running this same code around the same time, it definitely seems like a possibility.

Probably a simple "|| true" on the mount command would be sufficient. If the mountpoint isn't actually there when rpc.gssd or rpc.idmapd starts, that process will exit and upstart will retry.

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 811823] Re: idmapd upstart job ends in an inconsistent state if /usr is a separate partition

Hi Steve,

On Thu, Sep 08, 2011 at 09:12:53PM -0000, Steve Atwell wrote:
> Steve, I'm looking at your change to gssd.conf and idmapd.conf, and it
> looks like you've got a race condition. You add the following to the
> pre-start script of both:

> if ! mountpoint -q "$PIPEFS_MOUNTPOINT"
> then
> mount -t rpc_pipefs rpc_pipefs "$PIPEFS_MOUNTPOINT"
> fi

> The problem is that the filesystem could get mounted between when you
> check with mountpoint and when you call mount. And because mount is the
> last command in the pre-start script, if it fails then the pre-start
> will fail, preventing the job from starting. I haven't actually seen
> this issue, but since two jobs are running this same code around the
> same time, it definitely seems like a possibility.

> Probably a simple "|| true" on the mount command would be sufficient.
> If the mountpoint isn't actually there when rpc.gssd or rpc.idmapd
> starts, that process will exit and upstart will retry.

You're right that this is not atomic; however, I believe the worst-case
scenario is that the kernel fs will be mounted twice at the same mount
point, which should not cause significant problems. (I have seen this
outcome in practice on my machine while testing.)

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Revision history for this message
Steve Atwell (satwell) wrote :

> You're right that this is not atomic; however, I believe the worst-case
> scenario is that the kernel fs will be mounted twice at the same mount
> point, which should not cause significant problems. (I have seen this
> outcome in practice on my machine while testing.)

Hm, that's not the behavior that I see. If I try to mount rpc_pipefs twice, mount exits with an error on the second attempt.

root@satwelltest1:~# mount -t rpc_pipefs rpc_pipefs /var/lib/nfs/rpc_pipefs
root@satwelltest1:~# mount -t rpc_pipefs rpc_pipefs /var/lib/nfs/rpc_pipefs
mount: rpc_pipefs already mounted or /var/lib/nfs/rpc_pipefs busy
mount: according to mtab, rpc_pipefs is already mounted on /var/lib/nfs/rpc_pipefs
root@satwelltest1:~# echo $?
32

tags: added: testcase
Revision history for this message
Scott Kitterman (kitterman) wrote :

I have a report from a user on Natty that this resolved a boot time NFS race condition for them.

"This did the trick. I applied the 5.1 patch on the machine where we run our
nfs client &, upon rebooting the nfs client machine, the nfs share was
properly mounted."

tags: added: verification-done-natty
tags: added: verification-done
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nfs-utils - 1:1.2.2-4ubuntu5.1

---------------
nfs-utils (1:1.2.2-4ubuntu5.1) natty-proposed; urgency=low

  * debian/nfs-common.idmapd.upstart: don't use a script unnecessarily for
    our job when we can exec directly - making the job more resilient in
    the face of races with /usr being mounted. LP: #811823.
  * Drop rpc_pipefs.conf; this has gotten far more complicated than it
    should be, just do the mount in-line in each of the gssd and idmapd
    jobs.
 -- Steve Langasek <email address hidden> Sun, 17 Jul 2011 22:21:57 -0700

Changed in nfs-utils (Ubuntu Natty):
status: Fix Committed → Fix Released
tags: removed: verification-done-natty
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nfs-utils - 1:1.2.2-1ubuntu1.2

---------------
nfs-utils (1:1.2.2-1ubuntu1.2) maverick-proposed; urgency=low

  * debian/nfs-common.idmapd.upstart: don't use a script unnecessarily for
    our job when we can exec directly - making the job more resilient in
    the face of races with /usr being mounted. LP: #811823.
  * Drop rpc_pipefs.conf; this has gotten far more complicated than it
    should be, just do the mount in-line in each of the gssd and idmapd
    jobs.
 -- Steve Langasek <email address hidden> Sun, 17 Jul 2011 22:45:15 -0700

Changed in nfs-utils (Ubuntu Maverick):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nfs-utils - 1:1.2.0-4ubuntu4.2

---------------
nfs-utils (1:1.2.0-4ubuntu4.2) lucid-proposed; urgency=low

  * debian/nfs-common.idmapd.upstart: don't use a script unnecessarily for
    our job when we can exec directly - making the job more resilient in
    the face of races with /usr being mounted. LP: #811823.
  * Drop rpc_pipefs.conf; this has gotten far more complicated than it
    should be, just do the mount in-line in each of the gssd and idmapd
    jobs.
 -- Steve Langasek <email address hidden> Sun, 17 Jul 2011 22:51:42 -0700

Changed in nfs-utils (Ubuntu Lucid):
status: Fix Committed → Fix Released
tags: removed: verification-done-lucid verification-needed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.