portmap/statd can not be restarted

Bug #688550 reported by Goswin von Brederlow
28
This bug affects 4 people
Affects Status Importance Assigned to Milestone
portmap (Ubuntu)
Won't Fix
Undecided
Adam Stokes
Lucid
Won't Fix
High
Adam Stokes
Maverick
Invalid
Undecided
Adam Stokes
Natty
Invalid
Undecided
Adam Stokes
Oneiric
Invalid
Undecided
Adam Stokes

Bug Description

Binary package hint: portmap

[Impact]
Anyone attempting to upgrade portmap on Lucid with a currently running portmap process in an established state.

[Test Case]
When restarting portmap / statd the old portmap process does not properly free its socket causing the new portmap process to fail and upstart to stay stuck in pre-start trying to stop statd.

beo-15:~# stop portmap

Dec 10 14:20:07 beo-15 init: Connection from private client
Dec 10 14:20:07 beo-15 init: portmap goal changed from start to stop
Dec 10 14:20:07 beo-15 init: portmap state changed from running to pre-stop
Dec 10 14:20:07 beo-15 init: portmap pre-stop process (1996)
Dec 10 14:20:07 beo-15 init: portmap pre-stop process (1996) exited normally
Dec 10 14:20:07 beo-15 init: portmap state changed from pre-stop to stopping
Dec 10 14:20:07 beo-15 init: Handling stopping event
Dec 10 14:20:07 beo-15 init: statd goal changed from start to stop
Dec 10 14:20:07 beo-15 init: statd state changed from running to pre-stop
Dec 10 14:20:07 beo-15 init: statd state changed from pre-stop to stopping
Dec 10 14:20:07 beo-15 init: Handling stopping event
Dec 10 14:20:07 beo-15 init: statd state changed from stopping to killed
Dec 10 14:20:07 beo-15 init: Sending TERM signal to statd main process (1053)
Dec 10 14:20:07 beo-15 rpc.statd[1053]: Caught signal 15, un-registering and exiting.
Dec 10 14:20:07 beo-15 init: statd main process (1053) exited normally
Dec 10 14:20:07 beo-15 init: statd state changed from killed to post-stop
Dec 10 14:20:07 beo-15 init: statd state changed from post-stop to waiting
Dec 10 14:20:07 beo-15 init: portmap state changed from stopping to killed
Dec 10 14:20:07 beo-15 init: Sending TERM signal to portmap main process (1033)
Dec 10 14:20:07 beo-15 init: Handling stopped event
Dec 10 14:20:07 beo-15 init: portmap main process (1033) killed by TERM signal
Dec 10 14:20:07 beo-15 init: portmap state changed from killed to post-stop
Dec 10 14:20:07 beo-15 init: portmap post-stop process (1998)
Dec 10 14:20:07 beo-15 init: portmap post-stop process (1998) exited normally
Dec 10 14:20:07 beo-15 init: portmap state changed from post-stop to waiting
Dec 10 14:20:07 beo-15 init: Handling stopped event

beo-15:~# start statd

Dec 10 14:20:11 beo-15 init: Connection from private client
Dec 10 14:20:11 beo-15 init: statd goal changed from stop to start
Dec 10 14:20:11 beo-15 init: statd state changed from waiting to starting
Dec 10 14:20:11 beo-15 init: Handling starting event
Dec 10 14:20:11 beo-15 init: statd state changed from starting to pre-start
Dec 10 14:20:11 beo-15 init: statd pre-start process (2001)
Dec 10 14:20:11 beo-15 init: Connection from private client
Dec 10 14:20:11 beo-15 init: portmap goal changed from stop to start
Dec 10 14:20:11 beo-15 init: portmap state changed from waiting to starting
Dec 10 14:20:11 beo-15 init: Handling starting event
Dec 10 14:20:11 beo-15 init: portmap state changed from starting to pre-start
Dec 10 14:20:11 beo-15 init: portmap state changed from pre-start to spawned
Dec 10 14:20:11 beo-15 init: portmap main process (2003)
Dec 10 14:20:11 beo-15 init: portmap main process (2003) executable changed
Dec 10 14:20:11 beo-15 init: portmap main process (2003) became new process (2004)
Dec 10 14:20:11 beo-15 init: portmap state changed from spawned to post-start
Dec 10 14:20:11 beo-15 portmap: Removing stale lockfile for pid 1033
Dec 10 14:20:11 beo-15 portmap[2004]: cannot bind tcp: Address already in use
Dec 10 14:20:11 beo-15 init: portmap post-start process (2005)
Dec 10 14:20:11 beo-15 init: portmap main process (2004) terminated with status 1
Dec 10 14:20:11 beo-15 init: portmap main process ended, respawning
Dec 10 14:20:11 beo-15 init: portmap goal changed from start to respawn
Dec 10 14:20:11 beo-15 init: portmap post-start process (2005) exited normally
Dec 10 14:20:11 beo-15 init: portmap goal changed from respawn to start
Dec 10 14:20:11 beo-15 init: portmap state changed from post-start to stopping
Dec 10 14:20:11 beo-15 init: Handling stopping event
Dec 10 14:20:11 beo-15 init: statd goal changed from start to stop
Dec 10 14:20:25 beo-15 init: Connection from private client

beo-15:~# status portmap
portmap start/stopping

beo-15:~# status statd
statd stop/pre-start, process 2001

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: portmap 6.0.0-1ubuntu2
ProcVersionSignature: Ubuntu 2.6.32-26.48-server 2.6.32.24+drm33.11
Uname: Linux 2.6.32-26-server x86_64
Architecture: amd64
Date: Fri Dec 10 14:23:40 2010
ProcEnviron:
 LC_CTYPE=de_DE
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: portmap

[Regression Potential]
This is more or less a specific use case where upgrading portmap will halt upstart during a service restart. Regression minimal as no other functionality was altered other than the re-use of SO_REUSEADDR.

Revision history for this message
Goswin von Brederlow (goswin-v-b) wrote :
Revision history for this message
Goswin von Brederlow (goswin-v-b) wrote :

This problem does not allways occur. Sometimes portmap starts up just fine right after being stoped.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in portmap (Ubuntu):
status: New → Confirmed
Revision history for this message
Steve Atwell (satwell) wrote :

This happens every time that portmap is stopped while something has a connection open to it. The problem can be reliably reproduced like this:

# nc localhost sunrpc &
[1] 14894
# stop portmap
[1] + done nc localhost sunrpc
portmap stop/waiting
# start portmap
start: Job failed to start

You can see from netstat that the server-side socket is sitting in TIME_WAIT, blocking portmap from starting:

# netstat -a | grep sunrpc
tcp 0 0 localhost:60269 localhost:sunrpc TIME_WAIT
tcp 0 0 localhost:sunrpc localhost:33281 TIME_WAIT

I would expect portmap to use SO_REUSEADDR to deal with this case, but apparently it only does this if the weird LOOPBACK_SETUNSET #define is enabled.

Revision history for this message
Etienne Goyer (etienne-goyer-outlands) wrote :

Debian package portmap 6.0.0-5 in experimental has patch 06-615642-reuse-socket.diff that makes portmap set SO_REUSEADDR. I do not understand what the LOOPBACK_SETUNSET #define actually do, but it basically removes it.

It's also worth noting that rpcbind suffers from bug #924593, which is quite similar and that upstream resolved by setting the SO_REUSEADDR option.

Chris J Arges (arges)
Changed in portmap (Ubuntu):
assignee: nobody → Chris J Arges (christopherarges)
status: Confirmed → Triaged
Revision history for this message
Steve Langasek (vorlon) wrote :

Note that portmap has been dropped from precise as obsolete, entirely superseded by rpcbind. We should probably target this bug at the releases you are considering SRUs for (lucid-oneiric?)

Revision history for this message
Chris J Arges (arges) wrote :

I agree, this should be an SRU. Currently lucid is being targeted, but this is fairly small fix and thus we could apply to lucid-oneiric.

Steve Langasek (vorlon)
Changed in portmap (Ubuntu):
status: Triaged → Won't Fix
Changed in portmap (Ubuntu Lucid):
status: New → Triaged
Changed in portmap (Ubuntu Maverick):
status: New → Triaged
Changed in portmap (Ubuntu Natty):
status: New → Triaged
Changed in portmap (Ubuntu Oneiric):
status: New → Triaged
Revision history for this message
Chris J Arges (arges) wrote :

This patch backports the patch 06-615641-reuse-socket.diff from debian experimental.

I have built a test package for verification on lucid:
http://people.canonical.com/~arges/lp688550/

Thanks

Revision history for this message
Steve Atwell (satwell) wrote :

Tested portmap 6.0.0-1ubuntu3~lp688550 on Lucid and verified that it can restart after being stopped with an active connection.

Chris J Arges (arges)
Changed in portmap (Ubuntu):
assignee: Chris J Arges (christopherarges) → Adam Stokes (adam-stokes)
Changed in portmap (Ubuntu Lucid):
assignee: nobody → Adam Stokes (adam-stokes)
Changed in portmap (Ubuntu Maverick):
assignee: nobody → Adam Stokes (adam-stokes)
Changed in portmap (Ubuntu Oneiric):
assignee: nobody → Adam Stokes (adam-stokes)
Changed in portmap (Ubuntu Natty):
assignee: nobody → Adam Stokes (adam-stokes)
Changed in portmap (Ubuntu Lucid):
milestone: none → lucid-updates
importance: Undecided → Medium
status: Triaged → In Progress
Changed in portmap (Ubuntu Lucid):
importance: Medium → High
Revision history for this message
Adam Stokes (adam-stokes) wrote :
description: updated
Revision history for this message
Adam Stokes (adam-stokes) wrote :

Steve,

Assuming no blockers are present with this please wait until it is released into -proposed and if you have extra cycles to re-test this that would be greatly appreciated.

Thanks
Adam

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Uploaded into lucid-proposed. Currently in the unapproved queue. You can check the status here:

https://launchpad.net/ubuntu/lucid/+queue?queue_state=1&queue_text=

Unsubscribing ubuntu-sponsors. Please subscribe again if you want to SRU this change into other releases.

Revision history for this message
Adam Conrad (adconrad) wrote :

portmap doesn't exist beyond lucid, unsetting all those tasks.

Changed in portmap (Ubuntu Maverick):
status: Triaged → Invalid
Changed in portmap (Ubuntu Natty):
status: Triaged → Invalid
Changed in portmap (Ubuntu Oneiric):
status: Triaged → Invalid
Changed in portmap (Ubuntu Lucid):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Adam Conrad (adconrad) wrote : Please test proposed package

Hello Goswin, or anyone else affected,

Accepted portmap into lucid-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/portmap/6.0.0-1ubuntu2.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
Adam Stokes (adam-stokes) wrote :

Steve/Goswin,

Were either of you able to test that this package resolves your issue?

Thanks
Adam

Revision history for this message
Steve Atwell (satwell) wrote :

portmap 6.0.0-1ubuntu2.3 restarts correctly even with open connections. However, when upgrading to portmap 6.0.0-1ubuntu2.3 from 6.0.0-1ubuntu2.2, portmap fails to restart if there's an open connection. (Because the old version wasn't using SO_REUSEADDR, the new version can't take over the port.) I'm not sure this can be avoided though.

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 688550] Re: portmap/statd can not be restarted

On Mon, Dec 10, 2012 at 08:34:05PM -0000, Steve Atwell wrote:
> portmap 6.0.0-1ubuntu2.3 restarts correctly even with open connections.
> However, when upgrading to portmap 6.0.0-1ubuntu2.3 from
> 6.0.0-1ubuntu2.2, portmap fails to restart if there's an open
> connection. (Because the old version wasn't using SO_REUSEADDR, the new
> version can't take over the port.) I'm not sure this can be avoided
> though.

It was discussed that the maintainer script should include a very long
timeout between stopping and restarting portmap on upgrade for this reason.
I don't know if this was implemented; I did argue that unless something like
this was implemented, we should not SRU for this issue alone, due to the
fact that the SRU itself triggers the problem.

Revision history for this message
Adam Stokes (adam-stokes) wrote :

I wasn't aware of the specifics wrt to the maintainer script before changing hands to me so I based my conclusion on

https://bugs.launchpad.net/ubuntu/+source/portmap/+bug/688550/comments/9

If this is not the case and we need to extend the timeout (I assume greater than 30 seconds?) We need to invalidate this SRU and I can implement the timeout portion of the upgrade process.

Revision history for this message
Steve Langasek (vorlon) wrote :

Adam,

On Mon, Dec 10, 2012 at 10:11:52PM -0000, Adam Stokes wrote:
> I wasn't aware of the specifics wrt to the maintainer script before
> changing hands to me so I based my conclusion on

> https://bugs.launchpad.net/ubuntu/+source/portmap/+bug/688550/comments/9

> If this is not the case and we need to extend the timeout (I assume
> greater than 30 seconds?)

The timeout needs to be 240 seconds. See
<https://bugs.launchpad.net/bugs/925717> for the full discussion. In
particular, comment #9 has my skeleton code for this.

> We need to invalidate this SRU and I can implement the timeout portion of
> the upgrade process.

Yep, marking v-failed.

Revision history for this message
Adam Stokes (adam-stokes) wrote :

Thanks will get this fixed up

Steve Langasek (vorlon)
tags: added: verification-failed
removed: verification-needed
Revision history for this message
Rolf Leggewie (r0lf) wrote :

lucid has seen the end of its life and is no longer receiving any updates. Marking the lucid task for this ticket as "Won't Fix".

Changed in portmap (Ubuntu Lucid):
status: Fix Committed → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.