Bug #687535 “upstart loses track of ssh daemon after reload ssh” : Bugs : openssh package : Ubuntu

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2010-12-08:

#1

Calling this low priority since
kill -9 `pidof sshd`
start ssh

is a workaround.

Changed in openssh (Ubuntu):
status:	New → Confirmed
importance:	Undecided → Low

Revision history for this message

AlainKnaff (kubuntu-misc) wrote on 2010-12-08:

#2

So why bother having an upstart or init at all, and not start and stop everything "manually"?

Similar issue exists with squid (bug 573853)

Upstart is a nice concept, and really improved boot times. It would even be better if it was more reliable. These glitches break web administration tools which call service xxx start/stop/reload to do their thing.

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2010-12-09:

#3

Alain, I understand your frustration. This is actually I think a little more serious than Low, as existence of a workaround only barely mitigates the impact of this.

The problem, I think, is that we're using expect fork, and I'm not sure why, when sshd has -D, and its clearly not reliable for this instance, since the daemon regularly forks and exits.

Raising importance to High, as the reload command *must* be reliable for sshd, as failure of sshd could result in the system being totally unmanageable.

Changed in openssh (Ubuntu):
importance:	Low → High

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2010-12-09:

#4

Also this is not really a new bug in upstart. It is related to bug #530779 which causes issues when the parent dies, but its not the same issue, and in fact, the proc connector probably wouldn't help because the proc connector still doesn't solve the problem of knowing when this fork is actually a good one that needs to be followed as the main process versus a fork of a child that just needs to be reaped.

I'm not going to mark it Invalid yet though. I'd like to get some Upstart experts' opinions on whether or not it makes sense to solve this, and really, bug #530779, by simply saying if the process we're tracking forks, then quits, that we should always track the children as the main process.

Revision history for this message

AlainKnaff (kubuntu-misc) wrote on 2010-12-09:

#5

> Alain, I understand your frustration. think a little more serious than Low,

Thanks for your insight.

> as existence of a workaround only barely mitigates the impact of this.

Just a note about this workaround: kill -9 `pidof sshd` will "saw off the branch on which you're sitting" if you happen to manage the system remotely. A safer workaround would be kill -9 `cat /var/run/sshd.pid`

> The problem, I think, is that we're using expect fork

I think what's definitely needed is a way to make upstart track processes by pidfile.

> and I'm not sure why, when sshd has -D

Indeed, that seems to work (just tested it). So, why does ssh fork on reload in the case without -D, if it works without this fork with -D?

> I'm not going to mark it Invalid yet though.

Why do you think this bug should be marked invalid (/etc/init/ssh.conf is part of the openssh-server .deb)? Especially after basically agreeing with me... In general, in Ubuntu, triagers are far too quick to mark bugs as invalid, leading to situations where real problems still aren't fixed in the next release, or problems in LTS releases go unfixed for over a year.

> that we should always track the children as the main process.

This is tricky for processes such as ssh, as upstart might end up accidentally tracking one particular session, rather than the master. Just imagine if a user logs in at exactly the moment where the admin issues a reload ssh. Init would then not know which child to track (new master, or the session of the user who just logged in).
And in the case of squid, upstart would not even know that something is amiss as here the issue is about some intermediate process _not_ exiting when it should, so upstart would not know that something is amiss until after the first reload (... causing that first reload to be ignored...).

A safer alternative would be to optionally track by pidfile (which could be monitored by inotify, and re-read after a child exits). Of course, existence of pid listed in pidfile should be verified by sending it a -0 signal, so that upstart wouldn't get confused by stale pid files.

> Alain, I understand your frustration. think a little more serious than Low,

Thanks for your insight.

> as existence of a workaround only barely mitigates the impact of this.

Just a note about this workaround: kill -9 `pidof sshd` will "saw off the branch on which you're sitting" if you happen to manage the system remotely. A safer workaround would be kill -9 `cat /var/run/sshd.pid`

> The problem, I think, is that we're using expect fork

I think what's definitely needed is a way to make upstart track processes by pidfile.

> and I'm not sure why, when sshd has -D

Indeed, that seems to work (just tested it). So, why does ssh fork on reload in the case without -D, if it works without this fork with -D?

> I'm not going to mark it Invalid yet though.

Why do you think this bug should be marked invalid (/etc/init/ssh.conf is part of the openssh-server .deb)? Especially after basically agreeing with me... In general, in Ubuntu, triagers are far too quick to mark bugs as invalid, leading to situations where real problems still aren't fixed in the next release, or problems in LTS releases go unfixed for over a year.

> that we should always track the children as the main process.

This is tricky for processes such as ssh, as upstart might end up accidentally tracking one particular session, rather than the master. Just imagine if a user logs in at exactly the moment where the admin issues a reload ssh. Init would then not know which child to track (new master, or the session of the user who just logged in).
And in the case of squid, upstart would not even know that something is amiss as here the issue is about some intermediate process _not_ exiting when it should, so upstart would not know that something is amiss until after the first reload (... causing that first reload to be ignored...).

A safer alternative would be to optionally track by pidfile (which could be monitored by inotify, and re-read after a child exits). Of course, existence of pid listed in pidfile should be verified by sending it a -0 signal, so that upstart wouldn't get confused by stale pid files.

Revision history for this message

Serge Hallyn (serge-hallyn) wrote on 2010-12-09:

#6

Alain,

Clint just meant marking the upstart task invalid, not the sshd one.

We've discussed on irc, and we'll follow the -D path. Thanks again for raising this bug.

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2010-12-09:

#7

Alain, I meant "Invalid in Upstart", and I'm sorry for the confusion. There is no doubt in my mind that the ssh.conf needs some changes, and should be fixed going back to Lucid.

The notion of pidfiles is generally rejected by the upstart community because "daemons don't get this right". I think the emphasis here is that upstart should be able to properly track processes without trusting their behavior.

Brian Murray (brian-murray) on 2010-12-09

Changed in openssh (Ubuntu Lucid):
status:	New → Confirmed
Changed in openssh (Ubuntu Maverick):
status:	New → Confirmed
Changed in openssh (Ubuntu Lucid):
importance:	Undecided → High
Changed in openssh (Ubuntu Maverick):
importance:	Undecided → High

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2010-12-13:

#8

In looking at the code in OpenSSH, the -D flag only affects this portion of sshd.c, around line 1740:

    /*
     * If not in debugging mode, and not started from inetd, disconnect
     * from the controlling terminal, and fork. The original process
     * exits.
     */
    if (!(debug_flag || inetd_flag || no_daemon_flag)) {
#ifdef TIOCNOTTY
        int fd;
#endif /* TIOCNOTTY */
        if (daemon(0, 0) < 0)
            fatal("daemon() failed: %.200s", strerror(errno));

        /* Disconnect from the controlling tty. */
#ifdef TIOCNOTTY
        fd = open(_PATH_TTY, O_RDWR | O_NOCTTY);
        if (fd >= 0) {
            (void) ioctl(fd, TIOCNOTTY, NULL);
            close(fd);
        }
#endif /* TIOCNOTTY */
    }

"no_daemon_flag" isn't used anywhere else in the code.

Consequently, this code is why the daemon forks on reload, because it re-execs itself, causing this code to be hit. When run with -D, there's no call to daemon.

The call to daemon should only chdir to / and close stdin/stdout/stderr, redirecting them to /dev/null. Given that upstart will already have done that, this is a non issue. The code then ioctl's on stdin to get rid of the controlling TTY, which also has no effect when run via upstart since it is connected to /dev/null. I believe this would change the behavior if we were using console owner, and sshd would exit on control-C in console if one was using that, but we're not going to do that, nor would be a reason to do that for sshd.

I tested this on natty and maverick, killing the daemon in various ways, making sure that user sessions stay alive, and that reloads keep track of the daemon. I'm confident that this change is safe and will not cause any operational issues with openssh-server.

I've submitted a merge proposal adding -D and dropping expect fork, which would fix this bug.

Changed in openssh (Ubuntu):
assignee:	nobody → Clint Byrum (clint-fewbar)

Revision history for this message

Scott James Remnant (Canonical) (canonical-scott) wrote on 2010-12-13: Re: [Bug 687535] Re: upstart loses track of ssh daemon after reload ssh

#9

Download full text (4.4 KiB)

Not using "expect fork" of course would mean that ssh would be reported as
running before it was actually listening for connections.

On Mon, Dec 13, 2010 at 10:13 PM, Clint Byrum <email address hidden> wrote:

> In looking at the code in OpenSSH, the -D flag only affects this portion
> of sshd.c, around line 1740:
>
>
> /*
> * If not in debugging mode, and not started from inetd, disconnect
> * from the controlling terminal, and fork. The original process
> * exits.
> */
> if (!(debug_flag || inetd_flag || no_daemon_flag)) {
> #ifdef TIOCNOTTY
> int fd;
> #endif /* TIOCNOTTY */
> if (daemon(0, 0) < 0)
> fatal("daemon() failed: %.200s", strerror(errno));
>
> /* Disconnect from the controlling tty. */
> #ifdef TIOCNOTTY
> fd = open(_PATH_TTY, O_RDWR | O_NOCTTY);
> if (fd >= 0) {
> (void) ioctl(fd, TIOCNOTTY, NULL);
> close(fd);
> }
> #endif /* TIOCNOTTY */
> }
>
> "no_daemon_flag" isn't used anywhere else in the code.
>
> Consequently, this code is why the daemon forks on reload, because it
> re-execs itself, causing this code to be hit. When run with -D, there's
> no call to daemon.
>
> The call to daemon should only chdir to / and close stdin/stdout/stderr,
> redirecting them to /dev/null. Given that upstart will already have done
> that, this is a non issue. The code then ioctl's on stdin to get rid of
> the controlling TTY, which also has no effect when run via upstart since
> it is connected to /dev/null. I believe this would change the behavior
> if we were using console owner, and sshd would exit on control-C in
> console if one was using that, but we're not going to do that, nor would
> be a reason to do that for sshd.
>
> I tested this on natty and maverick, killing the daemon in various ways,
> making sure that user sessions stay alive, and that reloads keep track
> of the daemon. I'm confident that this change is safe and will not cause
> any operational issues with openssh-server.
>
> I've submitted a merge proposal adding -D and dropping expect fork,
> which would fix this bug.
>
> ** Changed in: openssh (Ubuntu)
> Assignee: (unassigned) => Clint Byrum (clint-fewbar)
>
> --
> You received this bug notification because you are a member of Upstart
> Developers, which is subscribed to upstart .
> https://bugs.launchpad.net/bugs/687535
>
> Title:
> upstart loses track of ssh daemon after reload ssh
>
> Status in Upstart:
> New
> Status in “openssh” package in Ubuntu:
> Confirmed
> Status in “openssh” source package in Lucid:
> Confirmed
> Status in “openssh” source package in Maverick:
> Confirmed
>
> Bug description:
> When sshd gets a signal 1 for reload, it forks a new process and ditches
> the old. This causes upstart to believe that ssh has crashed, and loses
> track of it. A second reload (or any other initctl operation on ssh) will
> thus say:
>
> reload: Unknown instance:
>
> There would be 2 ways to fix this:
> 1. Don't have ssh fork on relod, but keep the same pid
> 2. Use a different mechanism in upstart to keep track of ssh. Maybe a pid
> file? Just tracking children of the exited ssh won't work, or it might
> accide...

Not using "expect fork" of course would mean that ssh would be reported as
running before it was actually listening for connections.

On Mon, Dec 13, 2010 at 10:13 PM, Clint Byrum <clint@fewbar.com> wrote:

> In looking at the code in OpenSSH, the -D flag only affects this portion
> of sshd.c, around line 1740:
>
>
>    /*
>     * If not in debugging mode, and not started from inetd, disconnect
>     * from the controlling terminal, and fork.  The original process
>     * exits.
>     */
>    if (!(debug_flag || inetd_flag || no_daemon_flag)) {
> #ifdef TIOCNOTTY
>        int fd;
> #endif /* TIOCNOTTY */
>        if (daemon(0, 0) < 0)
>            fatal("daemon() failed: %.200s", strerror(errno));
>
>        /* Disconnect from the controlling tty. */
> #ifdef TIOCNOTTY
>        fd = open(_PATH_TTY, O_RDWR | O_NOCTTY);
>        if (fd >= 0) {
>            (void) ioctl(fd, TIOCNOTTY, NULL);
>            close(fd);
>        }
> #endif /* TIOCNOTTY */
>    }
>
> "no_daemon_flag" isn't used anywhere else in the code.
>
> Consequently, this code is why the daemon forks on reload, because it
> re-execs itself, causing this code to be hit. When run with -D, there's
> no call to daemon.
>
> The call to daemon should only chdir to / and close stdin/stdout/stderr,
> redirecting them to /dev/null. Given that upstart will already have done
> that, this is a non issue. The code then ioctl's on stdin to get rid of
> the controlling TTY, which also has no effect when run via upstart since
> it is connected to /dev/null. I believe this would change the behavior
> if we were using console owner, and sshd would exit on control-C in
> console if one was using that, but we're not going to do that, nor would
> be a reason to do that for sshd.
>
> I tested this on natty and maverick, killing the daemon in various ways,
> making sure that user sessions stay alive, and that reloads keep track
> of the daemon. I'm confident that this change is safe and will not cause
> any operational issues with openssh-server.
>
> I've submitted a merge proposal adding -D and dropping expect fork,
> which would fix this bug.
>
> ** Changed in: openssh (Ubuntu)
>     Assignee: (unassigned) => Clint Byrum (clint-fewbar)
>
> --
> You received this bug notification because you are a member of Upstart
> Developers, which is subscribed to upstart .
> https://bugs.launchpad.net/bugs/687535
>
> Title:
>  upstart loses track of ssh daemon after reload ssh
>
> Status in Upstart:
>  New
> Status in “openssh” package in Ubuntu:
>  Confirmed
> Status in “openssh” source package in Lucid:
>  Confirmed
> Status in “openssh” source package in Maverick:
>  Confirmed
>
> Bug description:
>  When sshd gets a signal 1 for reload, it forks a new process and ditches
> the old. This causes upstart to believe that ssh has crashed, and loses
> track of it. A second reload (or any other initctl operation on ssh) will
> thus say:
>
> reload: Unknown instance:
>
> There would be 2 ways to fix this:
> 1.  Don't have ssh fork on relod, but keep the same pid
> 2. Use a different mechanism in upstart to keep track of ssh. Maybe a pid
> file? Just tracking children of the exited ssh won't work, or it might
> accidentally track a particular session rather than the master, if somebody
> just happens to log in close to reload time.
>
>
> # lsb_release -rd
> Description:    Ubuntu 10.04.1 LTS
> Release:        10.04
>
> # dpkg -l openssh-server | cat
> Desired=Unknown/Install/Remove/Purge/Hold
> |
> Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name                                        Version
>                                Description
>
> +++-===========================================-==========================================================-=================================================================
> ii  openssh-server                              1:5.3p1-3ubuntu4
>                                 secure shell (SSH) server, for secure access
> from remote machines
>
> # dpkg -l upstart
> Desired=Unknown/Install/Remove/Purge/Hold
> |
> Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name           Version        Description
>
> +++-==============-==============-============================================
> ii  upstart        0.6.5-7        event-based init daemon
>
>
>

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2010-12-13:

#10

The code looks something like this:

[ option parsing, config check, etc ]
sshd.c, line 1744: daemon()
[ reinit logs, start random number generator, chdir to /, ignore SIGPIPE ]
sshd.c, line 1774: server_listen() -- socket(), bind(), listen(), etc.

So, there's already a race between considering it started (at the fork caused by daemon()), and sshd listening, right? Its just that now we're considering it started before the config check and option parsing, so that does give any dependent upstart jobs a bigger head start than they already had.

Revision history for this message

Scott James Remnant (Canonical) (canonical-scott) wrote on 2010-12-14:

#11

Ah, then ssh already had that race.

In which case -D seems like a good choice

On Mon, Dec 13, 2010 at 11:05 PM, Clint Byrum <email address hidden> wrote:

> The code looks something like this:
>
> [ option parsing, config check, etc ]
> sshd.c, line 1744: daemon()
> [ reinit logs, start random number generator, chdir to /, ignore SIGPIPE ]
> sshd.c, line 1774: server_listen() -- socket(), bind(), listen(), etc.
>
> So, there's already a race between considering it started (at the fork
> caused by daemon()), and sshd listening, right? Its just that now we're
> considering it started before the config check and option parsing, so
> that does give any dependent upstart jobs a bigger head start than they
> already had.
>
> --
> You received this bug notification because you are a member of Upstart
> Developers, which is subscribed to upstart .
> https://bugs.launchpad.net/bugs/687535
>
> Title:
> upstart loses track of ssh daemon after reload ssh
>
> Status in Upstart:
> New
> Status in “openssh” package in Ubuntu:
> Confirmed
> Status in “openssh” source package in Lucid:
> Confirmed
> Status in “openssh” source package in Maverick:
> Confirmed
>
> Bug description:
> When sshd gets a signal 1 for reload, it forks a new process and ditches
> the old. This causes upstart to believe that ssh has crashed, and loses
> track of it. A second reload (or any other initctl operation on ssh) will
> thus say:
>
> reload: Unknown instance:
>
> There would be 2 ways to fix this:
> 1. Don't have ssh fork on relod, but keep the same pid
> 2. Use a different mechanism in upstart to keep track of ssh. Maybe a pid
> file? Just tracking children of the exited ssh won't work, or it might
> accidentally track a particular session rather than the master, if somebody
> just happens to log in close to reload time.
>
>
> # lsb_release -rd
> Description: Ubuntu 10.04.1 LTS
> Release: 10.04
>
> # dpkg -l openssh-server | cat
> Desired=Unknown/Install/Remove/Purge/Hold
> |
> Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name Version
> Description
>
> +++-===========================================-==========================================================-=================================================================
> ii openssh-server 1:5.3p1-3ubuntu4
> secure shell (SSH) server, for secure access
> from remote machines
>
> # dpkg -l upstart
> Desired=Unknown/Install/Remove/Purge/Hold
> |
> Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name Version Description
>
> +++-==============-==============-============================================
> ii upstart 0.6.5-7 event-based init daemon
>
>
>

Ah, then ssh already had that race.

In which case -D seems like a good choice

On Mon, Dec 13, 2010 at 11:05 PM, Clint Byrum <clint@fewbar.com> wrote:

> The code looks something like this:
>
> [ option parsing, config check, etc ]
> sshd.c, line 1744: daemon()
> [ reinit logs, start random number generator, chdir to /, ignore SIGPIPE ]
> sshd.c, line 1774: server_listen() -- socket(), bind(), listen(), etc.
>
> So, there's already a race between considering it started (at the fork
> caused by daemon()), and sshd listening, right? Its just that now we're
> considering it started before the config check and option parsing, so
> that does give any dependent upstart jobs a bigger head start than they
> already had.
>
> --
> You received this bug notification because you are a member of Upstart
> Developers, which is subscribed to upstart .
> https://bugs.launchpad.net/bugs/687535
>
> Title:
>  upstart loses track of ssh daemon after reload ssh
>
> Status in Upstart:
>  New
> Status in “openssh” package in Ubuntu:
>  Confirmed
> Status in “openssh” source package in Lucid:
>  Confirmed
> Status in “openssh” source package in Maverick:
>  Confirmed
>
> Bug description:
>  When sshd gets a signal 1 for reload, it forks a new process and ditches
> the old. This causes upstart to believe that ssh has crashed, and loses
> track of it. A second reload (or any other initctl operation on ssh) will
> thus say:
>
> reload: Unknown instance:
>
> There would be 2 ways to fix this:
> 1.  Don't have ssh fork on relod, but keep the same pid
> 2. Use a different mechanism in upstart to keep track of ssh. Maybe a pid
> file? Just tracking children of the exited ssh won't work, or it might
> accidentally track a particular session rather than the master, if somebody
> just happens to log in close to reload time.
>
>
> # lsb_release -rd
> Description:    Ubuntu 10.04.1 LTS
> Release:        10.04
>
> # dpkg -l openssh-server | cat
> Desired=Unknown/Install/Remove/Purge/Hold
> |
> Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name                                        Version
>                                Description
>
> +++-===========================================-==========================================================-=================================================================
> ii  openssh-server                              1:5.3p1-3ubuntu4
>                                 secure shell (SSH) server, for secure access
> from remote machines
>
> # dpkg -l upstart
> Desired=Unknown/Install/Remove/Purge/Hold
> |
> Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name           Version        Description
>
> +++-==============-==============-============================================
> ii  upstart        0.6.5-7        event-based init daemon
>
>
>

Revision history for this message

Launchpad Janitor (janitor) wrote on 2010-12-14:

#12

This bug was fixed in the package openssh - 1:5.6p1-2ubuntu3

---------------
openssh (1:5.6p1-2ubuntu3) natty; urgency=low

  [ Clint Byrum ]
  * debian/openssh-server.ssh.upstart: drop 'expect fork' and run sshd
    with -D to avoid losing track on reload (LP: #687535)

  [ Colin Watson ]
  * Fix up mention-ssh-keygen-on-keychange.patch to match what was actually
    applied to the source tree.
-- Colin Watson <email address hidden> Tue, 14 Dec 2010 11:52:43 +0000

Changed in openssh (Ubuntu):
status:	Confirmed → Fix Released

Revision history for this message

Imre Gergely (cemc) wrote on 2010-12-28:

#13

I'm guessing this is waiting for a SRU. Here's the patch for Lucid and Maverick. Description updated for SRU. Please somebody add some notes to 'REGRESSION POTENTIAL' because I have no idea about that...

description:

updated

Revision history for this message

Imre Gergely (cemc) wrote on 2010-12-28:

#14

Debdiff for Lucid Edit (1.1 KiB, text/plain)

Revision history for this message

Imre Gergely (cemc) wrote on 2010-12-28:

#15

Debdiff for Maverick Edit (1.1 KiB, text/plain)

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2010-12-28:

#16

Thanks Imre!

I've updated the 'regression-potential' and subscribed ubuntu-sru and ubuntu-sponsors.

tags:	added: patch
description:	updated

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2010-12-28:

#17

Imre, the debdiffs need different changelog lines for SRU's

openssh (1:5.3p1-3ubuntu4.1) lucid-proposed; urgency=low

and

openssh (1:5.5p1-4ubuntu4.1) maverick-proposed; urgency=low

Otherwise they look good to me.

Revision history for this message

Colin Watson (cjwatson) wrote on 2011-01-07:

#18

Thanks, Imre! I've sponsored these two patches; they're waiting in the SRU queue for approval now.

Changed in openssh (Ubuntu Lucid):
status:	Confirmed → In Progress
Changed in openssh (Ubuntu Maverick):
status:	Confirmed → In Progress

Revision history for this message

Benjamin Drung (bdrung) wrote on 2011-01-08:

#19

Colin, you forgot to unsubscribe ubuntu-sponsors.

Revision history for this message

Martin Pitt (pitti) wrote on 2011-01-09: Please test proposed package

#20

Accepted openssh into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in openssh (Ubuntu Maverick):
status:	In Progress → Fix Committed
tags:	added: verification-needed

Revision history for this message

Martin Pitt (pitti) wrote on 2011-01-09:

#21

Accepted openssh into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in openssh (Ubuntu Lucid):
status:	In Progress → Fix Committed

Revision history for this message

Simon Déziel (sdeziel) wrote on 2011-01-09:

#22

I confirm that this bug is fixed on Lucid.

$ lsb_release -rd
Description: Ubuntu 10.04.1 LTS
Release: 10.04

$ apt-cache policy openssh-server
openssh-server:
  Installed: 1:5.3p1-3ubuntu5
  Candidate: 1:5.3p1-3ubuntu5
  Version table:
*** 1:5.3p1-3ubuntu5 0
        500 http://archive.ubuntu.com/ubuntu/ lucid-proposed/main Packages
        100 /var/lib/dpkg/status
     1:5.3p1-3ubuntu4 0
        500 http://ca.archive.ubuntu.com/ubuntu/ lucid-updates/main Packages
     1:5.3p1-3ubuntu3 0
        500 http://ca.archive.ubuntu.com/ubuntu/ lucid/main Packages

Martin Pitt (pitti) on 2011-01-10

tags:

added: verification-done

Revision history for this message

Colin Watson (cjwatson) wrote on 2011-01-23:

#23

Benjamin, no, I didn't - I sponsored it myself and it wasn't necessary to add it to the general sponsoring queue.

Revision history for this message

Launchpad Janitor (janitor) wrote on 2011-01-23:

#24

This bug was fixed in the package openssh - 1:5.3p1-3ubuntu5

---------------
openssh (1:5.3p1-3ubuntu5) lucid-proposed; urgency=low

* debian/openssh-server.ssh.upstart: drop 'expect fork' and run sshd
with -D to avoid losing track on reload (LP: #687535)
-- Imre Gergely <email address hidden> Fri, 07 Jan 2011 10:19:13 +0000

Changed in openssh (Ubuntu Lucid):
status:	Fix Committed → Fix Released

Revision history for this message

Colin Watson (cjwatson) wrote on 2011-01-23:

#25

Released for lucid (thanks, Simon!); still needs verification for maverick.

tags:

removed: verification-done

Revision history for this message

Imre Gergely (cemc) wrote on 2011-01-25:

#26

Confirmed fixed on Maverick.

root@utest-mms32:~# lsb_release -rd
Description: Ubuntu 10.10
Release: 10.10

root@utest-mms32:~# apt-cache policy openssh-server
openssh-server:
  Installed: 1:5.5p1-4ubuntu5
  Candidate: 1:5.5p1-4ubuntu5
  Version table:
*** 1:5.5p1-4ubuntu5 0
        500 http://ro.archive.ubuntu.com/ubuntu/ maverick-proposed/main i386 Packages
        100 /var/lib/dpkg/status
     1:5.5p1-4ubuntu4 0
        500 http://ro.archive.ubuntu.com/ubuntu/ maverick/main i386 Packages

Martin Pitt (pitti) on 2011-01-25

tags:

added: verification-done
removed: verification-needed

Revision history for this message

Launchpad Janitor (janitor) wrote on 2011-01-26:

#27

This bug was fixed in the package openssh - 1:5.5p1-4ubuntu5

---------------
openssh (1:5.5p1-4ubuntu5) maverick-proposed; urgency=low

* debian/openssh-server.ssh.upstart: drop 'expect fork' and run sshd
with -D to avoid losing track on reload (LP: #687535)
-- Imre Gergely <email address hidden> Fri, 07 Jan 2011 10:23:21 +0000

Changed in openssh (Ubuntu Maverick):
status:	Fix Committed → Fix Released

Revision history for this message

Jan de Haan (jdehaan) wrote on 2011-02-02:

#28

Download full text (4.9 KiB)

I'm sorry to spoil the fun but I see the same problem on Lucid with the new package.

Only thing that might be of interest is that this is a xen 4 guest.

More info and tests available on request.

root@somehost:/var/log >(zcat daemon.log.3.gz ; zcat daemon.log.2.gz ; cat daemon.log.1 ; cat daemon.log)|grep 'somehost init: ssh'|more
Jan 11 09:24:36 somehost init: ssh main process (3466) terminated with status 255
Jan 11 09:30:12 somehost init: ssh main process (3799) terminated with status 255
Jan 11 11:07:20 somehost init: ssh main process (3971) terminated with status 255
Jan 11 20:44:30 somehost init: ssh main process (369) terminated with status 255
Jan 11 21:54:04 somehost init: ssh main process (423) terminated with status 255
Jan 11 22:08:05 somehost init: ssh main process (1573) terminated with status 255
Jan 15 22:23:03 somehost init: ssh main process (1859) terminated with status 255
Jan 16 18:05:26 somehost init: ssh main process (400) terminated with status 255
Jan 23 20:04:04 somehost init: ssh main process (458) terminated with status 255
Jan 26 14:34:56 somehost init: ssh main process (4856) terminated with status 255
Jan 26 14:35:59 somehost init: ssh main process (21968) terminated with status 255
Jan 30 01:36:19 somehost init: ssh main process (396) terminated with status 255
Jan 30 13:48:46 somehost init: ssh main process (502) terminated with status 255
Jan 31 15:44:56 somehost init: ssh main process (419) terminated with status 255
Feb 2 12:01:33 somehost init: ssh main process (379) terminated with status 255
Feb 2 12:01:33 somehost init: ssh main process ended, respawning
Feb 2 12:01:33 somehost init: ssh main process (389) terminated with status 255
Feb 2 12:01:33 somehost init: ssh main process ended, respawning
Feb 2 12:01:33 somehost init: ssh main process (393) terminated with status 255
Feb 2 12:01:33 somehost init: ssh main process ended, respawning
Feb 2 12:01:33 somehost init: ssh main process (406) terminated with status 255
Feb 2 12:01:33 somehost init: ssh main process ended, respawning
Feb 2 12:01:33 somehost init: ssh main process (417) terminated with status 255
Feb 2 12:01:33 somehost init: ssh main process ended, respawning
Feb 2 12:01:33 somehost init: ssh main process (421) terminated with status 255
Feb 2 12:01:33 somehost init: ssh main process ended, respawning
Feb 2 12:01:33 somehost init: ssh main process (424) terminated with status 255
Feb 2 12:01:33 somehost init: ssh main process ended, respawning
Feb 2 12:01:33 somehost init: ssh main process (427) terminated with status 255
Feb 2 12:01:33 somehost init: ssh main process ended, respawning
Feb 2 12:01:33 somehost init: ssh main process (430) terminated with status 255
Feb 2 12:01:33 somehost init: ssh main process ended, respawning
Feb 2 12:01:33 somehost init: ssh main process (433) terminated with status 255
Feb 2 12:01:33 somehost init: ssh main process ended, respawning
Feb 2 12:01:33 somehost init: ssh main process (436) terminated with status 255
Feb 2 12:01:33 somehost init: ssh respawning too fast, stopped
Feb 2 16:39:02 somehost init: ssh main process (357) terminated with status 255
Feb 2 16:39:02 somehost i...

I'm sorry to spoil the fun but I see the same problem on Lucid with the new package.

Only thing that might  be of interest is that this is a xen 4 guest.

More info and tests available on request.

root@somehost:/var/log >(zcat daemon.log.3.gz ; zcat daemon.log.2.gz ; cat daemon.log.1 ; cat daemon.log)|grep 'somehost init: ssh'|more
Jan 11 09:24:36 somehost init: ssh main process (3466) terminated with status 255
Jan 11 09:30:12 somehost init: ssh main process (3799) terminated with status 255
Jan 11 11:07:20 somehost init: ssh main process (3971) terminated with status 255
Jan 11 20:44:30 somehost init: ssh main process (369) terminated with status 255
Jan 11 21:54:04 somehost init: ssh main process (423) terminated with status 255
Jan 11 22:08:05 somehost init: ssh main process (1573) terminated with status 255
Jan 15 22:23:03 somehost init: ssh main process (1859) terminated with status 255
Jan 16 18:05:26 somehost init: ssh main process (400) terminated with status 255
Jan 23 20:04:04 somehost init: ssh main process (458) terminated with status 255
Jan 26 14:34:56 somehost init: ssh main process (4856) terminated with status 255
Jan 26 14:35:59 somehost init: ssh main process (21968) terminated with status 255
Jan 30 01:36:19 somehost init: ssh main process (396) terminated with status 255
Jan 30 13:48:46 somehost init: ssh main process (502) terminated with status 255
Jan 31 15:44:56 somehost init: ssh main process (419) terminated with status 255
Feb  2 12:01:33 somehost init: ssh main process (379) terminated with status 255
Feb  2 12:01:33 somehost init: ssh main process ended, respawning
Feb  2 12:01:33 somehost init: ssh main process (389) terminated with status 255
Feb  2 12:01:33 somehost init: ssh main process ended, respawning
Feb  2 12:01:33 somehost init: ssh main process (393) terminated with status 255
Feb  2 12:01:33 somehost init: ssh main process ended, respawning
Feb  2 12:01:33 somehost init: ssh main process (406) terminated with status 255
Feb  2 12:01:33 somehost init: ssh main process ended, respawning
Feb  2 12:01:33 somehost init: ssh main process (417) terminated with status 255
Feb  2 12:01:33 somehost init: ssh main process ended, respawning
Feb  2 12:01:33 somehost init: ssh main process (421) terminated with status 255
Feb  2 12:01:33 somehost init: ssh main process ended, respawning
Feb  2 12:01:33 somehost init: ssh main process (424) terminated with status 255
Feb  2 12:01:33 somehost init: ssh main process ended, respawning
Feb  2 12:01:33 somehost init: ssh main process (427) terminated with status 255
Feb  2 12:01:33 somehost init: ssh main process ended, respawning
Feb  2 12:01:33 somehost init: ssh main process (430) terminated with status 255
Feb  2 12:01:33 somehost init: ssh main process ended, respawning
Feb  2 12:01:33 somehost init: ssh main process (433) terminated with status 255
Feb  2 12:01:33 somehost init: ssh main process ended, respawning
Feb  2 12:01:33 somehost init: ssh main process (436) terminated with status 255
Feb  2 12:01:33 somehost init: ssh respawning too fast, stopped
Feb  2 16:39:02 somehost init: ssh main process (357) terminated with status 255
Feb  2 16:39:02 somehost init: ssh main process ended, respawning
Feb  2 16:39:02 somehost init: ssh main process (373) terminated with status 255
Feb  2 16:39:02 somehost init: ssh main process ended, respawning
Feb  2 18:27:19 somehost init: ssh main process (368) terminated with status 255
Feb  2 18:27:19 somehost init: ssh main process (414) terminated with status 255
root@somehost:/var/log >apt-cache policy openssh-server
openssh-server:
  Installed: 1:5.3p1-3ubuntu5
  Candidate: 1:5.3p1-3ubuntu5
  Version table:
 *** 1:5.3p1-3ubuntu5 0
        500 http://nl.archive.ubuntu.com/ubuntu/ lucid-updates/main Packages
        100 /var/lib/dpkg/status
     1:5.3p1-3ubuntu3 0
        500 http://nl.archive.ubuntu.com/ubuntu/ lucid/main Packages
root@somehost:/var/log >
root@somehost:/var/log >(last ; last -f wtmp.1)|grep '^reboot'
reboot   system boot  2.6.32-28-server Wed Feb  2 18:27 - 20:20  (01:53)    
reboot   system boot  2.6.32-28-server Wed Feb  2 16:39 - 18:26  (01:47)    
reboot   system boot  2.6.32-28-server Wed Feb  2 12:01 - 16:38  (04:36)    
reboot   system boot  2.6.32-28-server Mon Jan 31 14:42 - 20:20 (2+05:37)   
reboot   system boot  2.6.32-28-server Sun Jan 30 01:36 - 20:20 (3+18:44)   
reboot   system boot  2.6.32-28-server Thu Jan 27 14:13 - 00:15 (2+10:01)   
reboot   system boot  2.6.32-27-server Sun Jan 16 18:05 - 14:12 (10+20:07)  
reboot   system boot  2.6.32-27-server Sun Jan 16 16:57 - 18:04  (01:07)    
reboot   system boot  2.6.32-27-server Sun Jan 16 15:29 - 16:56  (01:27)    
reboot   system boot  2.6.32-27-server Sun Jan 16 09:46 - 15:29  (05:42)    
reboot   system boot  2.6.32-27-server Tue Jan 11 21:23 - 09:46 (4+12:22)   
root@somehost:/var/log >ls -altc /var/cache/apt/archives
.
.
.
-rw-r--r-- 1 root root   304198 2011-01-23 20:03 openssh-server_1%3a5.3p1-3ubuntu5_amd64.deb
.
.
.

Revision history for this message

Jan de Haan (jdehaan) wrote on 2011-02-02:

#29

Download full text (3.3 KiB)

Hrmmm....

Feb 2 12:01:33 somehost sshd[379]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[379]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[379]: fatal: Cannot bind any address.
Feb 2 12:01:33 somehost sshd[389]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[389]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[389]: fatal: Cannot bind any address.
Feb 2 12:01:33 somehost sshd[393]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[393]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[393]: fatal: Cannot bind any address.
Feb 2 12:01:33 somehost sshd[406]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[406]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[406]: fatal: Cannot bind any address.
Feb 2 12:01:33 somehost sshd[417]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[417]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[417]: fatal: Cannot bind any address.
Feb 2 12:01:33 somehost sshd[421]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[421]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[421]: fatal: Cannot bind any address.
Feb 2 12:01:33 somehost sshd[424]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[424]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[424]: fatal: Cannot bind any address.
Feb 2 12:01:33 somehost sshd[427]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[427]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[427]: fatal: Cannot bind any address.
Feb 2 12:01:33 somehost sshd[430]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[430]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[430]: fatal: Cannot bind any address.
Feb 2 12:01:33 somehost sshd[433]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[433]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[433]: fatal: Cannot bind any address.
Feb 2 12:01:33 somehost sshd[436]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb 2 12:01:33 somehost sshd[436]: error: Bind to port 22...

Hrmmm....

Feb  2 12:01:33 somehost sshd[379]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[379]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[379]: fatal: Cannot bind any address.
Feb  2 12:01:33 somehost sshd[389]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[389]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[389]: fatal: Cannot bind any address.
Feb  2 12:01:33 somehost sshd[393]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[393]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[393]: fatal: Cannot bind any address.
Feb  2 12:01:33 somehost sshd[406]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[406]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[406]: fatal: Cannot bind any address.
Feb  2 12:01:33 somehost sshd[417]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[417]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[417]: fatal: Cannot bind any address.
Feb  2 12:01:33 somehost sshd[421]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[421]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[421]: fatal: Cannot bind any address.
Feb  2 12:01:33 somehost sshd[424]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[424]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[424]: fatal: Cannot bind any address.
Feb  2 12:01:33 somehost sshd[427]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[427]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[427]: fatal: Cannot bind any address.
Feb  2 12:01:33 somehost sshd[430]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[430]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[430]: fatal: Cannot bind any address.
Feb  2 12:01:33 somehost sshd[433]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[433]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[433]: fatal: Cannot bind any address.
Feb  2 12:01:33 somehost sshd[436]: error: Bind to port 22 on 1.2.3.4 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[436]: error: Bind to port 22 on 5555:666:77:8888::1 failed: Cannot assign requested address.
Feb  2 12:01:33 somehost sshd[436]: fatal: Cannot bind any address.

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2011-02-02:

#30

Jan, do you have a pidfile for sshd in /var/run ?

If so, you may have hit bug #531912 by running the init.d script.

Try

/etc/init.d/ssh stop

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2011-02-08:

#31

I'm closing the upstart bug task as Invalid.

Upstart is doing what it was designed to do, following fork()'d pids. This method just can't handle things that re-fork arbitrarily.

Changed in upstart:
status:	New → Invalid

Revision history for this message

Steve Langasek (vorlon) wrote on 2011-02-08:

#32

I'm not convinced this is true, Clint. Should upstart jobs be allowed to spawn processes that then get orphaned from the POV of the process supervisor? Maybe upstart shouldn't be tracking the new PID, but maybe it should instead be reaping any children left behind?

Revision history for this message

Scott James Remnant (scott) wrote on 2011-02-08: Re: [Bug 687535] Re: upstart loses track of ssh daemon after reload ssh

#33

Download full text (4.0 KiB)

They definitely should be allowed, ssh is actually a canon example of why.

Upstart should supervise the sshd daemon, not the login sub-process
associated with a particular connection, and certainly not any
processes being run inside the login session.

Otherwise "stop ssh" would kill all user logins; and ssh would not
respawn on crash if there was still a used logged in (or running
screen!)

Scott

On Mon, Feb 7, 2011 at 5:51 PM, Steve Langasek
<email address hidden> wrote:
> I'm not convinced this is true, Clint. Should upstart jobs be allowed
> to spawn processes that then get orphaned from the POV of the process
> supervisor? Maybe upstart shouldn't be tracking the new PID, but maybe
> it should instead be reaping any children left behind?
>
> --
> You received this bug notification because you are a member of Upstart
> Developers, which is subscribed to upstart .
> https://bugs.launchpad.net/bugs/687535
>
> Title:
> upstart loses track of ssh daemon after reload ssh
>
> Status in Upstart:
> Invalid
> Status in “openssh” package in Ubuntu:
> Fix Released
> Status in “openssh” source package in Lucid:
> Fix Released
> Status in “openssh” source package in Maverick:
> Fix Released
>
> Bug description:
> When sshd gets a signal 1 for reload, it forks a new process and
> ditches the old. This causes upstart to believe that ssh has crashed,
> and loses track of it. A second reload (or any other initctl operation
> on ssh) will thus say:
>
> reload: Unknown instance:
>
> There would be 2 ways to fix this:
> 1. Don't have ssh fork on relod, but keep the same pid
> 2. Use a different mechanism in upstart to keep track of ssh. Maybe a pid file? Just tracking children of the exited ssh won't work, or it might accidentally track a particular session rather than the master, if somebody just happens to log in close to reload time.
>
> openssh-server 1:5.3p1-3ubuntu4
> upstart 0.6.5-7
>
> ==== Info for Maverick, Lucid SRU ====
> IMPACT: if sshd gets a HUP signal, it forks a new process and upstart thinks the process died and loses track of it, so the user/admin uses the ability to stop/start/reload the daemon through upstart.
> The problem is fixed in Natty 5.6p1-2ubuntu3. See attached patches for Maverick and Lucid.
>
> TEST CASE:
>
> - install openssh-server
> - send a HUP signal to sshd
> - the daemon is restarted, but upstart thinks that it crashed (/var/log/daemon.log):
>
> Dec 28 20:59:57 utest-lls32 init: ssh main process ended, respawning
> Dec 28 20:59:57 utest-lls32 init: ssh main process (1451) terminated with status 255
> Dec 28 20:59:57 utest-lls32 init: ssh main process ended, respawning
> Dec 28 20:59:57 utest-lls32 init: ssh main process (1455) terminated with status 255
> Dec 28 20:59:57 utest-lls32 init: ssh respawning too fast, stopped
>
> - after this, upstart won't know about sshd, despite the daemon
> running just fine:
>
> root@utest-lls32:~# reload ssh
> reload: Unknown instance:
>
> With the fix applied, the correct behavior is:
>
> - send a HUP signal to sshd
> ps ax |grep sshd
> kill -HUP sshd
> - the daemon reloads (/var/log/auth.log):
>
> Dec 28 21:37:01 utest-lls...

They definitely should be allowed, ssh is actually a canon example of why.

Upstart should supervise the sshd daemon, not the login sub-process
associated with a particular connection, and certainly not any
processes being run inside the login session.

Otherwise "stop ssh" would kill all user logins; and ssh would not
respawn on crash if there was still a used logged in (or running
screen!)

Scott

On Mon, Feb 7, 2011 at 5:51 PM, Steve Langasek
<steve.langasek@canonical.com> wrote:
> I'm not convinced this is true, Clint.  Should upstart jobs be allowed
> to spawn processes that then get orphaned from the POV of the process
> supervisor?  Maybe upstart shouldn't be tracking the new PID, but maybe
> it should instead be reaping any children left behind?
>
> --
> You received this bug notification because you are a member of Upstart
> Developers, which is subscribed to upstart .
> https://bugs.launchpad.net/bugs/687535
>
> Title:
>  upstart loses track of ssh daemon after reload ssh
>
> Status in Upstart:
>  Invalid
> Status in “openssh” package in Ubuntu:
>  Fix Released
> Status in “openssh” source package in Lucid:
>  Fix Released
> Status in “openssh” source package in Maverick:
>  Fix Released
>
> Bug description:
>  When sshd gets a signal 1 for reload, it forks a new process and
>  ditches the old. This causes upstart to believe that ssh has crashed,
>  and loses track of it. A second reload (or any other initctl operation
>  on ssh) will thus say:
>
>  reload: Unknown instance:
>
>  There would be 2 ways to fix this:
>  1.  Don't have ssh fork on relod, but keep the same pid
>  2. Use a different mechanism in upstart to keep track of ssh. Maybe a pid file? Just tracking children of the exited ssh won't work, or it might accidentally track a particular session rather than the master, if somebody just happens to log in close to reload time.
>
>  openssh-server  1:5.3p1-3ubuntu4
>  upstart         0.6.5-7
>
>  ==== Info for Maverick, Lucid SRU ====
>  IMPACT: if sshd gets a HUP signal, it forks a new process and upstart thinks the process died and loses track of it, so the user/admin uses the ability to stop/start/reload the daemon through upstart.
>  The problem is fixed in Natty 5.6p1-2ubuntu3. See attached patches for Maverick and Lucid.
>
>  TEST CASE:
>
>  - install openssh-server
>  - send a HUP signal to sshd
>  - the daemon is restarted, but upstart thinks that it crashed (/var/log/daemon.log):
>
>  Dec 28 20:59:57 utest-lls32 init: ssh main process ended, respawning
>  Dec 28 20:59:57 utest-lls32 init: ssh main process (1451) terminated with status 255
>  Dec 28 20:59:57 utest-lls32 init: ssh main process ended, respawning
>  Dec 28 20:59:57 utest-lls32 init: ssh main process (1455) terminated with status 255
>  Dec 28 20:59:57 utest-lls32 init: ssh respawning too fast, stopped
>
>  - after this, upstart won't know about sshd, despite the daemon
>  running just fine:
>
>  root@utest-lls32:~# reload ssh
>  reload: Unknown instance:
>
>  With the fix applied, the correct behavior is:
>
>  - send a HUP signal to sshd
>    ps ax |grep sshd
>    kill -HUP sshd
>  - the daemon reloads (/var/log/auth.log):
>
>  Dec 28 21:37:01 utest-lls32 sshd[742]: Received SIGHUP; restarting.
>  Dec 28 21:37:01 utest-lls32 sshd[742]: Server listening on 0.0.0.0 port 22.
>  Dec 28 21:37:01 utest-lls32 sshd[742]: Server listening on :: port 22.
>
>  - reloading with upstart gives the same result, and NOT an error
>  message.
>
>  REGRESSION POTENTIAL:
>
>  There is a small race condition in sshd between when it forks, and
>  when it listens for incoming connections. The length of the race is
>  lengthened by a very tiny amount by considering sshd started as soon
>  as it has been executed, rather than when it forks. This will only
>  affect jobs that use 'start on started ssh' and immediately connect to
>  it. This is unlikely to cause problems in any real world scenario,
>  given that most of these programs would also have to fork, exec, and
>  open a socket, which is more work than what sshd will be doing in that
>  time.
>
>
>

Revision history for this message

Steve Langasek (vorlon) wrote on 2011-02-08:

#34

On Tue, Feb 08, 2011 at 04:14:59AM -0000, Scott James Remnant wrote:
> They definitely should be allowed, ssh is actually a canon example of
> why.

> Upstart should supervise the sshd daemon, not the login sub-process
> associated with a particular connection, and certainly not any
> processes being run inside the login session.

> Otherwise "stop ssh" would kill all user logins; and ssh would not
> respawn on crash if there was still a used logged in (or running
> screen!)

Good point, comment withdrawn. :)

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Revision history for this message

Scott James Remnant (scott) wrote on 2011-02-08:

#35

Download full text (4.4 KiB)

That being said, the current process tracker is very much not ideal.

I'm working on a new one, and will probably post in this week in the
form of a test program to be run and played with.

Scott

On Mon, Feb 7, 2011 at 8:48 PM, Steve Langasek
<email address hidden> wrote:
> On Tue, Feb 08, 2011 at 04:14:59AM -0000, Scott James Remnant wrote:
>> They definitely should be allowed, ssh is actually a canon example of
>> why.
>
>> Upstart should supervise the sshd daemon, not the login sub-process
>> associated with a particular connection, and certainly not any
>> processes being run inside the login session.
>
>> Otherwise "stop ssh" would kill all user logins; and ssh would not
>> respawn on crash if there was still a used logged in (or running
>> screen!)
>
> Good point, comment withdrawn. :)
>
> --
> Steve Langasek Give me a lever long enough and a Free OS
> Debian Developer to set it on, and I can move the world.
> Ubuntu Developer http://www.debian.org/
> <email address hidden> <email address hidden>
>
> --
> You received this bug notification because you are a member of Upstart
> Developers, which is subscribed to upstart .
> https://bugs.launchpad.net/bugs/687535
>
> Title:
> upstart loses track of ssh daemon after reload ssh
>
> Status in Upstart:
> Invalid
> Status in “openssh” package in Ubuntu:
> Fix Released
> Status in “openssh” source package in Lucid:
> Fix Released
> Status in “openssh” source package in Maverick:
> Fix Released
>
> Bug description:
> When sshd gets a signal 1 for reload, it forks a new process and
> ditches the old. This causes upstart to believe that ssh has crashed,
> and loses track of it. A second reload (or any other initctl operation
> on ssh) will thus say:
>
> reload: Unknown instance:
>
> There would be 2 ways to fix this:
> 1. Don't have ssh fork on relod, but keep the same pid
> 2. Use a different mechanism in upstart to keep track of ssh. Maybe a pid file? Just tracking children of the exited ssh won't work, or it might accidentally track a particular session rather than the master, if somebody just happens to log in close to reload time.
>
> openssh-server 1:5.3p1-3ubuntu4
> upstart 0.6.5-7
>
> ==== Info for Maverick, Lucid SRU ====
> IMPACT: if sshd gets a HUP signal, it forks a new process and upstart thinks the process died and loses track of it, so the user/admin uses the ability to stop/start/reload the daemon through upstart.
> The problem is fixed in Natty 5.6p1-2ubuntu3. See attached patches for Maverick and Lucid.
>
> TEST CASE:
>
> - install openssh-server
> - send a HUP signal to sshd
> - the daemon is restarted, but upstart thinks that it crashed (/var/log/daemon.log):
>
> Dec 28 20:59:57 utest-lls32 init: ssh main process ended, respawning
> Dec 28 20:59:57 utest-lls32 init: ssh main process (1451) terminated with status 255
> Dec 28 20:59:57 utest-lls32 init: ssh main process ended, respawning
> Dec 28 20:59:57 utest-lls32 init: ssh main process (1455) terminated with status 255
> Dec 28 20:59:57 utest-lls32 init: ssh respawning too fast, sto...

That being said, the current process tracker is very much not ideal.

I'm working on a new one, and will probably post in this week in the
form of a test program to be run and played with.

Scott

On Mon, Feb 7, 2011 at 8:48 PM, Steve Langasek
<steve.langasek@canonical.com> wrote:
> On Tue, Feb 08, 2011 at 04:14:59AM -0000, Scott James Remnant wrote:
>> They definitely should be allowed, ssh is actually a canon example of
>> why.
>
>> Upstart should supervise the sshd daemon, not the login sub-process
>> associated with a particular connection, and certainly not any
>> processes being run inside the login session.
>
>> Otherwise "stop ssh" would kill all user logins; and ssh would not
>> respawn on crash if there was still a used logged in (or running
>> screen!)
>
> Good point, comment withdrawn. :)
>
> --
> Steve Langasek                   Give me a lever long enough and a Free OS
> Debian Developer                   to set it on, and I can move the world.
> Ubuntu Developer                                    http://www.debian.org/
> slangasek@ubuntu.com                                     vorlon@debian.org
>
> --
> You received this bug notification because you are a member of Upstart
> Developers, which is subscribed to upstart .
> https://bugs.launchpad.net/bugs/687535
>
> Title:
>  upstart loses track of ssh daemon after reload ssh
>
> Status in Upstart:
>  Invalid
> Status in “openssh” package in Ubuntu:
>  Fix Released
> Status in “openssh” source package in Lucid:
>  Fix Released
> Status in “openssh” source package in Maverick:
>  Fix Released
>
> Bug description:
>  When sshd gets a signal 1 for reload, it forks a new process and
>  ditches the old. This causes upstart to believe that ssh has crashed,
>  and loses track of it. A second reload (or any other initctl operation
>  on ssh) will thus say:
>
>  reload: Unknown instance:
>
>  There would be 2 ways to fix this:
>  1.  Don't have ssh fork on relod, but keep the same pid
>  2. Use a different mechanism in upstart to keep track of ssh. Maybe a pid file? Just tracking children of the exited ssh won't work, or it might accidentally track a particular session rather than the master, if somebody just happens to log in close to reload time.
>
>  openssh-server  1:5.3p1-3ubuntu4
>  upstart         0.6.5-7
>
>  ==== Info for Maverick, Lucid SRU ====
>  IMPACT: if sshd gets a HUP signal, it forks a new process and upstart thinks the process died and loses track of it, so the user/admin uses the ability to stop/start/reload the daemon through upstart.
>  The problem is fixed in Natty 5.6p1-2ubuntu3. See attached patches for Maverick and Lucid.
>
>  TEST CASE:
>
>  - install openssh-server
>  - send a HUP signal to sshd
>  - the daemon is restarted, but upstart thinks that it crashed (/var/log/daemon.log):
>
>  Dec 28 20:59:57 utest-lls32 init: ssh main process ended, respawning
>  Dec 28 20:59:57 utest-lls32 init: ssh main process (1451) terminated with status 255
>  Dec 28 20:59:57 utest-lls32 init: ssh main process ended, respawning
>  Dec 28 20:59:57 utest-lls32 init: ssh main process (1455) terminated with status 255
>  Dec 28 20:59:57 utest-lls32 init: ssh respawning too fast, stopped
>
>  - after this, upstart won't know about sshd, despite the daemon
>  running just fine:
>
>  root@utest-lls32:~# reload ssh
>  reload: Unknown instance:
>
>  With the fix applied, the correct behavior is:
>
>  - send a HUP signal to sshd
>    ps ax |grep sshd
>    kill -HUP sshd
>  - the daemon reloads (/var/log/auth.log):
>
>  Dec 28 21:37:01 utest-lls32 sshd[742]: Received SIGHUP; restarting.
>  Dec 28 21:37:01 utest-lls32 sshd[742]: Server listening on 0.0.0.0 port 22.
>  Dec 28 21:37:01 utest-lls32 sshd[742]: Server listening on :: port 22.
>
>  - reloading with upstart gives the same result, and NOT an error
>  message.
>
>  REGRESSION POTENTIAL:
>
>  There is a small race condition in sshd between when it forks, and
>  when it listens for incoming connections. The length of the race is
>  lengthened by a very tiny amount by considering sshd started as soon
>  as it has been executed, rather than when it forks. This will only
>  affect jobs that use 'start on started ssh' and immediately connect to
>  it. This is unlikely to cause problems in any real world scenario,
>  given that most of these programs would also have to fork, exec, and
>  open a socket, which is more work than what sshd will be doing in that
>  time.
>
>
>

Revision history for this message

Mark Goris (mgoris) wrote on 2011-02-23:

#36

I just updated to the openssh-server_5.3p1-3ubuntu5 package for a Lucid instance, and now I can't get SSH to start. Or, rather, I'm running into what Jan mentioned above, with the respawning/terminating loop that eventually results in init deciding that SSH isn't going to start, and giving up. Something like:

init: ssh main process (430) terminated with status 255
init: ssh main process ended, respawning
init: ssh main process (449) terminated with status 255
init: ssh main process ended, respawning
init: ssh main process (456) terminated with status 255
init: ssh main process ended, respawning
...
init: ssh main process (468) terminated with status 255
init: ssh respawning too fast, stopped

The real trouble here is, this is an EC2 instance that was just restarted, so now I have no means by which I can get on that machine to do any additional troubleshooting or ultimately fix this. But this was not caused by me executing an /etc/init.d/ssh stop, so I don't think this is bug #531912.

So my question would be what now? Downgrade upstart (I just upgraded to 0.6.5-8)? Downgrade openssh-server (I just upgraded to openssh-server_5.3.01-3ubuntu5? Is there something I can do to workaround this? My existing instance my be lost, but what will keep me from running into this again if I build a new EC2 instance with the same package versions?

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2011-02-23:

#37

On Wed, 2011-02-23 at 05:19 +0000, gorism wrote:
> I just updated to the openssh-server_5.3p1-3ubuntu5 package for a Lucid
> instance, and now I can't get SSH to start. Or, rather, I'm running
> into what Jan mentioned above, with the respawning/terminating loop that
> eventually results in init deciding that SSH isn't going to start, and
> giving up. Something like:
>
> init: ssh main process (430) terminated with status 255
> init: ssh main process ended, respawning
> init: ssh main process (449) terminated with status 255
> init: ssh main process ended, respawning
> init: ssh main process (456) terminated with status 255
> init: ssh main process ended, respawning
> ...
> init: ssh main process (468) terminated with status 255
> init: ssh respawning too fast, stopped
>
> The real trouble here is, this is an EC2 instance that was just
> restarted, so now I have no means by which I can get on that machine to
> do any additional troubleshooting or ultimately fix this. But this was
> not caused by me executing an /etc/init.d/ssh stop, so I don't think
> this is bug #531912.
>

If you restarted the instance and this is happening, its definitely not
bug #531912, as that one still results in an sshd that works on bootup.

If this is an EBS root instance, can you snapshot it and mount it from
another instance? If so, you should be able to view the contents
of /var/log and paste any errors from sshd. If you can't find anything
there, you can mount the volume r/w and add 'console output'
to /etc/init/ssh.conf and then check the console output when it fails to
start up.

If its fully instance store, I'm not sure there's much you can do though
EC2 experts may have some tricks.

Revision history for this message

Mark Goris (mgoris) wrote on 2011-02-23:

#38

It is an EBS root instance, and I'll be spinning up a replacement instance as a next step. I'll mount a snapshot of root, check out /var/log, and let you know what I find.

Revision history for this message

Mark Goris (mgoris) wrote on 2011-02-25:

#39

We did as described above, and the log files did not yield any helpful information. My next step is to revert back to openssh_5.3p1-3ubuntu4, since that seemed to be reliable.

As for the fact that this didn't show up in any testing prior to committing the patch, I would point out that, just like Jan above, our instances are running as Xen guests (since we're running in EC2). So perhaps that has something to do with it.

I can't tell from the bug comments, but I'm curious as to whether or not these sort of changes get tested in a Xen based environment. My guess is that their not, at least not consistently.

Ubuntu Foundations Team Bug Bot (crichton) on 2011-09-19

tags:

added: testcase

Ubuntu
openssh package

upstart loses track of ssh daemon after reload ssh

Bug Description

Related branches

Other bug subscribers

Patches

Remote bug watches

	Status	Importance	Assigned to
upstart	Invalid	Undecided	Unassigned
openssh (Ubuntu)	Fix Released	High	Clint Byrum
Lucid	Fix Released	High	Unassigned
Maverick	Fix Released	High	Unassigned

Ubuntuopenssh package

upstart loses track of ssh daemon after reload ssh

Bug Description

Related branches

Other bug subscribers

Patches

Remote bug watches

Ubuntu
openssh package