restart broken - Regression Caused by LP: #600941

Bug #896388 reported by nutznboltz
This bug affects 2 people
Affects                Status        Importance  Assigned to      Milestone
nagios-nrpe (Debian)   Fix Released  Unknown
nagios-nrpe (Ubuntu)   Fix Released  Medium      Stéphane Graber
  Hardy                Fix Released  Undecided   Stéphane Graber
  Lucid                Fix Released  Undecided   Stéphane Graber
  Maverick             Won't Fix     Undecided   Unassigned
  Natty                Fix Released  Undecided   Stéphane Graber
  Oneiric              Fix Released  Undecided   Stéphane Graber
  Precise              Fix Released  Medium      Stéphane Graber
  Quantal              Fix Released  Medium      Stéphane Graber

Bug Description

After the update for LP: #600941 was pushed out, all of our systems started experiencing Nagios NRPE restart failures.

Commands such as /etc/init.d/nagios-nrpe-server restart would stop nrpe but not start it again.

I tracked this down to the way that the /etc/init.d/nagios-nrpe-server script is calling start-stop-daemon.

The issue is that the "stop" stanza in /etc/init.d/nagios-nrpe-server calls start-stop-daemon, which sends SIGTERM to nrpe, and then waits only one second.

If nrpe has not exited by then, the pid file still exists, and the script removes it.

Worse, when /etc/init.d/nagios-nrpe-server restart is used, not only is the pid file removed, but the attempt to start nrpe again fails if the old daemon is still slow to shut down.

The start fails under those circumstances because the old nrpe process is still bound to its listening socket, so the new process's attempt to bind the same address fails and startup aborts.
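The failure sequence above can be reproduced in miniature with a plain POSIX shell. This is a simulation under stated assumptions, not the actual init script: a hypothetical slow_daemon stands in for nrpe and takes at least three seconds to exit after SIGTERM, and a lock directory created with mkdir stands in for the socket nrpe binds, so a second mkdir plays the role of the failing bind(). The hypothetical naive_stop follows the stop logic as this report describes it: SIGTERM, a fixed one-second sleep, then pid file removal.

```sh
#!/bin/sh
# Simulation only: "slow_daemon" is a hypothetical stand-in for nrpe, and a lock
# directory created with mkdir stands in for the TCP socket nrpe binds.
# None of this is the real nagios-nrpe-server init script.

WORKDIR=$(mktemp -d)
PIDFILE="$WORKDIR/nrpe.pid"
PORTLOCK="$WORKDIR/port.lock"

start_daemon() {
    # "bind()": fails if a previous instance still holds the port.
    if ! mkdir "$PORTLOCK" 2>/dev/null; then
        echo "start: address already in use" >&2
        return 1
    fi
    # On SIGTERM the daemon needs at least 3 s to shut down (a heavily loaded
    # host), then releases the port and removes its own pid file.
    sh -c 'trap "sleep 3; rmdir \"$1\"; rm -f \"$2\"; exit 0" TERM
           echo $$ > "$2"
           while :; do sleep 1; done' slow_daemon "$PORTLOCK" "$PIDFILE" \
        >/dev/null 2>&1 &
    # Wait until the TERM handler is installed and the pid file is written.
    while [ ! -s "$PIDFILE" ]; do sleep 1; done
}

# Stop logic as described in the report: TERM, a fixed one-second wait, then
# delete the pid file whether or not the daemon has actually exited.
naive_stop() {
    kill -TERM "$(cat "$PIDFILE")" 2>/dev/null
    sleep 1
    rm -f "$PIDFILE"
}

start_daemon
old_pid=$(cat "$PIDFILE")
naive_stop                                                      # "restart" = stop ...
if start_daemon; then restart_ok=yes; else restart_ok=no; fi    # ... then start
if [ -e "$PIDFILE" ]; then pidfile=present; else pidfile=missing; fi
if kill -0 "$old_pid" 2>/dev/null; then old_running=yes; else old_running=no; fi
echo "restart succeeded: $restart_ok, pid file: $pidfile, old nrpe running: $old_running"

wait                 # let the old instance finish exiting
rm -rf "$WORKDIR"
```

Run as-is, it should report that the restart failed, that the pid file is gone, and that the old instance is still running, which is the state described above.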

They should have wondered why the script contains a comment saying "sometimes the pid file does not get removed".

They should have tested on systems that have a heavy load and therefore slow nrpe response times.

The fix is to add --retry 10 (or similar) to the start-stop-daemon ... --stop ... invocation.
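A sketch of why waiting fixes it, again as a self-contained simulation with the same hypothetical stand-ins (slow_daemon for nrpe, a mkdir lock for its socket). With --stop, start-stop-daemon --retry 10 means: send the stop signal (TERM by default), wait up to 10 seconds for the process to exit, then send KILL and wait up to 10 more seconds. The hypothetical retry_stop below emulates only the first half (TERM, then wait up to a timeout), and it decides that the daemon has exited by watching for the stand-in daemon to delete its own pid file, not by checking the process table as start-stop-daemon does.

```sh
#!/bin/sh
# Simulation only: slow_daemon stands in for nrpe, a mkdir lock for its socket.
# On Debian/Ubuntu the fix itself is a start-stop-daemon option, for example
# (variable names here are illustrative):
#   start-stop-daemon --stop --quiet --oknodo --retry 10 --pidfile "$PIDFILE" --exec "$DAEMON"

WORKDIR=$(mktemp -d)
PIDFILE="$WORKDIR/nrpe.pid"
PORTLOCK="$WORKDIR/port.lock"

start_daemon() {
    if ! mkdir "$PORTLOCK" 2>/dev/null; then
        echo "start: address already in use" >&2
        return 1
    fi
    # Needs at least 3 s to shut down on SIGTERM; releases the port first,
    # then removes its pid file as the last step before exiting.
    sh -c 'trap "sleep 3; rmdir \"$1\"; rm -f \"$2\"; exit 0" TERM
           echo $$ > "$2"
           while :; do sleep 1; done' slow_daemon "$PORTLOCK" "$PIDFILE" \
        >/dev/null 2>&1 &
    while [ ! -s "$PIDFILE" ]; do sleep 1; done
}

# Send TERM, then wait up to $1 seconds (default 10) for the daemon to go away
# instead of sleeping a fixed second. Returns non-zero on timeout.
retry_stop() {
    timeout=${1:-10}
    [ -s "$PIDFILE" ] || return 0          # nothing running
    kill -TERM "$(cat "$PIDFILE")" 2>/dev/null
    waited=0
    while [ -e "$PIDFILE" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}

start_daemon
retry_stop 10
if start_daemon; then restart_ok=yes; else restart_ok=no; fi
echo "restart succeeded: $restart_ok"
retry_stop 10        # clean up the second instance
rm -rf "$WORKDIR"
```

Because retry_stop only returns once the old instance has released the port, the second start_daemon finds it free and the simulated restart succeeds.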

Patch forthcoming, see

http://askubuntu.com/questions/82631/what-is-the-way-to-submit-a-patch-to-fix-all-the-damage-that-lp-600941-causes

See also

https://launchpad.net/~nutznboltz/+archive/nrpe-unbreak-lp-600941

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: nagios-nrpe-server (not installed)
ProcVersionSignature: Ubuntu 3.2.0-1.3-generic 3.2.0-rc2
Uname: Linux 3.2.0-1-generic i686
ApportVersion: 1.90-0ubuntu1
Architecture: i386
Date: Fri Nov 25 14:38:05 2011
InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Release i386 (20111011)
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: nagios-nrpe
UpgradeStatus: No upgrade log present (probably fresh install)


description: updated
Revision history for this message
Stéphane Graber (stgraber) wrote :

I'm unfortunately in a bit of a rush with alpha-1 being released next week, so I don't expect to have time to work on this for the next week or so.

In the meantime, could you forward this bug and the fix to Debian? It'd make the process on Ubuntu's side much easier if it gets fixed in Debian first.

Our current init script (the one included in the SRU) is an exact copy/paste of Debian's, so your bug most likely affects Debian too, which makes Debian the right place to get the fix included.

Thanks

Changed in nagios-nrpe (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Stéphane Graber (stgraber)
Revision history for this message
nutznboltz (nutznboltz-deactivatedaccount) wrote :

I'm OK with just using my PPA for now and letting the rest of the world stay broken.

Revision history for this message
nutznboltz (nutznboltz-deactivatedaccount) wrote :
tags: added: regression-update
summary: - Fix Regression Caused by LP: #600941
+ restart broken - Regression Caused by LP: #600941
Changed in nagios-nrpe (Debian):
status: Unknown → New
Revision history for this message
nutznboltz (nutznboltz-deactivatedaccount) wrote :

I will be on vacation through Jan 5, 2012. Please do not ask for testing until after that date, thanks.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nagios-nrpe (Ubuntu Hardy):
status: New → Confirmed
Changed in nagios-nrpe (Ubuntu Lucid):
status: New → Confirmed
Changed in nagios-nrpe (Ubuntu Maverick):
status: New → Confirmed
Changed in nagios-nrpe (Ubuntu Natty):
status: New → Confirmed
Changed in nagios-nrpe (Ubuntu Oneiric):
status: New → Confirmed
Changed in nagios-nrpe (Ubuntu):
status: New → Confirmed
Revision history for this message
nutznboltz (nutznboltz-deactivatedaccount) wrote :

I'm back, feel free to ask for non-developer QA testing from me. Thanks.

Changed in nagios-nrpe (Debian):
status: New → Fix Released
Changed in nagios-nrpe (Ubuntu Maverick):
status: Confirmed → Won't Fix
Changed in nagios-nrpe (Ubuntu Hardy):
assignee: nobody → Stéphane Graber (stgraber)
Changed in nagios-nrpe (Ubuntu Lucid):
assignee: nobody → Stéphane Graber (stgraber)
Changed in nagios-nrpe (Ubuntu Natty):
assignee: nobody → Stéphane Graber (stgraber)
Changed in nagios-nrpe (Ubuntu Oneiric):
assignee: nobody → Stéphane Graber (stgraber)
Changed in nagios-nrpe (Ubuntu Hardy):
status: Confirmed → In Progress
Changed in nagios-nrpe (Ubuntu Lucid):
status: Confirmed → In Progress
Changed in nagios-nrpe (Ubuntu Natty):
status: Confirmed → In Progress
Changed in nagios-nrpe (Ubuntu Oneiric):
status: Confirmed → In Progress
Changed in nagios-nrpe (Ubuntu Precise):
status: Confirmed → In Progress
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nagios-nrpe - 2.12-6ubuntu1

---------------
nagios-nrpe (2.12-6ubuntu1) quantal; urgency=low

  [ Dmitrijs Ledkovs ]
  * Merge with Debian; remaining changes:
    - debian/{rules,control}: add hardening-includes to gain PIE
      security builds.
    - Use dpkg-buildflags.
  * Changes gained from Debian:
    - [4dc53fb] Use retry argument for start-stop-daemon when stopping nrpe
    (LP: #896388)

  [ Stéphane Graber ]
  * Drop useless diff file in source package (xxx) that was added by
    mistake in a previous merge.

nagios-nrpe (2.12-6) unstable; urgency=low

  * [36b1062] Add icinga to the list of recommends
  * [a698acb] Don't remove home directory of the nagios user (Closes: #665845)
  * [4dc53fb] Use retry argument for start-stop-daemon when stopping nrpe
    (Closes: #650464)
 -- Stephane Graber <email address hidden> Thu, 03 May 2012 09:55:10 -0400

Changed in nagios-nrpe (Ubuntu Quantal):
status: Confirmed → Fix Released
Changed in nagios-nrpe (Ubuntu Precise):
status: In Progress → Fix Committed
Changed in nagios-nrpe (Ubuntu Oneiric):
status: In Progress → Fix Committed
Changed in nagios-nrpe (Ubuntu Natty):
status: In Progress → Fix Committed
Changed in nagios-nrpe (Ubuntu Lucid):
status: In Progress → Fix Committed
Changed in nagios-nrpe (Ubuntu Hardy):
status: In Progress → Fix Committed
Revision history for this message
Martin Pitt (pitti) wrote : Please test proposed package

Hello nutznboltz, or anyone else affected,

Accepted nagios-nrpe into precise-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Hello nutznboltz, or anyone else affected,

Accepted nagios-nrpe into oneiric-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Hello nutznboltz, or anyone else affected,

Accepted nagios-nrpe into natty-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Hello nutznboltz, or anyone else affected,

Accepted nagios-nrpe into hardy-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Hello nutznboltz, or anyone else affected,

Accepted nagios-nrpe into lucid-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Revision history for this message
Martin Pitt (pitti) wrote :

I actually accepted the natty-proposed upload now.

Revision history for this message
Stéphane Graber (stgraber) wrote :

Can someone help testing these?

Based on past comments, the two things we want to check for all of these Ubuntu versions are:
 - Does /etc/init.d/nagios-nrpe-server restart work reliably when called several times in a row? => regression test
 - Does /etc/init.d/nagios-nrpe-server restart now work under high load? (Previously, the service would stop but never restart.)
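The first check can be scripted. The helper below is hypothetical, not part of the package: it assumes the init script path from the bug description and uses pidof nrpe as a crude liveness test, and both can be overridden through the RESTART_CMD and CHECK_CMD variables. start_load and stop_load are an equally crude, assumed way to keep the CPU busy for the second check by spinning a few busy-looping shells; they are not what was used on this bug.

```sh
#!/bin/sh
# Hypothetical test helper, not part of the nagios-nrpe package.
# Assumed defaults: the init script path from the bug description, and
# "pidof nrpe" as a crude "is nrpe running?" check. Override as needed.
RESTART_CMD=${RESTART_CMD:-"/etc/init.d/nagios-nrpe-server restart"}
CHECK_CMD=${CHECK_CMD:-"pidof nrpe"}

# Restart $1 times (default 5); fail as soon as nrpe is not running afterwards.
check_restarts() {
    n=${1:-5}
    i=1
    while [ "$i" -le "$n" ]; do
        # Unquoted on purpose so "script restart" splits into command + argument.
        $RESTART_CMD >/dev/null 2>&1
        sleep 1    # give the new instance a moment to come up
        if ! $CHECK_CMD >/dev/null 2>&1; then
            echo "FAIL: nrpe not running after restart #$i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "OK: nrpe survived $n restarts"
}

# Crude CPU load for the "under high load" check: four busy-looping shells.
start_load() {
    LOAD_PIDS=
    for worker in 1 2 3 4; do
        sh -c 'while :; do :; done' &
        LOAD_PIDS="$LOAD_PIDS $!"
    done
}

stop_load() {
    # Unquoted on purpose: one pid per word.
    kill $LOAD_PIDS 2>/dev/null
    LOAD_PIDS=
}
```

For example, run check_restarts 10 for the regression test, then start_load; check_restarts 10; stop_load for the loaded case.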

Revision history for this message
Stéphane Graber (stgraber) wrote :

Short of having testers willing to confirm the SRU for me, I deployed it on my network across multiple hosts with varying load. All of the 60 containers upgraded properly and nrpe still works. This was on a mix of hardy (5 systems), lucid (20 systems) and precise (75 systems).

So as far as I'm concerned, that's:
 - hardy => OK
 - lucid => OK
 - precise => OK

As natty shipped with the exact same version of nagios-nrpe as lucid's, I'd consider it a pass too (even though I haven't tested it).

I unfortunately don't have any oneiric system using nrpe (I only use LTS releases in production), but I'd be tempted to consider it a pass too, on the basis that the fix is identical to lucid's and that the previous SRU (2.12-4ubuntu3.1), which touched the same code path, was confirmed to apply equally to all releases.

Marking verification-done. If someone has time, actual testing of both natty and oneiric would be appreciated.

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nagios-nrpe - 2.8.1-1ubuntu0.2

---------------
nagios-nrpe (2.8.1-1ubuntu0.2) hardy-proposed; urgency=low

  * [4dc53fb] Use retry argument for start-stop-daemon when stopping nrpe,
    this fixes cases where restarting nagios-nrpe fails to respawn it.
    (LP: #896388)
 -- Stephane Graber <email address hidden> Thu, 03 May 2012 10:19:29 -0400

Changed in nagios-nrpe (Ubuntu Hardy):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nagios-nrpe - 2.12-4ubuntu1.10.04.2

---------------
nagios-nrpe (2.12-4ubuntu1.10.04.2) lucid-proposed; urgency=low

  * [4dc53fb] Use retry argument for start-stop-daemon when stopping nrpe,
    this fixes cases where restarting nagios-nrpe fails to respawn it.
    (LP: #896388)
 -- Stephane Graber <email address hidden> Thu, 03 May 2012 10:17:43 -0400

Changed in nagios-nrpe (Ubuntu Lucid):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nagios-nrpe - 2.12-5ubuntu1.1

---------------
nagios-nrpe (2.12-5ubuntu1.1) precise-proposed; urgency=low

  * [4dc53fb] Use retry argument for start-stop-daemon when stopping nrpe,
    this fixes cases where restarting nagios-nrpe fails to respawn it.
    (LP: #896388)
 -- Stephane Graber <email address hidden> Thu, 03 May 2012 10:10:38 -0400

Changed in nagios-nrpe (Ubuntu Precise):
status: Fix Committed → Fix Released
Revision history for this message
Martin Pitt (pitti) wrote :

Resetting verification tags for natty and oneiric. These should at least get a shallow regression test ("still starts and works") to rule out misbuilds.

tags: added: verification-needed
removed: verification-done
Revision history for this message
Stéphane Graber (stgraber) wrote :

Confirmed on natty: I used a clean, up-to-date natty container, installed nagios-nrpe-server, checked that it started properly, upgraded to the package from -proposed, and checked that the restart worked properly.

Revision history for this message
Stéphane Graber (stgraber) wrote :

Confirmed on oneiric: I used a clean, up-to-date oneiric container, installed nagios-nrpe-server, checked that it started properly, upgraded to the package from -proposed, and checked that the restart worked properly.

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nagios-nrpe - 2.12-4ubuntu1.11.04.2

---------------
nagios-nrpe (2.12-4ubuntu1.11.04.2) natty-proposed; urgency=low

  * [4dc53fb] Use retry argument for start-stop-daemon when stopping nrpe,
    this fixes cases where restarting nagios-nrpe fails to respawn it.
    (LP: #896388)
 -- Stephane Graber <email address hidden> Thu, 03 May 2012 10:16:08 -0400

Changed in nagios-nrpe (Ubuntu Natty):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nagios-nrpe - 2.12-4ubuntu3.2

---------------
nagios-nrpe (2.12-4ubuntu3.2) oneiric-proposed; urgency=low

  * [4dc53fb] Use retry argument for start-stop-daemon when stopping nrpe,
    this fixes cases where restarting nagios-nrpe fails to respawn it.
    (LP: #896388)
 -- Stephane Graber <email address hidden> Thu, 03 May 2012 10:13:59 -0400

Changed in nagios-nrpe (Ubuntu Oneiric):
status: Fix Committed → Fix Released