[SRU] Update squid3 upstart script to kill it with SIGINT and wait longer

Bug #1073478 reported by Jan Groenewald
26
This bug affects 3 people
Affects Status Importance Assigned to Milestone
squid3 (Ubuntu)
Invalid
Medium
Unassigned
Precise
Fix Released
Undecided
Unassigned
Trusty
Fix Released
Undecided
Unassigned

Bug Description

[Impact]

Squid 3.1.19-1ubuntu3.12.04.3 on Precise and 3.3.8-1ubuntu6.3 on Trusty do not clean up its PID file, causing logrotate to spit out an error each night when cron runs it.

The underlying issue here is that the upstart script does not wait long enough for squid3 to finish - one might also argue that it is using the wrong signal if it expects squid to finish quickly.

By default, when squid3 receives a SIGTERM it will close the socket for incoming connections and will wait for existing connections to complete. The shutdown_lifetime directive configures how long it waits before forcefully closing those open connections and it is set by default to 30 seconds.

Current setting, SIGKILL happens after 5 seconds (which is the upstart default):
=====
# strace -r -e trace=signal,file -p $(cat /var/run/squid3.pid)
Process 20865 attached - interrupt to quit
     0.000000 --- SIGTERM (Terminated) @ 0 (0) ---
     0.000440 rt_sigreturn(0xf) = -1 EINTR (Interrupted system call)
     0.000418 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
     5.006483 +++ killed by SIGKILL +++

Killing squid with SIGKILL might lead to cache index corruption and a very slow startup the next time it is started: http://lists.squid-cache.org/pipermail/squid-users/2015-July/004752.html

The first way to avoid this is by increasing upstart's "kill timeout" to a few seconds more then the "shutdown_lifetime" directive.

With "kill timeout 40" in upstart script, notice the pid file being unlinked (the ~30 seconds wait is due to the default value of shutdown_lifetime directive):
=====
# strace -r -e trace=signal,file -p $(cat /var/run/squid3.pid)
Process 20805 attached - interrupt to quit
     0.000000 --- SIGTERM (Terminated) @ 0 (0) ---
     0.000284 rt_sigreturn(0xf) = -1 EINTR (Interrupted system call)
     0.000402 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
    31.611726 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
     0.001792 --- SIGCHLD (Child exited) @ 0 (0) ---
     0.000247 rt_sigreturn(0xffffffffffffffff) = 63
     0.006027 unlink("/var/run/squid3.pid") = 0
     0.000539 open("/proc/sys/kernel/ngroups_max", O_RDONLY) = 0
     0.000387 open("/etc/group", O_RDONLY|O_CLOEXEC) = 0
Process 20805 detached

Still, this will unfortunately lead to a slow shutdown time as reported in http://askubuntu.com/questions/18127/squid3-starts-and-stops-slowly

To solve this it is better to issue a SIGINT instead of SIGTERM (upstart default) by adding "kill signal SIGINT" to squid's upstart script.

With "kill signal SIGINT", which does not wait on outstanding connections:
=====
# strace -r -e trace=signal,file -p $(cat /var/run/squid3.pid)
Process 20891 attached - interrupt to quit
     0.000000 --- SIGINT (Interrupt) @ 0 (0) ---
     0.000312 rt_sigreturn(0x2) = -1 EINTR (Interrupted system call)
     0.000237 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
     1.123564 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
     0.000754 --- SIGCHLD (Child exited) @ 0 (0) ---
     0.000103 rt_sigreturn(0xffffffffffffffff) = 0
     0.002031 unlink("/var/run/squid3.pid") = 0
     0.000154 open("/proc/sys/kernel/ngroups_max", O_RDONLY) = 0
     0.000142 open("/etc/group", O_RDONLY|O_CLOEXEC) = 0
Process 20891 detached

The whole stop process is very quick. The downside is that squid will forcefully close all open connections immediately, but that was already happening when SIGKILL was being issued before - so we are actually keeping the same behavior as before and there are no bugs complaining about it. And the PID file gets removed anyway.

The proposal is to add both "kill signal SIGINT" and "kill timeout 40" to squid, so squid should have enough time to close all open cache index files.

[Test Case]

# start squid3
squid3 start/running, process 10392
# cat /var/run/squid3.pid
10392
# stop squid3
squid3 stop/waiting
# cat /var/run/squid3.pid
10392
# logrotate -f /etc/logrotate.d/squid3
squid: ERROR: Could not send signal 10 to process 10392: (3) No such process
error: error running shared postrotate script for '/var/log/squid3/*.log '

[Regression Potential]

* The main regression potencial is related to the "kill timeout" value. If set too high, a user's system might *seem* too slow to shutdown. Note that this should rarely happen: only when squid actually needs a long time to write down and close its cache index files (high system load maybe?). Overall, whenever a system behaves this way the user probably has other problems to solve.
* If "kill timeout" value is set too low we might incur in the same problem originally reported and end up calling SIGKILL on squid too early, probably while it is closing its cache index files, thus corrupting them. Anyway, this was probably happening already anyway, as SIGKILL is always called after 5 seconds in the current upstart script.

Overall, there is very low regression potential for this fix.

[Other info]

# lsb_release -d
Description: Ubuntu 12.04.5 LTS
# dpkg -l squid3
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Description
+++-==================-==================-====================================================
ii squid3 3.1.19-1ubuntu3.12 Full featured Web Proxy cache (HTTP proxy)

# lsb_release -d
Description: Ubuntu 14.04.3 LTS
# dpkg -l squid3
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===========================================-==========================-==========================-===========================================================================================
ii squid3 3.3.8-1ubuntu6.3 amd64 Full featured Web Proxy cache (HTTP proxy)

Revision history for this message
Robie Basak (racb) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better.

I can confirm that the logrotate test fails if squid is not running, and agree that this is a bug.

Unfortunately I can't find a good reference to point to in order to declare that squid should be cleaning its pid file and that it is not, although my understand has also always been that daemons should do this when they can. I've found http://stackoverflow.com/questions/688343/reference-for-proper-handling-of-pid-file-on-unix and http://perfec.to/stalepid.html both of which indicate that it is a good thing to do, but there doesn't seem to be any definitive reference for this.

I think an easier fix would be to adjust squid's logrotate configuration to test for the process before calling "squid3 -k rotate".

Could you try this as a workaround? In /etc/logrotate.d/squid3, replace the postrotate line:

    test ! -e /var/run/squid3.pid || /usr/sbin/squid3 -k rotate

with:

    test ! -e /var/run/squid3.pid || ! kill -0 `cat /var/run/squid3.pid` 2>/dev/null || /usr/sbin/squid3 -k rotate

Revision history for this message
Robie Basak (racb) wrote :

(the replacement is all on one line)

Changed in squid3 (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
summary: - squid3 does not clean up pid file
+ squid3 logrotate fails when squid is not running
Revision history for this message
Jan Groenewald (jan-aims) wrote : Re: [Bug 1073478] Re: squid3 does not clean up pid file

That seems to work:

0 root@rackspace:/etc/logrotate.d#cp squid3 ~/etc_logrotate.d_squid3.orig
0 root@rackspace:/etc/logrotate.d#vim squid3
0 root@rackspace:/etc/logrotate.d#logrotate -f
/root/etc_logrotate.d_squid3.orig
squid: ERROR: Could not send signal 10 to process 13273: (3) No such process
error: error running shared postrotate script for '/var/log/squid3/*.log '
1 root@rackspace:/etc/logrotate.d#logrotate -f squid3
0 root@rackspace:/etc/logrotate.d#grep kill squid3
                 test ! -e /var/run/squid3.pid || ! kill -0 `cat
/var/run/squid3.pid` 2>/dev/null || /usr/sbin/squid3 -k rotate
0 root@rackspace:/etc/logrotate.d#ps axw|grep squid3
13357 pts/0 S+ 0:00 grep --color=auto squid3
0 root@rackspace:/etc/logrotate.d#cat /var/run/squid3.pid
13273
0 root@rackspace:/etc/logrotate.d#

On 15 November 2012 17:21, Robie Basak <email address hidden> wrote:

> (the replacement is all on one line)
>
> ** Changed in: squid3 (Ubuntu)
> Status: New => Triaged
>
> ** Changed in: squid3 (Ubuntu)
> Importance: Undecided => Medium
>
> ** Summary changed:
>
> - squid3 does not clean up pid file
> + squid3 logrotate fails when squid is not running
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1073478
>
> Title:
> squid3 logrotate fails when squid is not running
>
> To manage notifications about this bug go to:
>
> https://bugs.launchpad.net/ubuntu/+source/squid3/+bug/1073478/+subscriptions
>

--
  .~.
  /V\ Jan Groenewald
 /( )\ www.aims.ac.za
 ^^-^^

Revision history for this message
Robie Basak (racb) wrote : Re: squid3 logrotate fails when squid is not running

Thanks! The logrotate script comes from Debian, so it would be best to file a bug in Debian to get this fixed at the source. I'm pretty sure that sysv init would be affected by this too.

tags: added: needs-upstream-report
Revision history for this message
Sopor (sopor) wrote :

This bug report is over 2 years old, but still it's not fixed :(

description: updated
description: updated
Revision history for this message
Tiago Stürmer Daitx (tdaitx) wrote :
Revision history for this message
Tiago Stürmer Daitx (tdaitx) wrote :
tags: added: patch precise trusty
removed: needs-upstream-report
summary: - squid3 logrotate fails when squid is not running
+ [SRU] Update squid3 upstart script to kill it with SIGINT and wait
+ longer
Revision history for this message
Robie Basak (racb) wrote :

Note that there was an issue with php5 fpm (I think that was it) that we can't ship a "kill" directive in an upstart job if someone can do a release upgrade up to the version where that directive first appears. Otherwise it gets done on the old upstart that does not support it and break things. I can't find the bug reference right now, sorry, but it should probably be checked against Precise.

Revision history for this message
Tiago Stürmer Daitx (tdaitx) wrote :

Thanks for the feedback. Do you mean LP: #1272788 "php5-fpm upstart job is not compatible with precise upstart" or LP: #1242376 "reload php5-fpm crashes"?

That was one was caused because of the "reload" directive, only introduced in upstart 1.10 while Precise only supports upstart 1.5.

According to http://upstart.ubuntu.com/cookbook/#stanzas-by-category both "kill timeout" (all upstart versions) and "kill signal" (since upstart 1.3) should be supported by upstart 1.5 in Precise so this SRU should be fine.

Revision history for this message
Robie Basak (racb) wrote : Re: [Bug 1073478] Re: [SRU] Update squid3 upstart script to kill it with SIGINT and wait longer

On Wed, Oct 14, 2015 at 12:34:33PM -0000, Tiago Stürmer Daitx wrote:
> According to http://upstart.ubuntu.com/cookbook/#stanzas-by-category
> both "kill timeout" (all upstart versions) and "kill signal" (since
> upstart 1.3) should be supported by upstart 1.5 in Precise so this SRU
> should be fine.

You're right. Sorry for the noise.

Revision history for this message
Michael Terry (mterry) wrote :

OK, I've uploaded these to precise and trusty. Will subscribe ubuntu-sru and unsubscribe ubuntu-sponsors.

Does this need any adjusting in wily? We don't use upstart anymore but do ship a traditional init.d file.

Revision history for this message
Michael Terry (mterry) wrote :

(And thanks so much for preparing these patches / SRU description, Tiago!)

Revision history for this message
Tiago Stürmer Daitx (tdaitx) wrote :

Thank you Michael!

We only need to touch Wily if we want to keep the upstart script updated there as well, but as you said Wily no longer uses upstart. Updating it would not change any behavior. So far I couldn't find any documentation explaining how to proceed on this scenario, so I'm open to suggestions. I'm also ok providing a debdiff for it if we decide to go that way. =)

Revision history for this message
Tiago Stürmer Daitx (tdaitx) wrote :
Download full text (7.6 KiB)

On the contrary, thanks for bringing that up. I was good to be made aware
of such problems. Just because it worked now on Precise and Trusty didn't
mean I should skip that check, as I might as well run into an obsolete or
deleted feature (not that upstart seem to have those). =)
On Oct 14, 2015 9:56 AM, "Robie Basak" <email address hidden> wrote:

> On Wed, Oct 14, 2015 at 12:34:33PM -0000, Tiago Stürmer Daitx wrote:
> > According to http://upstart.ubuntu.com/cookbook/#stanzas-by-category
> > both "kill timeout" (all upstart versions) and "kill signal" (since
> > upstart 1.3) should be supported by upstart 1.5 in Precise so this SRU
> > should be fine.
>
> You're right. Sorry for the noise.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1073478
>
> Title:
> [SRU] Update squid3 upstart script to kill it with SIGINT and wait
> longer
>
> Status in squid3 package in Ubuntu:
> Triaged
>
> Bug description:
> [Impact]
>
> Squid 3.1.19-1ubuntu3.12.04.3 on Precise and 3.3.8-1ubuntu6.3 on
> Trusty do not clean up its PID file, causing logrotate to spit out an
> error each night when cron runs it.
>
> The underlying issue here is that the upstart script does not wait
> long enough for squid3 to finish - one might also argue that it is
> using the wrong signal if it expects squid to finish quickly.
>
> By default, when squid3 receives a SIGTERM it will close the socket
> for incoming connections and will wait for existing connections to
> complete. The shutdown_lifetime directive configures how long it waits
> before forcefully closing those open connections and it is set by
> default to 30 seconds.
>
> Current setting, SIGKILL happens after 5 seconds (which is the upstart
> default):
> =====
> # strace -r -e trace=signal,file -p $(cat /var/run/squid3.pid)
> Process 20865 attached - interrupt to quit
> 0.000000 --- SIGTERM (Terminated) @ 0 (0) ---
> 0.000440 rt_sigreturn(0xf) = -1 EINTR (Interrupted system call)
> 0.000418 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118,
> ...}) = 0
> 5.006483 +++ killed by SIGKILL +++
>
> Killing squid with SIGKILL might lead to cache index corruption and a
> very slow startup the next time it is started: http://lists.squid-
> cache.org/pipermail/squid-users/2015-July/004752.html
>
> The first way to avoid this is by increasing upstart's "kill timeout"
> to a few seconds more then the "shutdown_lifetime" directive.
>
> With "kill timeout 40" in upstart script, notice the pid file being
> unlinked (the ~30 seconds wait is due to the default value of
> shutdown_lifetime directive):
> =====
> # strace -r -e trace=signal,file -p $(cat /var/run/squid3.pid)
> Process 20805 attached - interrupt to quit
> 0.000000 --- SIGTERM (Terminated) @ 0 (0) ---
> 0.000284 rt_sigreturn(0xf) = -1 EINTR (Interrupted system call)
> 0.000402 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118,
> ...}) = 0
> 31.611726 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118,
> ...}) = 0
> 0.001792 --- SIGCHLD (Child exited) @ 0...

Read more...

Revision history for this message
Robie Basak (racb) wrote :

On Wed, Oct 14, 2015 at 05:49:12PM -0000, Tiago Stürmer Daitx wrote:
> We only need to touch Wily if we want to keep the upstart script updated
> there as well, but as you said Wily no longer uses upstart. Updating it
> would not change any behavior. So far I couldn't find any documentation
> explaining how to proceed on this scenario, so I'm open to suggestions.
> I'm also ok providing a debdiff for it if we decide to go that way. =)

IMHO, this doesn't need to be a priority for Ubuntu. In my next merge I
intend to drop the upstart script part of the Ubuntu delta anyway.

Revision history for this message
Michael Terry (mterry) wrote :

Yeah, but is the bug present in the sysinit script? I'm assuming not, but it wouldn't hurt to verify.

Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Jan, or anyone else affected,

Accepted squid3 into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/squid3/3.3.8-1ubuntu6.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in squid3 (Ubuntu Trusty):
status: New → Fix Committed
tags: added: verification-needed
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Jan, or anyone else affected,

Accepted squid3 into precise-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/squid3/3.1.19-1ubuntu3.12.04.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in squid3 (Ubuntu Precise):
status: New → Fix Committed
Revision history for this message
Tiago Stürmer Daitx (tdaitx) wrote : Re: [Bug 1073478] Re: [SRU] Update squid3 upstart script to kill it with SIGINT and wait longer

> Yeah, but is the bug present in the sysinit script? I'm assuming not,
> but it wouldn't hurt to verify.

Michael, indeed, thanks for the heads up. I did try it when
researching the SRU, but I forgot to include evidence that it worked
just fine. I have collected that just now.

# /etc/init.d/squid3 start
[ ok ] Starting squid3 (via systemctl): squid3.service.
# /etc/init.d/squid3 stop
[ ok ] Stopping squid3 (via systemctl): squid3.service.

# strace -r -e trace=signal,file -p $(cat /var/run/squid3.pid)
Process 2925 attached
     0.000000 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER,
si_pid=3028, si_uid=0} ---
     0.000068 kill(2923, SIGUSR1) = 0
     0.000046 rt_sigreturn() = -1 EINTR (Interrupted system call)
     0.000081 stat("/etc/localtime", {st_mode=S_IFREG|0644,
st_size=127, ...}) = 0
     9.395299 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED,
si_pid=2943, si_status=1, si_utime=0, si_stime=1} ---
     0.000778 rt_sigreturn() = -1 EINTR (Interrupted system call)
    21.948196 stat("/etc/localtime", {st_mode=S_IFREG|0644,
st_size=127, ...}) = 0
     0.002870 kill(2931, SIGTERM) = 0
     0.000606 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED,
si_pid=2931, si_status=SIGTERM, si_utime=0, si_stime=0} ---
     0.000271 rt_sigreturn() = 9
     0.010035 unlink("/var/run/squid3.pid") = 0
     0.000397 open("/proc/sys/kernel/ngroups_max", O_RDONLY) = 10
     0.000353 open("/etc/group", O_RDONLY|O_CLOEXEC) = 10
     0.005137 +++ exited with 0 +++

As expected. it exits after about ~30 seconds.

Revision history for this message
Tiago Stürmer Daitx (tdaitx) wrote :

Brian, thank you for uploading and publishing those to -proposed.

I have tested them on both Precise and Trusty and can confirm that they work as expected: squid is killed with SIGINT, closing all outstanding connections immediately and unlinking the PID file.

Michael Terry (mterry)
tags: added: verification-done-precise verification-done-trusty
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package squid3 - 3.1.19-1ubuntu3.12.04.4

---------------
squid3 (3.1.19-1ubuntu3.12.04.4) precise-proposed; urgency=low

  * d/squid3.upstart: Use SIGINT to terminate squid and wait at most 40
    seconds for it to finish. (LP: #1073478)

 -- Tiago Stürmer Daitx <email address hidden> Wed, 14 Oct 2015 02:54:20 +0000

Changed in squid3 (Ubuntu Precise):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for squid3 has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package squid3 - 3.3.8-1ubuntu6.4

---------------
squid3 (3.3.8-1ubuntu6.4) trusty-proposed; urgency=low

  * d/squid3.upstart: Use SIGINT to terminate squid and wait at most 40
    seconds for it to finish. (LP: #1073478)

 -- Tiago Stürmer Daitx <email address hidden> Wed, 14 Oct 2015 02:54:20 +0000

Changed in squid3 (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Michael Terry (mterry) wrote :

And marking current task Invalid, since we moved away from upstart.

Changed in squid3 (Ubuntu):
status: Triaged → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.