mysqld_safe thinks mysqld has crashed when it hasn't

Bug #326768 reported by Will Uther
68
This bug affects 5 people
Affects Status Importance Assigned to Milestone
Mythbuntu
Fix Released
Critical
Unassigned
mysql-dfsg-5.0 (Debian)
Confirmed
Unknown
mysql-dfsg-5.0 (Ubuntu)
Fix Released
High
Mathias Gug
Jaunty
Fix Released
Medium
Mathias Gug
Karmic
Fix Released
High
Mathias Gug

Bug Description

--Impact--
I'm running mythtv on jaunty with mysql-server-5.0 version 5.1.30really5.0.75-0ubuntu5. During boot mysql starts, then mythtv starts, then mysql restarts and mythtv gets confused.

This is caused by some portions of a debian patch that is applied on top of MySQL. It has not been accepted yet at upstream MySQL. The MySQL server is receiving a SIGHUP which the behavior is changed because of the debian patch.

Here are some relevant syslog sections:

Feb 8 12:35:07 willvo mysqld_safe[3668]: started
Feb 8 12:35:08 willvo mysqld[3671]: 090208 12:35:08 InnoDB: Started; log sequence number 0 43655
Feb 8 12:35:08 willvo mysqld[3671]: 090208 12:35:08 [Note] /usr/sbin/mysqld: ready for connections.
Feb 8 12:35:08 willvo mysqld[3671]: Version: '5.0.75-0ubuntu5' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu)
Feb 8 12:35:08 willvo /etc/mysql/debian-start[3710]: Upgrading MySQL tables if necessary.
Feb 8 12:35:08 willvo /etc/mysql/debian-start[3713]: Looking for 'mysql' as: /usr/bin/mysql
Feb 8 12:35:08 willvo /etc/mysql/debian-start[3713]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck
Feb 8 12:35:08 willvo /etc/mysql/debian-start[3713]: This installation of MySQL is already upgraded to 5.0.75, use --force if you still need to run mysql_upgrade
Feb 8 12:35:08 willvo /etc/mysql/debian-start[3720]: Checking for insecure root accounts.
Feb 8 12:35:08 willvo /etc/mysql/debian-start[3724]: Triggering myisam-recover for all MyISAM tables

Feb 8 12:35:13 willvo mythtv-backend[4560]: Started mythtv-backend

Feb 8 12:35:16 willvo mysqld_safe[5212]: Number of processes running now: 1
Feb 8 12:35:16 willvo mysqld_safe[5223]: mysqld process hanging, pid 3670 - killed
Feb 8 12:35:16 willvo mysqld_safe[5227]: restarted
Feb 8 12:35:16 willvo mysqld[5231]: 090208 12:35:16 InnoDB: Started; log sequence number 0 43655
Feb 8 12:35:17 willvo mysqld[5231]: 090208 12:35:17 [Note] /usr/sbin/mysqld: ready for connections.
Feb 8 12:35:17 willvo mysqld[5231]: Version: '5.0.75-0ubuntu5' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu)

Note that mysqld_safe is finds one mysqld process still running after it is supposed to have crashed. mysqld_safe then kills that process and starts another. This only seems to happen once during boot - it doesn't keep restarting mysqld in normal use.

--Addressing--
This has *not* yet been addressed in the karmic branch, but it has been verified that a PPA resolve the problem. MySQL hasn't changed yet in karmic, so this can easily be brought to karmic if viewed to properly solve the problem.

The problem comes from the fact that mysqld_safe starts mysqld and then waits for its crash. However installing a trap for SIGHUP makes the wait command return immediately when a SIGHUP is received by mysqld_safe. This mysqld_safe proceed to kill the remaining mysqld process (which hasn't crashed). The proposed fix is to add a wait command to the trap.

-- Test Case
1. Install mysql-server-5.0 and watch the log. Record the PID of mysqld. Issue a status command to see how many flush tables have been done:
$ sudo /usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf status
Uptime: 36 Threads: 1 Questions: 1 Slow queries: 0 Opens: 12 Flush tables: 1 Open tables: 6 Queries per second avg: 0.028

2. From a terminal:
  sudo killall -HUP mysqld_safe

3.
 a. WIthtout the patch: mysqld_safe reports that a mysqld process is stuck, kills it and restarts it. This can be verified by checking that mysqld has new a PID.
 b. With the patch: mysqld is *not* restarted. mysqld has the same PID. Only the refresh command is sent to mysqld. This can be verified by checking that the number of flush tables has been incremented by one.

--Regression Potential--
There isn't any change in behavior. Signals are still handled the way they were before.

Revision history for this message
Charles (charles-bovy) wrote :

I can confirm this behavior.

Revision history for this message
Mathias Gug (mathiaz) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. When you're running into the issue, does mysqld_safe takes up 100% of the cpu?

Changed in mysql-dfsg-5.0:
status: New → Incomplete
Revision history for this message
Will Uther (willu-mailinglists) wrote :

Without the fix I mentioned in the original report, mysql gets restarted once. Obviously mysqld_safe uses some CPU to do this (it will do one iteration of the loop at the bottom of the script). mysqld_safe is not looping and restarting mysql multiple times, which is, I think, the behaviour you were asking about.

No, it does not sit at the top of `top`.

It is strange that it only falls through once.

Revision history for this message
MarcRandolph (mrand) wrote :

Bug 358173 may be a duplicate of this.

Thanks to the original reporter for proposing a solution.

Changed in mysql-dfsg-5.0 (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Will Uther (willu-mailinglists) wrote :

Yeah - it looks to me like Bug 358173 is the same bug.

I find it very strange that this bug only affects mythbackend (all the possible duplicates seem to reference mythtv). I suspect that it is doing something strange. I reported a mysql bug because I figured that the mysql server shouldn't behave strangely even if the client program does weird things.

Revision history for this message
MarcRandolph (mrand) wrote :

Adding mythbuntu. It seems to me that the " & wait" should make things more reliable and predictable rather than less, so I must be missing something. Any danger in removing the " & wait"?

Changed in mythbuntu:
importance: Undecided → Critical
milestone: none → jaunty-rc
status: New → Confirmed
Revision history for this message
ranjo (ranjo-jjxl) wrote :

Since I upgraded to Jaunty RC, my mythtv database gets corrupted after almost every new boot. This results into a confused mythtv-backend which fails to record the scheduled recordings. After a "repair table" command everything returns back as normal. I'm trying the " & wait" fix mentioned above, hope to get some good results.

Revision history for this message
barney_1 (szczys) wrote :

I believe I am also encountering this bug. Running Mythbuntu 9.04 Beta I find my recording schedule on mythtv is empty after reboot. Running mythtv-setup, removing all encoder cards and adding them again fixes this.

Output of "cat syslog | grep mysqld":

Apr 19 11:54:05 barney mysqld[3497]: 090419 11:54:05 [Note] /usr/sbin/mysqld: Normal shutdown
Apr 19 11:54:05 barney mysqld[3497]:
Apr 19 11:54:07 barney mysqld[3497]: 090419 11:54:07 InnoDB: Starting shutdown...
Apr 19 11:54:08 barney mysqld[3497]: 090419 11:54:08 InnoDB: Shutdown completed; log sequence number 0 150426
Apr 19 11:54:08 barney mysqld[3497]: 090419 11:54:08 [Note] /usr/sbin/mysqld: Shutdown complete
Apr 19 11:54:08 barney mysqld[3497]:
Apr 19 11:54:08 barney mysqld_safe[5352]: ended
Apr 19 11:54:48 barney kernel: [ 11.333672] type=1505 audit(1240160086.411:8): operation="profile_load" name="/usr/sbin/mysqld" name2="default" pid=1969
Apr 19 11:54:49 barney mysqld_safe[2496]: started
Apr 19 11:54:49 barney mysqld[2500]: 090419 11:54:49 InnoDB: Started; log sequence number 0 150426
Apr 19 11:54:50 barney mysqld[2500]: 090419 11:54:50 [Note] /usr/sbin/mysqld: ready for connections.
Apr 19 11:54:50 barney mysqld[2500]: Version: '5.0.75-0ubuntu10' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu)
Apr 19 11:55:03 barney mysqld_safe[3605]: Number of processes running now: 1
Apr 19 11:55:03 barney mysqld_safe[3613]: mysqld process hanging, pid 2498 - killed
Apr 19 11:55:03 barney mysqld_safe[3616]: restarted
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: The log sequence number in ibdata files does not match
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: the log sequence number in the ib_logfiles!
Apr 19 11:55:03 barney mysqld[3620]: 090419 11:55:03 InnoDB: Database was not shut down normally!
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: Starting crash recovery.
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: Reading tablespace information from the .ibd files...
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: Restoring possible half-written data pages from the doublewrite
Apr 19 11:55:03 barney mysqld[3620]: InnoDB: buffer...
Apr 19 11:55:03 barney mysqld[3620]: 090419 11:55:03 InnoDB: Started; log sequence number 0 150436
Apr 19 11:55:03 barney mysqld[3620]: 090419 11:55:03 [Note] /usr/sbin/mysqld: ready for connections.
Apr 19 11:55:03 barney mysqld[3620]: Version: '5.0.75-0ubuntu10' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu)

Revision history for this message
Will Uther (willu-mailinglists) wrote :

Barney - could you clarify your comment for me?

You say "Running mythtv-setup, removing all encoder cards and adding them again fixes this." Is this a permanent fix, or does the problem recur if you reboot?

If this is only a fix for that boot, there are easier ones. Anything that causes mythbackend to force a reschedule is a temporary fix (this includes restarting mythbackend, or just marking a new show to be recorded).

Revision history for this message
barney_1 (szczys) wrote :

Hi Will,

Thanks, you're right. Restarting mythtv-backend after reboot (sudo /etc/init.d/mythtv-backend restart) does bring the schedule back.

Is this bug what is causing my problem?

Revision history for this message
Will Uther (willu-mailinglists) wrote :

I thought I'd list what I discovered when originally looking at this bug. Much of this is mythtv specific. I'm currently of the opinion that there are issues in both mythtv and mysql, with the mythtv strange behaviour tickling a mysql issue.

On a problematic boot, mysql starts normally. This means that the mysqld_safe script is run. It starts the mysql daemon then waits for the daemon to stop. If the daemon stops normally (the pid file has been removed) then mysqld_safe exits. If the daemon didn't stop normally then mysqld_safe kills any remaining instances of mysqld running, before restarting the daemon.

mythbackend then starts. During startup, mythbackend does a reschedule of its recordings (i.e. it goes through the list of programs to be recorded and the epg and decides which ones to record with which tuner).

At some point around this time, mysqld_safe is detecting that the mysql daemon has stopped. It is then, as noted above, killing all remaining mysql daemon processes, before starting a fresh one. My understanding is that on my default desktop jaunty install I should only ever have one mysql daemon running at once (and a quick ps reveals that is what I currently have). If you look at the logs above, you'll see that mysqld_safe is detecting one mysql daemon running after it has already decided that one has failed. This is what I meant by the bug report title, "mysql thinks it has crashed when it hasn't". This seems to be a problem with a false detection of a crash causing the cleanup mechanism to actually kill the mysqld process.

When the mysqld process is killed in this way during a mythbackend reschedule, mythbackend fails to schedule any recordings. Once the database restarts, mythbackend can still access the database - e.g. you can see the epg, and you see the list of programs you've ask to be recorded. If you do anything that causes mythbackend to re-schedule recordings (e.g. change anything in the list of programs to be recorded - you can even change it back after the reschedule), then mythbackend will connect to the db, reschedule correctly and everything will be fine.

This is a little bit weird in that restarting mythbackend later doesn't cause mysql to restart again. It only seems to restart once. If there was a major problem with mythbackend then I would expect it to show up every time mythbackend is restarted.

It also doesn't seem to be a race condition (which was suggested by the fact that it only happens the first time). If I put a `sleep 10` at the start of the mythbackend init script it doesn't help.

summary: - mysql thinks it has crashed when it hasn't
+ mysqld_safe thinks mysqld has crashed when it hasn't
Revision history for this message
barney_1 (szczys) wrote :

I see that Mythbuntu 9.04 has been released and there is no mention of this problem in the release notes (http://www.mythbuntu.org/9.04/Release_notes).

I am still having this problem. After every reboot I have to manually restart the mythtv backend for my recording schedules to be read in from mythconverg (otherwise the list appears as empty and nothing records).

Does this bug exist in the stable release or am I having problems caused by some other issue?

Revision history for this message
Dave Johansen (davejohansen) wrote :

This should definitely be added to the release notes or fixed, but it appears that the problem can be resolved by performing the workaround in the original description.

Also, just doing something to interact with the database (like schedule a dummy recording), always worked to fix the problem for me before I applied the workaround.

Revision history for this message
Dewey_Oxberger (jonb-cirris) wrote :

On a fresh install of mythbuntu 9.04 I get this problem. The fix described above does not work for me. (It doesn't change the behavior at all as far as I can tell).

My situation is more complex since I use a HDHomeRun tuner (so I need the network working as well). Here is what I've tried:

1) The default boot has mysql coming up from rc2.d at something low like S10 (I can't remember). Then S31mythbackend, then S50NetworkManager. Fails 80% of the time. I boot to a crippled mythfrontend that has scrambled keymappings and a broken theme. If I kill the frontend, restart the backend, and run frontend it works.

2) Don't auto start mythfrontend or mythbackend. Still use S50NetworkManager to bring the network up. Boot. From the desktop manually start mythbackend, then start mythfrontend. Works every time.

3) Loads of combinations of NetworkManager, manually configuring the network, moving mysql around and mythbackend around in rc2.d. Using my SxxWaitForHDHomeRun script. It's hard to find any combination that works.

I wrote a script that does mysql STATUS queries and prints the uptime. Put it all over rc2.d. The server shows a constantly increasing uptime so I don't think it's getting killed. At S31 it shows 5 sec of uptime. Still, mythbackend can't see it. Some queries work, some fail.

Finally had to: a) manually config the network (to give time for the HDHomeRun to be available on the network), b) move mythbackend to S99zMythbackend so it starts up really late. Now it works 90% of the time but it still fails a query or two from time to time (but then they start working again).

Crazy things going on...

Revision history for this message
Mario Limonciello (superm1) wrote :

I've got a debdiff that i'd like to propose as a solution to this, but I would like to get some wider testing and feedback with it before uploading it to -proposed.

It's attached, and I have a test build on my PPA (version 5.1.30really5.0.75-0ubuntu10.1~ppa1) at https://launchpad.net/~superm1/+archive/ppa

Revision history for this message
barney_1 (szczys) wrote :

Thanks for making that build available on your ppa Mario. I just installed it and it seems to have fixed my problems. I can reboot and I don't get the mysqld errors or the behavior of my recorded schedules not being loaded into the mythbackend.

One surprise is that it also seems to fix a playback issues I'd been having. When playing a recording there had been a 2-3 second black screen before the recording started playback. After installing this updated package the recording starts immediately.

Thanks again!

Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
status: New → Confirmed
description: updated
Changed in mythbuntu:
milestone: jaunty-rc → none
status: Confirmed → Triaged
Revision history for this message
Martin Pitt (pitti) wrote :

Thanks, Mario. Assigned to Canonical server team for review.

Changed in mysql-dfsg-5.0 (Ubuntu Karmic):
assignee: nobody → Canonical Server Team (canonical-server)
Revision history for this message
Dewey_Oxberger (jonb-cirris) wrote :

I'm testing the patch. Works Great! Mario really is Super.

I'm a Mythbuntu 9.04 RC install that was fully updated as of yesterday (before the PPA test fix). I restored everything to the "error state" and verified the error still exists. The error state does have the problem. (mythbackend log shows failed mysql queries by the dozen).

Then I updated to the PPA. Zero defects. Works great. No query errors. Boots just fine.

One odd note. I had been tinkering with this so I did a script and "spammed" the script all through rc2.d. Here is the script:

#!/bin/bash
echo -e "Checking MySQL\n"
mysql --user=root --password=blahblah -e STATUS | grep -i uptime

It shows a continuously increasing uptime from boot in both the error state and in the fixed state. Zero difference. "uptime" must not mean what I think it means :)

Let me know if you want me to do a clean install and test just the patch. I'm set up to do that...

Revision history for this message
Martin Pitt (pitti) wrote :

Accepted mysql-dfsg-5.0 into jaunty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
status: Confirmed → Fix Committed
tags: added: verification-needed
Revision history for this message
Dewey_Oxberger (jonb-cirris) wrote :

Martin,

Is it a good enough test to install over Mario's PPA or would you prefer I revert mysql, reproduce the error, then apply the proposed over the top of it?

Revision history for this message
Dewey_Oxberger (jonb-cirris) wrote :

Added jaunty-proposed and did a partial update of mysql. No problems found with mythbuntu. I you want a test on a clean install let me know.

Revision history for this message
Mathias Gug (mathiaz) wrote :

Thanks for taking to prepare a debdiff and conduct some testing. However I'm not comfortable with releasing an update that changes the signal handling, especially removing one.

I'd like to understand *why* is mysqld crashing in the first place. It seems that mythbackend is making mysqld crash. Could you enable the general query log (http://dev.mysql.com/doc/refman/5.0/en/query-log.html) and try to figure out which SQL statement makes the daemon crash?

If the proposed debdiff fixes the issue it means that something during the boot process sends a HUP signal to the mysql_safe process. I'd like to figure out what is sending the HUP signal to mysql_safe.

Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
status: Fix Committed → Confirmed
tags: removed: verification-needed
Revision history for this message
Mario Limonciello (superm1) wrote : Re: [Bug 326768] Re: mysqld_safe thinks mysqld has crashed when it hasn't

There are two problems at stake here actually:

1) something is sending HUP to mysqld_safe. That's not been determined
yet. Clearly it's some root process that is doing it. It can't be
mythbackend because it doesn't run as root.

2) mysqld_safe is not properly handling the HUP. When the trap goes off,
it's supposed to mearly send a refresh command to mysqld. Here's what's
actually happening:

1) essentially, "mysqld & wait" is running
So mysqld is running in the background, and the bash script is blocked by
the wait command.
2) the trap comes in and causes wait to exit status 128
3) the refresh is issued to mysqld
3) the bash script continues and runs kill -9 mysqld

So, the logic that's INTENDED by the patch isn't working.

On Thu, May 7, 2009 at 14:14, Mathias Gug <email address hidden> wrote:

> Thanks for taking to prepare a debdiff and conduct some testing. However
> I'm not comfortable with releasing an update that changes the signal
> handling, especially removing one.
>
> I'd like to understand *why* is mysqld crashing in the first place. It
> seems that mythbackend is making mysqld crash. Could you enable the
> general query log (http://dev.mysql.com/doc/refman/5.0/en/query-
> log.html <http://dev.mysql.com/doc/refman/5.0/en/query-%0Alog.html>) and
> try to figure out which SQL statement makes the daemon
> crash?
>
> If the proposed debdiff fixes the issue it means that something during
> the boot process sends a HUP signal to the mysql_safe process. I'd like
> to figure out what is sending the HUP signal to mysql_safe.
>
> ** Changed in: mysql-dfsg-5.0 (Ubuntu Jaunty)
> Status: Fix Committed => Confirmed
>
> ** Tags removed: verification-needed
>
> --
> mysqld_safe thinks mysqld has crashed when it hasn't
> https://bugs.launchpad.net/bugs/326768
> You received this bug notification because you are a member of Mythbuntu
> Developers, which is subscribed to Mythbuntu.
>

--
Mario Limonciello
<email address hidden>

Revision history for this message
Steve Langasek (vorlon) wrote :

Mathias has said this package shouldn't go to -updates in its current state, so I'm marking it verification-failed for now. Mathias, when you've had time to reevaluate the package and if you conclude it's suitable, please set it back to verification-needed; or if you upload a new version of the package instead, you shouldn't need to do anything further, we'll pick it up with the SRU team scripts.

tags: added: verification-failed
Revision history for this message
laga (laga) wrote :

Mathias,

i can trigger this problem on my laptop which is *not* running mythbackend by simply issuing 'sudo killall -HUP mysqld_safe'. So it's not related to MythTV.

In fact, if you issue 'mysqladmin refresh' as done by mysqld_safe when it catches a SIGHUP, mysqld will *not* crash. See Mario's explanation why the patch is broken.

How to break it:
sudo killall -HUP mysqld_safe

Expected behavior:
* 'mysqladmin refresh' is run
OR
* nothing happens, as intended by upstream

Resulting behavior:
* mysqld is restarted

Revision history for this message
DAP (akadap) wrote :

See this thread
http://ubuntuforums.org/showthread.php?p=7233888#post7233888

superm1 suggested I enable the Jaunty Proposed repository to see if it fixed this problem, it did not. I still get this in the log:

May 7 12:43:01 AComputer mysqld_safe[4938]: Number of processes running now: 1
May 7 12:43:01 AComputer mysqld_safe[4946]: mysqld process hanging, pid 3191 - killed
May 7 12:43:01 AComputer mysqld_safe[4949]: restarted

I am running 2 HDHomerun boxes, and an ATI HD Blunder card. I have a script in rc2.d that delays mythbackend until the HDHomerun boxes can be accessed through the network. I have also been experimenting with delaying mythbackend to see if I can avoid the database problem.
I currently have a 3 minute delay in the mythtv-backend script to delay the start of the backend. The msqld_safe problem tracks the start time of the backend. If I delay the backend start time by 3 minutes, the mysqld_safe problem shows up 3 minutes later in the log file.
There is an advantage to the delay however, it appears that if the backend start is delayed long enough, mysqld restarts fast enough that it does not screw up the backend (4 sucessful restarts with this setup so far). Nasty kludge, but it appears to be working.

Revision history for this message
Dewey_Oxberger (jonb-cirris) wrote :

Strange. I set everything back to default except I didn't use my SxxWaitForHDHomeRun script. I manually configured the network and had Network Manager shut off. I verified I was getting the problem, then applied the patch. I haven't had a single mysqld process hanging since the patch.

I suppose I should reconnect network manager and see what happens.

As for the 3 minute delay; I noticed adding delays in the script that start mythbackend didn't help but if I didn't launch mythbackend, boot, and manually start, it worked just fine. That kind of behavior really makes me thing some crazy thread race condition.

Revision history for this message
Dewey_Oxberger (jonb-cirris) wrote :

Probably no point in me doing this but I spent the evening trying to get the mysqld process hanging with the patch installed and never could. I'm using network manager and the WaitForHDHomeRun script and I'm not seeing any trouble.

I forgot to mention - before the patch I occasionally saw mysqld go 100% on cpu usage. Other people are seeing that as well:

http://bugs.mysql.com/bug.php?id=34312

Revision history for this message
Blackpaw (blackpaw) wrote :

I had exactly the same bugs - Mythbuntu 9.04, adding Mario's PPA (no other modifications) fixed it.

Thank you very much Mario!

Revision history for this message
DAP (akadap) wrote :

Installed a new update today, Lots of mysqld messages in the log file, none appear to be errors, all appear to occur when mysqld starts, none occurred when mythtv-backend started 3 minutes later.
Seems the bug may be fixed.

Revision history for this message
Will Uther (willu-mailinglists) wrote :

I patched mysqld_safe directly based on Mario's diff (and replace the "& wait" that I had removed). This fixes the problem for me.

I also looked in the logs to see what was happening on boot with the change and I noticed:

May 10 11:02:55 willvo sshd[2981]: Received signal 15; terminating.
May 10 11:02:55 willvo ntpd[3721]: ntpd exiting on signal 15

but both seem to be running. Postfix also reloads its configuration. Is someone (upstart?) telling all daemons to reload their configs for some reason?

Is there an easy way to track down who sends a signal?

It seems there are two problems here - the signal being sent, and mysqld_safe poor response to receiving it. It would be nice to get to the bottom of each of them.

description: updated
Revision history for this message
Mathias Gug (mathiaz) wrote : Re: [Bug 326768] Re: mysqld_safe thinks mysqld has crashed when it hasn't

Hi Will,

On Mon, May 11, 2009 at 01:35:31AM -0000, Will Uther wrote:
> I also looked in the logs to see what was happening on boot with the
> change and I noticed:
>
> May 10 11:02:55 willvo sshd[2981]: Received signal 15; terminating.
> May 10 11:02:55 willvo ntpd[3721]: ntpd exiting on signal 15
>
> but both seem to be running. Postfix also reloads its configuration.
> Is someone (upstart?) telling all daemons to reload their configs for
> some reason?

Thanks for figuring this out. This is probably the root cause of the
issue. I've seen similar behavior with sshd - could you describe your
network configuration? Are you using network-manager or
/etc/network/intrefaces? Are you using a static configuration or rely on
dhcp?

> It seems there are two problems here - the signal being sent, and
> mysqld_safe poor response to receiving it. It would be nice to get to
> the bottom of each of them.
>

Agreed. It seems that the latter has been identified and a workaround is
described in this bug. I think the former should also be well understood
before pushing anything to a *stable* release. A known broken behaviour
with a workaround is better then pushing an update that breaks existing
production systems.

--
Mathias Gug
Ubuntu Developer http://www.ubuntu.com

Revision history for this message
Mario Limonciello (superm1) wrote :

> Agreed. It seems that the latter has been identified and a workaround is
> described in this bug. I think the former should also be well understood
> before pushing anything to a *stable* release. A known broken behaviour
> with a workaround is better then pushing an update that breaks existing
> production systems.
I'm sorry, but I entirely disagree here.

Regardless of what is sending a SIGHUP to mysqld_safe, it should be a supported scenario to allow such signals to be sent to system daemons. It's common for SIGHUP to be used to ask to reload configuration files when the daemon supports it.

The "broken" patch from debian's sole purpose is adding support for catching SIGHUP and a few other signals. It doesn't work properly.

Ignoring the fact that mysqld is getting restarted rather than reloaded, the SIGHUP trap support to issue a refresh would *only* work if you configured /root/my.cnf or had no root mysql password defined in the first place.

How can reverting a portion of it break an existing production system in any way?

Revision history for this message
Mathias Gug (mathiaz) wrote :
Download full text (5.5 KiB)

On Mon, May 11, 2009 at 08:33:20PM -0000, Mario Limonciello wrote:
> Regardless of what is sending a SIGHUP to mysqld_safe, it should be a
> supported scenario to allow such signals to be sent to system daemons.
> It's common for SIGHUP to be used to ask to reload configuration files
> when the daemon supports it.

Right. However mysqld doesn't support reloading its configuration files
via SIGHUP since upstream mysqld_safe disables SIGHUP and starts the
mysqld process with the nohup command.

The patch was originally introduced to fix Debian bug 208364 [1].
[1]: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=208364

> The "broken" patch from debian's sole purpose is adding support for
> catching SIGHUP and a few other signals. It doesn't work properly.
>

As your analysis in comment 23 [2] shows the bug is introduced when the
wait command exits. According to the bash man page:

  If bash is waiting for a command to complete and receives a signal for
  which a trap has been set, the trap will not be executed until the
  command completes. When bash is waiting for an asynchronous command
  via the wait builtin, the reception of a signal for which a trap has
  been set will cause the wait builtin to return immediately with an
  exit status greater than 128, immediately after which the trap is
  executed.

[2]: https://bugs.launchpad.net/ubuntu/+source/mysql-dfsg-5.0/+bug/326768/comments/23

Here is a trace of the mysqld_safe script run with debugging on (with an
added echo $? to output the return code of the wait command):

+ nohup /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --
pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --socket =/var/run/mysqld/mysqld.sock
+ logger -p daemon.err -t mysqld_safe -i -t mysqld
+ wait

[.... sudo killall -HUP mysqld_safe from a terminal ....]

+ /usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf refresh
+ echo 129
129
+ test ! -f /var/run/mysqld/mysqld.pid
+ true
+ test 1 -eq 1
+ ps xaww
+ grep -v grep
+ grep /usr/sbin/mysqld\>
+ grep -c pid-file=/var/run/mysqld/mysqld.pid
+ numofproces=1
+ echo Number of processes running now: 1
+ logger -p daemon.err -t mysqld_safe -i -s
mysqld_safe[4867]: Number of processes running now: 1
+ I=1
+ test 1 -le 1
+ ps xaww
+ grep /usr/sbin/mysqld\>
+ grep -v grep
+ grep pid-file=/var/run/mysqld/mysqld.pid
+ sed -n $p
+ PROC= 4846 pts/1 S<l+ 0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var
/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-lo cking --port=3306 --socket=/var/run/mysqld/mysqld.sock
+ break
+ kill -9 4846
+ echo mysqld process hanging, pid 4846 - killed
+ logger -p daemon.err -t mysqld_safe -i -s
mysqld_safe[4875]: mysqld process hanging, pid 4846 - killed
+ expr 1 + 1
+ I=2
+ test 2 -le 1
+ echo restarted
+ logger -p daemon.err -t mysqld_safe -i -s
mysqld_safe[4878]: restarted
+ true
+ rm -f /var/run/mysqld/mysqld.sock /var/run/mysqld/mysqld.pid
+ test -z --port=3306 --socket=/var/run/mysqld/mysqld.sock
+ eval nohup /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --...

Read more...

Revision history for this message
Mathias Gug (mathiaz) wrote :

Here is another way to fix the issue: add a wait to the trap:

-+trap '/usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf refresh' 1 # HUP
++trap '/usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf refresh & wait' 1 # HUP

This patch doesn't remove an existing functionality (signals are handled by mysqld_safe).

I've attached a debdiff against ubuntu10 (currently in jaunty). I'll publish a package in my PPA and ask for more testing once done.

Revision history for this message
Mario Limonciello (superm1) wrote : Re: [Bug 326768] Re: mysqld_safe thinks mysqld has crashed when it hasn't

Mathias:

Spectacular idea. I just manually tried it on my test system that i've been
using to investigate this bug. It solves the problems entirely.

On Mon, May 11, 2009 at 22:04, Mathias Gug <email address hidden> wrote:

> Here is another way to fix the issue: add a wait to the trap:
>
> -+trap '/usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf
> refresh' 1 # HUP
> ++trap '/usr/bin/mysqladmin --defaults-extra-file=/etc/mysql/debian.cnf
> refresh & wait' 1 # HUP
>
> This patch doesn't remove an existing functionality (signals are handled
> by mysqld_safe).
>
> I've attached a debdiff against ubuntu10 (currently in jaunty). I'll
> publish a package in my PPA and ask for more testing once done.
>
> ** Attachment added:
> "mysql-dfsg-5.0_5.1.30really5.0.75-0ubuntu10.2.debdiff"
>
> http://launchpadlibrarian.net/26619255/mysql-dfsg-5.0_5.1.30really5.0.75-0ubuntu10.2.debdiff
>
> --
> mysqld_safe thinks mysqld has crashed when it hasn't
> https://bugs.launchpad.net/bugs/326768
> You received this bug notification because you are a member of Mythbuntu
> Developers, which is subscribed to Mythbuntu.
>

--
Mario Limonciello
<email address hidden>

Revision history for this message
Will Uther (willu-mailinglists) wrote : Re: [Bug 326768] Re: mysqld_safe thinks mysqld has crashed when it hasn't

On 12/05/2009, at 2:16 AM, Mathias Gug wrote:

> Thanks for figuring this out. This is probably the root cause of the
> issue. I've seen similar behavior with sshd - could you describe your
> network configuration? Are you using network-manager or
> /etc/network/intrefaces? Are you using a static configuration or
> rely on
> dhcp?

I have a static configuration in /etc/network/interfaces.

The reload seems to happen just before NetworkManager starts up.

Will :-}

Revision history for this message
Will Uther (willu-mailinglists) wrote :

I just tried manually adding Mathias' latest change. That also seems to solve the problem for me. (This was not a full re-install with that patch - just me with vim reverting the old patch and adding that patch. I hope I haven't made any mistakes :)

Revision history for this message
MarcRandolph (mrand) wrote :

I'm not sure if I'm late to the party or not here... but on the off chance that it would be helpful, I threw together a program to print out the source process ID for a SIGHUP signal. I have attached the source to this bug. Very simple to use... in this case:

1. mv mysqld_safe mysqld_safe_real
2. gcc sig_src.c -o mysqld_safe
3. Launch programs/services that cause crash
4. Watch stderr output, wherever that happens to be going. This could be modified to log to a file.

Revision history for this message
Will Uther (willu-mailinglists) wrote :

I just tried Marc's signal tracker: Here is the output after boot:

Executing (/usr/bin/mysqld_safe_real )
signo: [1], errno: [0], code: [128], pid: [0], uid: [0]

It looks like a code of 0x80 corresponds to a code of SI_KERNEL.

What does the kernel have against mysqld_safe?

Mathias Gug (mathiaz)
description: updated
tags: removed: verification-failed
Changed in mysql-dfsg-5.0 (Ubuntu Karmic):
assignee: Canonical Server Team (canonical-server) → Mathias Gug (mathiaz)
status: Confirmed → In Progress
Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
assignee: nobody → Mathias Gug (mathiaz)
status: Confirmed → In Progress
Changed in mysql-dfsg-5.0 (Debian):
status: Unknown → New
Revision history for this message
Martin Pitt (pitti) wrote :

Accepted mysql-dfsg-5.0 into jaunty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Dewey_Oxberger (jonb-cirris) wrote :

Proposed works for me. No sign of mysql trouble in any of the logs.

Revision history for this message
Bill Meek (llibkeem) wrote :

Short answer: Fixed.

I was running on Mythbuntu 8.10 and didn't have these errors.

Fresh load of Mythbuntu 9.04RC, got the 'hanging' messages in syslog and database errors ending with
'MySQL server has gone away' in mythbackend.log. These messages are gone now.

Thank you.

Martin Pitt (pitti)
tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mysql-dfsg-5.0 - 5.1.30really5.0.75-0ubuntu10.2

---------------
mysql-dfsg-5.0 (5.1.30really5.0.75-0ubuntu10.2) jaunty-proposed; urgency=low

  * debian/patches/38_scripts__mysqld_safe.sh__signals.dpatch: wait in the
    SIGHUP trap to avoid killing an existing mysqld process when a HUP signal
    is sent to mysqld_safe. (LP: #326768)

 -- Mathias Gug <email address hidden> Mon, 11 May 2009 22:41:44 -0400

Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
status: Fix Committed → Fix Released
Revision history for this message
Martin Pitt (pitti) wrote :

Mathias, please upload fix to Karmic ASAP.

Changed in mysql-dfsg-5.0 (Ubuntu Karmic):
importance: Undecided → High
Revision history for this message
Stephane Chazelas (stephane-chazelas) wrote :

that patch is still wrong.

The first time a HUP is received, we run the code in the trap
and call wait which will wait for both the refresh command and
the mysqld one.

But we won't return from that trap until mysqld dies, and in the
trap the HUP signal is blocked, which means any subsequent
HUP will not be handled.

A better way could be to implement some proper event handling as in:

trap : HUP INT QUIT TERM
while :; do
  action=
  mysqld ... &
  while :; do
    signal=NONE
    wait || signal=$(kill -l "$?")

    case $signal in
      (INT|TERM|QUIT) mysqladmin ... shutdown; exit;;
      (HUP) mysqladmin ... refresh;;
      (NONE) break;; # mysqld died
      (*) unexpected ... die;;
    esac
  esac
done

Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
status: Fix Released → Incomplete
Revision history for this message
Mathias Gug (mathiaz) wrote :

Thanks to the careful and detailed analysis of Stephane in http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=527623, it seems that there are two issues:

==========================================================
SIGHUP, SIGQUIT, SIGINT ignored after package installation

After an apt-get install SIGHUP (as well as SIGQUIT and SIGINT) is ignored by mysqld_safe. This is due to the fact that apt blocks SIGHUP, SIGQUIT and SIGINT which is propagated to dpkg, the maintainer shell scripts, the mysql init shell script and finally mysqld_safe (a shell script). However SIGTERM is not.

What are the consequences? A mysql refresh command cannot be sent to mysqld by sending a SIGHUP to mysqld_safe.

Since SIGTERM is not ignored mysqld_safe can properly shutdown mysqld.

It seems that this issue is not related to the bug described here and new bug should be opened instead.

===============================================================
SIGTERM may be ignored when bash is the default shell (/bin/sh)

If /bin/sh points to bash (not the default configuration), SIGHUP (and other signals) will *always* be ignored by mysqld_safe after the first SIGHUP has been sent to mysqld_safe. This is a change in behavior brought by the new upload.

The consequences are similar to the ones mentionned above with the added case that SIGTERM will also be ignored. While this is more problematic, it should be noted that the mysql init script tries *first* to shutdown mysqld via mysqladmin, and uses SIGTERM on mysqld (and not mysqld_safe) if it fails, and then SIGKILL on mysqld (and not mysqld_safe) if SIGTERM fails.

Before the upload the behavior was that on every SIGHUP (regardless of the shell), mysqld_safe would kill mysqld (via SIGKILL) and restart mysqld. Since mysqld was killed with SIGKILL a proper shutdown was *not* done, which could lead to data corruption.

The proposed upload removes a potential data corruption for mysqld except in the case where bash is used as the default shell (/bin/sh).

In this non-default configuration (bash as the default shell):

  * sending a SIGHUP to mysqld_safe would lead to:
    - before (ubuntu10): mysqld killed with SIGKILL and restarted (potential data corruption)
    - after (ubuntu10.2): refresh the first time, nothing on subsequent signaling.
    This is the bug reported here and the upload provides an improvement.

  * sending a SIGTERM to mysqld_safe would lead to:
    - before (ubuntu10): proper shutdown of mysqld
    - after (ubuntu10.2): proper shutdown of mysqld if no SIGHUP has been sent before, nothing if a SIGHUP has already been sent to mysqld_safe.
    This is a regression.

    However it should be noted that the mysql init script doesn't send signals to mysqld_safe when stopping mysql and is run before the sendsigs script during the system shutdown process (rc6).

Changed in mysql-dfsg-5.0 (Ubuntu Karmic):
status: In Progress → Triaged
Mathias Gug (mathiaz)
Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
importance: Undecided → Medium
Mathias Gug (mathiaz)
Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
status: Incomplete → Fix Released
Mathias Gug (mathiaz)
Changed in mysql-dfsg-5.0 (Ubuntu Karmic):
status: Triaged → In Progress
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (3.6 KiB)

This bug was fixed in the package mysql-dfsg-5.0 - 5.1.30really5.0.83-0ubuntu1

---------------
mysql-dfsg-5.0 (5.1.30really5.0.83-0ubuntu1) karmic; urgency=low

  [ Norbert Tretkowski ]
  * New upstream release.

  [ Mathias Gug ]
  * Merge from debian unstable, remaining changes:
   + debian/control:
    - Add a mysql-doc-5.0 as a Suggest to mysql-client-5.0, mysql-server-5.0
      and libmysql15-dev
    - Prepend XS-Original- to Vcs-{Browser, SVN}.
    - Lower mailx from a Recommends to a Suggests, which is pulling in
      exim4 on all installs fo mysql-server. (LP: #259477)
   + debian/rules:
     - Apply same configuration options on lpia as for i386.
     - Add -fno-strict-aliasing to fix FTBFS failures in the mysql
       testsuite.
     - Install mysqld AppArmor profile.
   + debian/additions/my.cnf:
     - Error message files are located in a different directory in MySQL 5.1.
       Setting the language option to use /usr/share/mysql/ breaks 5.1.
       Both 5.0 and 5.1 use a default value that works. (LP: #316974).
     - remove skip-bdb option. This option is not available in 5.1 anymore.
       Moreover 5.0 isn't build with the BerkeleyDB engine. (LP: #316849)
   + Create mysql-server-core-5.0 package for files needed by Akonadi.
   + Add mysqld AppArmor profile:
     - debian/apparmor-profile: mysqld AppArmor profile.
     - debian/mysql-server-5.0.README.Debian: add a note about mysqld AppArmor
       profile.
     - debian/mysql-server-5.0.files: ship mysqld AppArmor profile.
     - debian/mysql-server-5.0.postinst: reload mysqld AppArmor profile.
     - debian/rules: install mysqld AppArmor profile.
     - debian/additions/my.cnf: add warning about apparmor. (LP: #201799)
   + Fixes random build failures:
     - 90_upstream_bug_23921.dpatch (http://bugs.mysql.com/bug.php?id=23921).
     - 91_ubuntu_buildd_testfailures.dpatch: disable tests that fail
       randomly only on the amd64 buildd.
   + Fix log-slow-queries option: (LP: #183762)
     - debian/patches/57-fix-mysqlslowdump-config.dpatch: support
       log_slow_queries option in mysqlslowdump.
     - debian/additions/my.cnf: use log-slow-queries instead of
       log_slow_queries.
  * debian/patches/38_scripts__mysqld_safe.sh__signals.dpatch: wait in the
    SIGHUP trap to avoid killing an existing mysqld process when a HUP
    signal is sent to mysqld_safe. (LP: #326768)
  * Dropped from Ubuntu:
    + debian/patches/92_ssl_test_cert.dpatch: Fixed in 5.0.78.
    + Fix man page conflicts with mysql-doc-5.0 when upgrading from gutsy
      for mysql-server-5.0, mysql-client-5.0, and libmysqlclient15-dev
      packages.
    + Replaces and Conflicts apparmour-profiles << 2.1+1075-0ubuntu4 to allow
      proper upgrades from gutsy.
    + Follow ApparmorProfileMigration and force apparmor complain mode on some
      upgrades (LP: #203531)
      - debian/control: Recommends apparmor >= 2.1+1075-0ubuntu6
      - debian/mysql-server-5.0.dirs: add etc/apparmor.d/force-complain
      - debian/mysql-server-5.0.preinst: create symlink for force-complain/
        on pre-feisty upgrades, upgrades where apparmor-profiles profile is
        unchanged (ie non-enforcing) and upgrades wher...

Read more...

Changed in mysql-dfsg-5.0 (Ubuntu Karmic):
status: In Progress → Fix Released
jflash (cowgirlup2day)
Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
status: Fix Released → Fix Committed
Revision history for this message
Steve Langasek (vorlon) wrote :

Do not change bug statuses without explanation.

Changed in mysql-dfsg-5.0 (Ubuntu Jaunty):
status: Fix Committed → Fix Released
Changed in mythbuntu:
status: Triaged → Fix Released
Changed in mysql-dfsg-5.0 (Debian):
status: New → Confirmed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.