Xorg crashes do not work with apport

Bug #226668 reported by Tristan Schmelcher
Affects: xorg-server (Ubuntu)
Status: Fix Released
Importance: Wishlist
Assigned to: Bryce Harrington

Bug Description

Binary package hint: xserver-xorg-core

I have a Dell XPS M1710 laptop running pure up-to-date Hardy final. I have observed that crashes of the X server (see other bugs I've filed) do not trigger apport, at least not the crashes that I've had. There is no file created in /var/crash and the apport GUI never appears after logging back in. I suspect that this is because whatever technique the X server uses to write the backtrace to its log is preventing the crashes from reaching apport. Ideally, X server crashes should go to its log _and_ apport. If that is not feasible though, I think it would be better for them to go to apport than the log.

$ lsb_release -rd
Description: Ubuntu 8.04
Release: 8.04
$ apt-cache policy xserver-xorg-core
xserver-xorg-core:
  Installed: 2:1.4.1~git20080131-1ubuntu9
  Candidate: 2:1.4.1~git20080131-1ubuntu9
  Version table:
 *** 2:1.4.1~git20080131-1ubuntu9 0
        500 http://archive.ubuntu.com hardy/main Packages
        100 /var/lib/dpkg/status

Bryce Harrington (bryce)
Changed in xorg-server:
assignee: nobody → bryceharrington
importance: Undecided → Wishlist
status: New → Confirmed
Bryce Harrington (bryce)
Changed in xorg-server:
status: Confirmed → In Progress
Bryce Harrington (bryce) wrote :

The X server has its own signal handler, which is what catches failures and prints the backtrace in the log. The signal handler doesn't re-raise the signals it handles; instead it simply calls abort().

Martin Pitt (pitti) wrote :

I built an X server with this patch and tested it. However, apport does not seem to be called at all. The X.org log says "re-raising 11", and in strace I see:

7363 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
7363 rt_sigaction(SIGSEGV, {SIG_IGN}, {0x80b4ce0, [SEGV], SA_RESTART}, 8) = 0
[...]
7363 write(2, " ddxSigGiveUp: re-raising 11\n", 29) = 29
7363 tgkill(7363, 7363, SIGSEGV) = 0
7363 exit_group(1) = ?

This looks a little weird, since tgkill() expects a "thread id". I am not actually sure what this means, and whether it is actually identical to the process ID. If I do

  strace -f sh -c 'kill -SEGV $$'

I get

kill(7022, SIGSEGV) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV (core dumped) +++

Note the "core dumped", which I do *not* see for the X process (and thus apport is never called).

raise(3) indicates that in a multi-threaded program, raise() is not equivalent to kill(), but to pthread_kill(pthread_self(), sig); I guess this eventually leads to calling tgkill().

So at this point I'm lost. It might be a kernel bug that it doesn't dump core for tgkill(), or something weird about signal handling in multi-threaded programs which I don't understand.

Martin Pitt (pitti) wrote :

Hm, I changed the raise(signo) to kill(getpid(), signo). The strace now dutifully shows kill() instead of tgkill(), but it doesn't help. It just doesn't dump core. When X exits, it doesn't even show "+++ killed by SIGSEGV".
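
One detail in the strace from the earlier comment is worth isolating: the disposition of SIGSEGV is set to SIG_IGN before the signal is re-sent, and the process then calls exit_group(1). A signal sent with kill() or raise() while its disposition is SIG_IGN is discarded, which is consistent with that pattern. The sketch below compares the two dispositions in a forked child. It is an illustration only, `raise_with_disposition` is a hypothetical name, and the thread does not state what the final patch changed.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that sets SIGSEGV to `disposition` (SIG_IGN or SIG_DFL),
 * sends itself the signal with kill(), then falls through to _exit(1).
 * Returns the parent's wait status, or -1 on error. */
int raise_with_disposition(void (*disposition)(int))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        signal(SIGSEGV, disposition);
        kill(getpid(), SIGSEGV);
        _exit(1); /* reached only if the signal did not terminate us */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

With SIG_IGN the child is expected to exit normally with status 1, as in the X server trace; with SIG_DFL it is expected to be terminated by SIGSEGV, as in the `sh -c 'kill -SEGV $$'` trace.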

Bryce Harrington (bryce) wrote :

Thanks for the additional analysis, Martin. Glad it wasn't just a trivial oversight on my part!

I spoke with Colin about this. I will keep working on it, with the aim of getting it in early in Jaunty. Since apport is used more during the development period than after release, this probably makes the most sense.

Due to the complexity of the patch (and probable additional complexity when we figure out why the raise() didn't work), this is something I'd like to discuss with upstream after the release is out, and hopefully come to a solution that can be taken upstream.

Bryce Harrington (bryce) wrote :

What's weird is that this *was* working a little while ago. See bug 274693 for example. Bug 284470 is a fresh one from yesterday on a fairly up-to-date Intrepid. I'd expect it to either work or not work, but it appears to be working only *sometimes*.

Also, when I was testing the most recent version of the rethrow patch, it was definitely working when I killed the process synthetically. Often the signal handler was getting called twice before it crashed properly, and there was one corner case (which I couldn't reproduce) where it didn't generate a crash report, but in the general case it did work. But now I can't get it to trigger at all.

Bryce Harrington (bryce) wrote :

Hmm, thinking about it more, I bet I had this set on the system where I did the initial development:

Section "ServerFlags"
        Option "NoTrapSignals" "true"
EndSection

(Our backtracing docs suggest having this on for debugging purposes - https://wiki.ubuntu.com/X/Backtracing)

However, we don't ship that on by default, and the system I'm trying to use now is running with all stock defaults and does not have this enabled. When I get some time to investigate more after the release, I'll try with and without this setting. This may explain why it's working sometimes.

Martin Pitt (pitti) wrote :

Bryce and I finally tracked it down. This is the updated 135_rethrow_signals.patch.

Changed in xorg-server:
status: In Progress → Fix Committed
Bryce Harrington (bryce) wrote :

The previous patch was missing some important bits, causing it to fail to build from source (FTBFS). Here's the more complete version. I also updated it to apply to 1.6 without any fuzzy matching.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xorg-server - 2:1.5.99.3-0ubuntu3

---------------
xorg-server (2:1.5.99.3-0ubuntu3) jaunty; urgency=low

  [Timo Aaltonen]
  * debian/rules: Disable builtin fonts (LP: #308649)

  [Bryce Harrington]
  * 135_rethrow_signals.patch: Update for 1.6 and re-enable.
    (LP: #226668)

 -- Bryce Harrington <email address hidden> Tue, 16 Dec 2008 19:04:14 -0800

Changed in xorg-server:
status: Fix Committed → Fix Released