Mir

Fatal exceptions raised in a compositing thread have no usable stack trace

Bug #1237332 reported by Daniel van Vugt on 2013-10-09

This bug affects 43 people

Affects        Status         Importance  Assigned to      Milestone
Mir            Fix Released   Critical    Daniel van Vugt  Mir 0.1.6
mir (Ubuntu)   Fix Released   Critical    Unassigned

Bug Description

Fatal exceptions raised in the Mir code have no usable stack trace.

For example, if I insert this in the compositor code:

throw std::runtime_error("He's dead, Jim");

And then run it:

terminate called after throwing an instance of 'std::runtime_error'
what(): He's dead, Jim
Aborted (core dumped)

The resulting core file has no trace of where the error occurred:

(gdb) bt
#0 0x00007f1d7d720f77 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f1d7d7245e8 in __GI_abort () at abort.c:90
#2 0x00007f1d7dd276e5 in __gnu_cxx::__verbose_terminate_handler() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007f1d7dd25856 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007f1d7dd25883 in std::terminate() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007f1d7dd78805 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007f1d7dfd2f6e in start_thread (arg=0x7f1d6c63b700)
    at pthread_create.c:311
#7 0x00007f1d7d7e3ecd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Related branches

lp:~vanvugt/mir/fix-1237332

Merged into lp:mir at revision 1432

Daniel van Vugt: Approve on 2014-02-27

PS Jenkins bot (community): Approve (continuous-integration) on 2014-02-27

Kevin DuBois (community): Approve on 2014-02-26

Alexandros Frantzis (community): Approve on 2014-02-26

Alan Griffiths: Approve on 2014-02-26

lp:ubuntu/trusty-proposed/mir

lp:~vanvugt/mir/crashing-server

On hold for merging into lp:mir

PS Jenkins bot (community): Approve (continuous-integration) on 2014-05-05

Alan Griffiths: Needs Fixing on 2014-05-02

Kevin DuBois (community): Needs Information on 2014-05-01

Mir development team: Pending requested 2014-05-19

Daniel van Vugt (vanvugt) on 2013-10-17

Changed in mir:
status:	New → Triaged

Daniel van Vugt (vanvugt) wrote on 2014-02-10:

This is becoming painful. Real world reports don't get any stack trace.

Changed in mir:
importance:	High → Critical

Daniel van Vugt (vanvugt) wrote on 2014-02-26:

It seems there's a lot of "unwinding" going on. Eliminating the top-level try/catch statements in our demo servers' main() functions solves the problem in some cases, but most exceptions from secondary threads still have the problem. The resulting core files have no useful stack trace.

Alan Griffiths (alan-griffiths) wrote on 2014-02-26:

Exceptions are not useful for reporting fatal conditions. That's what abort() is for.

The expectation when throwing an exception is that it will be caught (on the same thread) and dealt with somewhere that has the necessary context. (Possibly by calling abort().)

The core you're seeing is not as a result of the exception, but a result of a *failure to handle it*.

Having said that, it can be useful in debugging to trap a stack trace when throwing an exception. (I've seen code for doing that, maybe I should track it down - it does add to the cost of throwing exceptions.)
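
As a rough illustration of trapping a stack trace at the throw site, here is a minimal sketch. It assumes a glibc-style `backtrace()` from `<execinfo.h>`; the names `traced_error`, `max_frames` and `captured_frame_count` are hypothetical and not taken from Mir. The capture runs on every construction, which is the extra cost mentioned above.

```cpp
#include <execinfo.h>   // backtrace(); glibc-specific, not standard C++

#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed upper bound on the number of frames recorded per exception.
constexpr int max_frames = 64;

// Hypothetical exception type that records return addresses when constructed,
// i.e. at the point where the error is raised.
class traced_error : public std::runtime_error
{
public:
    explicit traced_error(std::string const& what)
        : std::runtime_error(what), frames_(max_frames)
    {
        // backtrace() fills the buffer and returns how many entries it stored.
        int const n = backtrace(frames_.data(), max_frames);
        frames_.resize(n > 0 ? static_cast<std::size_t>(n) : 0);
    }

    // Raw return addresses; they can be symbolised later, for example
    // with backtrace_symbols() or an external tool.
    std::vector<void*> const& frames() const noexcept { return frames_; }

private:
    std::vector<void*> frames_;
};

// Hypothetical helper: throws, catches on the same thread, and reports
// how many frames were captured at the throw site.
std::size_t captured_frame_count()
{
    try
    {
        throw traced_error("He's dead, Jim");
    }
    catch (traced_error const& e)
    {
        return e.frames().size();
    }
}
```

The trace is only useful if something catches the exception and reports `frames()`, so this complements handling the exception rather than replacing it.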

Changed in mir:
status:	Triaged → Invalid

Daniel van Vugt (vanvugt) wrote on 2014-02-26:

I installed package "libstdc++6-4.8-dbg" and got the missing symbols, but overall still not a useful stack trace:

(gdb) bt
#0 0x00007f755d227f79 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f755d22b388 in __GI_abort () at abort.c:89
#2 0x00007f755d82d6b5 in __gnu_cxx::__verbose_terminate_handler ()
    at ../../../../src/libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007f755d82b836 in __cxxabiv1::__terminate (handler=<optimised out>)
    at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:38
#4 0x00007f755d82b863 in std::terminate ()
    at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:48
#5 0x00007f755d87ec85 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimised out>)
    at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:92
#6 0x00007f755df4b182 in start_thread (arg=0x7f7553223700)
    at pthread_create.c:312
#7 0x00007f755d2ec12d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)

Thomas Voß (thomas-voss) wrote on 2014-02-26:

Could you please try to get a backtrace with "t a a bt" (gdb shorthand for "thread apply all bt")?

Daniel van Vugt (vanvugt) wrote on 2014-02-26:

Seems to be a gcc and/or C++ spec problem :(
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55917

Daniel van Vugt (vanvugt) wrote on 2014-02-26:

OK, you can argue that's how C++ is designed. But that's not helpful. When our code crashes in the real world, how do we know what to fix?

Daniel van Vugt (vanvugt) wrote on 2014-02-26:

Thomas: I already did that. The other threads are unrelated to the crash and running "normally". What I have pasted is the crashing thread.

Daniel van Vugt (vanvugt) wrote on 2014-02-26:

The workaround of adding "noexcept" to all functions/functors passed to std::thread appears to work nicely. Stack traces are suddenly complete and are useful. But I suspect it's technically a kludge though.
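
A minimal sketch of that workaround, with hypothetical names (`composite_one_frame`, `compositing_thread_main`, `run_compositor_once`) standing in for the real compositor code. When an exception reaches a `noexcept` boundary, `std::terminate()` is called; whether the stack is unwound first is implementation-defined, so the complete traces reported above are an observation about this toolchain rather than a guarantee.

```cpp
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for the work done by a compositing thread.
void composite_one_frame(bool fail)
{
    if (fail)
        throw std::runtime_error("He's dead, Jim");
}

// Thread entry point marked noexcept: an exception escaping from here
// calls std::terminate() instead of propagating into the std::thread
// machinery, where it would be caught and the throwing frames lost.
void compositing_thread_main(bool fail) noexcept
{
    composite_one_frame(fail);
}

// Hypothetical launcher showing how the entry point is passed to std::thread.
void run_compositor_once(bool fail)
{
    std::thread t(compositing_thread_main, fail);
    t.join();
}
```

Calling `run_compositor_once(true)` is expected to abort the process; the point of the workaround is that the resulting core should then still contain the frames leading to the `throw`.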

Daniel van Vugt (vanvugt) wrote on 2014-02-26:

#10

OK, since most of the real world instances of this bug (see the list of duplicates) are (probably) from within the compositing threads, and our compositing std::threads seem trivial to apply the workaround to, I'll just make this bug about compositing threads.

Changed in mir:
status:	Invalid → In Progress
assignee:	nobody → Daniel van Vugt (vanvugt)
milestone:	none → 0.1.6
summary:	- Fatal exceptions raised in the Mir code have no usable stack trace
	+ Fatal exceptions raised in a compositing thread have no usable stack trace

Alan Griffiths (alan-griffiths) wrote on 2014-02-26:

#11

We should write code that accepts "the way C++ is designed".

1. We should never allow exceptions to propagate out of the function passed to std::thread
2. If we encounter a "fatal" condition we should record enough context to debug it (and not just throw an exception in the blind hope of getting useful information later)

As I mentioned above, we can emulate Java and attach a stack trace to exceptions. But this has the cost of scanning the stack and debug info when the exception is thrown - and for normal exceptions that isn't needed (as they are caught and things go on as normal).

In any case, to extract the stack trace we'd likely need to catch the exception somewhere - and the failure to do that is the real problem underlying this bug report.

Alan Griffiths (alan-griffiths) wrote on 2014-02-26:

#12

Daniel, *fatal* exceptions are the problem, not the victim.

kevin gunn (kgunn72) wrote on 2014-02-26:

#13

Related: https://bugs.launchpad.net/mir/+bug/1285084
As indicated, the fundamental problem is not catching and handling the exception in the first place.

Daniel van Vugt (vanvugt) wrote on 2014-02-27:

#14

The fatal exceptions in question are (from memory) mostly in the DRM code (running in the compositor thread) from the XMir public testing. I know this only from the log files showing single line exception output.

These are not things we can handle at runtime in any sensible way. They are exceptional circumstances which are not recoverable. The most sensible and useful thing you can do is produce a clean core file and stack trace to facilitate fixing of the bug so it doesn't happen again.

See all the duplicates of this bug...

PS Jenkins bot (ps-jenkins) wrote on 2014-02-27:

#15

Fix committed into lp:mir/devel at revision None, scheduled for release in mir, milestone Unknown

Changed in mir:
status:	In Progress → Fix Committed

Daniel van Vugt (vanvugt) on 2014-02-28

Changed in mir:
status:	Fix Committed → Fix Released

Daniel van Vugt (vanvugt) wrote on 2014-03-11:

#16

This bug was fixed in the package mir - 0.1.6+14.04.20140310-0ubuntu1
---------------
mir (0.1.6+14.04.20140310-0ubuntu1) trusty; urgency=medium

Changed in mir (Ubuntu):
importance:	Undecided → Critical
status:	New → Fix Released

Remote bug watches

gcc-bugzilla #55917
[RESOLVED FIXED]