Mir

Fatal exceptions raised in a compositing thread have no usable stack trace

Bug #1237332 reported by Daniel van Vugt
This bug affects 43 people
Affects        Status         Importance   Assigned to       Milestone
Mir            Fix Released   Critical     Daniel van Vugt
mir (Ubuntu)   Fix Released   Critical     Unassigned

Bug Description

Fatal exceptions raised in the Mir code have no usable stack trace.

For example, if I insert this in the compositor code:

    throw std::runtime_error("He's dead, Jim");

And then run it:

terminate called after throwing an instance of 'std::runtime_error'
  what(): He's dead, Jim
Aborted (core dumped)

The resulting core file has no trace of where the error occurred:

(gdb) bt
#0 0x00007f1d7d720f77 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f1d7d7245e8 in __GI_abort () at abort.c:90
#2 0x00007f1d7dd276e5 in __gnu_cxx::__verbose_terminate_handler() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007f1d7dd25856 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007f1d7dd25883 in std::terminate() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007f1d7dd78805 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007f1d7dfd2f6e in start_thread (arg=0x7f1d6c63b700)
    at pthread_create.c:311
#7 0x00007f1d7d7e3ecd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Changed in mir:
status: New → Triaged
Daniel van Vugt (vanvugt) wrote :

This is becoming painful. Real world reports don't get any stack trace.

Changed in mir:
importance: High → Critical
Daniel van Vugt (vanvugt) wrote :

It seems there's a lot of "unwinding" going on. Eliminating the top-level try/catch statements in our demo servers' main() functions solves the problem in some cases, but most exceptions from secondary threads still have the problem. The resulting core files have no useful stack trace.

Alan Griffiths (alan-griffiths) wrote :

Exceptions are not useful for reporting fatal conditions. That's what abort() is for.

The expectation when throwing an exception is that it will be caught (on the same thread) and dealt with somewhere that has the necessary context. (Possibly by calling abort().)

The core you're seeing is not as a result of the exception, but a result of a *failure to handle it*.

Having said that, it can be useful in debugging to trap a stack trace when throwing an exception. (I've seen code for doing that, maybe I should track it down - it does add to the cost of throwing exceptions.)
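
A minimal sketch of what trapping a stack trace at the throw site could look like, assuming glibc's `backtrace()` and `backtrace_symbols()` from `<execinfo.h>`. The class name `traced_error` and the 64-frame limit are hypothetical and not taken from Mir:

```cpp
#include <execinfo.h>   // backtrace(), backtrace_symbols() (glibc-specific)
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Assumption: 64 frames is deep enough for illustration.
constexpr int max_frames = 64;

// Hypothetical helper, not part of Mir: a runtime_error that records the
// call stack of the code constructing it (normally the throw site).
class traced_error : public std::runtime_error
{
public:
    explicit traced_error(std::string const& what)
        : std::runtime_error(what), frames_(max_frames)
    {
        // Walk the stack now, while the throwing frames still exist.
        int const n = backtrace(frames_.data(), max_frames);
        frames_.resize(n > 0 ? static_cast<std::size_t>(n) : 0);
    }

    // Raw return addresses, innermost frame first.
    std::vector<void*> const& frames() const noexcept { return frames_; }

    // Best-effort symbolic names, one string per recorded frame.
    std::vector<std::string> symbols() const
    {
        std::vector<std::string> result;
        if (frames_.empty())
            return result;
        char** names = backtrace_symbols(frames_.data(),
                                         static_cast<int>(frames_.size()));
        if (!names)
            return result;
        for (std::size_t i = 0; i < frames_.size(); ++i)
            result.emplace_back(names[i]);
        std::free(names);  // backtrace_symbols() returns one malloc'd block
        return result;
    }

private:
    std::vector<void*> frames_;
};
```

The stack walk runs on every throw, including throws that are caught and handled normally, which is the extra cost mentioned above. The trace is also only visible to code that eventually catches the exception and reads `frames()` or `symbols()`.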

Changed in mir:
status: Triaged → Invalid
Daniel van Vugt (vanvugt) wrote :

I installed package "libstdc++6-4.8-dbg" and got the missing symbols, but overall still not a useful stack trace:

(gdb) bt
#0 0x00007f755d227f79 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f755d22b388 in __GI_abort () at abort.c:89
#2 0x00007f755d82d6b5 in __gnu_cxx::__verbose_terminate_handler ()
    at ../../../../src/libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007f755d82b836 in __cxxabiv1::__terminate (handler=<optimised out>)
    at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:38
#4 0x00007f755d82b863 in std::terminate ()
    at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:48
#5 0x00007f755d87ec85 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimised out>)
    at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:92
#6 0x00007f755df4b182 in start_thread (arg=0x7f7553223700)
    at pthread_create.c:312
#7 0x00007f755d2ec12d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)

Thomas Voß (thomas-voss) wrote :

Could you please try to get a backtrace with "t a a bt" (short for "thread apply all bt")?

Daniel van Vugt (vanvugt) wrote :

Seems to be a gcc and/or C++ spec problem :(
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55917

Daniel van Vugt (vanvugt) wrote :

OK, you can argue that's how C++ is designed. But that's not helpful. When our code crashes in the real world, how do we know what to fix?

Daniel van Vugt (vanvugt) wrote :

Thomas: I already did that. The other threads are unrelated to the crash and running "normally". What I have pasted is the crashing thread.

Daniel van Vugt (vanvugt) wrote :

The workaround of adding "noexcept" to all functions/functors passed to std::thread appears to work nicely. Stack traces are suddenly complete and are useful. But I suspect it's technically a kludge though.
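
A minimal sketch of that workaround, assuming C++14 for the init-capture. The helper name `start_noexcept_thread` is hypothetical and not from the Mir source. An exception escaping a noexcept function leads to std::terminate(). Whether the stack is unwound first is left to the implementation, and the observation in this comment is that with GCC the throwing frames are still visible in the core:

```cpp
#include <thread>
#include <utility>

// Hypothetical helper, not part of Mir: start a std::thread whose entry
// point is declared noexcept, so an escaping exception terminates at the
// noexcept boundary rather than in the standard library's thread routine.
template <typename Function>
std::thread start_noexcept_thread(Function fn)
{
    return std::thread([f = std::move(fn)]() mutable noexcept { f(); });
}
```

The same effect can be had by marking the existing thread function or functor's operator() as noexcept directly; the wrapper only avoids touching each call site's callable.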

Daniel van Vugt (vanvugt) wrote :

OK, since most of the real world instances of this bug (see the list of duplicates) are (probably) from within the compositing threads, and our compositing std::threads seem trivial to apply the workaround to, I'll just make this bug about compositing threads.

Changed in mir:
status: Invalid → In Progress
assignee: nobody → Daniel van Vugt (vanvugt)
milestone: none → 0.1.6
summary: - Fatal exceptions raised in the Mir code have no usable stack trace
+ Fatal exceptions raised in a compositing thread have no usable stack
+ trace
Alan Griffiths (alan-griffiths) wrote :

We should write code that accepts "the way C++ is designed".

1. We should never allow exceptions to propagate out of the function passed to std::thread
2. If we encounter a "fatal" condition we should record enough context to debug it (and not just throw an exception in the blind hope of getting useful information later)

As I mentioned above, we can emulate Java and attach a stack trace to exceptions. But this has the cost of scanning the stack and debug info when the exception is thrown - and for normal exceptions that isn't needed (as they are caught and things go on as normal).

In any case, to extract the stack trace we'd likely need to catch the exception somewhere - and the failure to do that is the real problem underlying this bug report.
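
The two points above could be sketched roughly as follows. This is an illustration under assumptions, not Mir code: `thread_outcome` and `run_guarded` are hypothetical names, and a real server would more likely log the context and then call abort() than return it to the caller:

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical result type: whether the thread body failed, and with what context.
struct thread_outcome
{
    bool failed;
    std::string context;
};

// Hypothetical helper: run `body` on a new thread, never let an exception
// propagate out of the thread function, and record context about any failure.
template <typename Body>
thread_outcome run_guarded(std::string const& thread_name, Body body)
{
    thread_outcome outcome{false, ""};

    std::thread worker([&]() {
        try
        {
            body();
        }
        catch (std::exception const& e)
        {
            // Point 2: record enough context to debug the failure.
            outcome.failed = true;
            outcome.context = thread_name + ": " + e.what();
        }
        catch (...)
        {
            outcome.failed = true;
            outcome.context = thread_name + ": unknown exception";
        }
        // Point 1: nothing escapes the function passed to std::thread.
    });

    worker.join();  // join() orders the worker's writes before this read
    return outcome;
}
```

Note that by the time the catch block runs the stack has already been unwound, so the recorded context is limited to what the exception itself carries.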

Alan Griffiths (alan-griffiths) wrote :

Daniel, *fatal* exceptions are the problem not the victim.

kevin gunn (kgunn72) wrote :

related https://bugs.launchpad.net/mir/+bug/1285084
as indicated, the fundamental problem is not catching & handling in the first place

Daniel van Vugt (vanvugt) wrote :

The fatal exceptions in question are (from memory) mostly in the DRM code (running in the compositor thread) from the XMir public testing. I know this only from the log files showing single line exception output.

These are not things we can handle at runtime in any sensible way. They are exceptional circumstances which are not recoverable. The most sensible and useful thing you can do is produce a clean core file and stack trace to facilitate fixing of the bug so it doesn't happen again.

See all the duplicates of this bug...

PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir/devel at revision None, scheduled for release in mir, milestone Unknown

Changed in mir:
status: In Progress → Fix Committed
Changed in mir:
status: Fix Committed → Fix Released
Daniel van Vugt (vanvugt) wrote :

This bug was fixed in the package mir - 0.1.6+14.04.20140310-0ubuntu1
---------------
mir (0.1.6+14.04.20140310-0ubuntu1) trusty; urgency=medium

Changed in mir (Ubuntu):
importance: Undecided → Critical
status: New → Fix Released