Mir servers crash with SIGSEGV in libhybris-common.so.1 on Nexus7 when using the hwcomposer (tegra3)

Bug #1231917 reported by Daniel van Vugt
This bug affects 2 people
Affects                      Status        Importance  Assigned to      Milestone
Mir                          Fix Released  Medium      Kevin DuBois
android-src-vendor (Ubuntu)  Fix Released  Medium      Ricardo Salveti
libhybris (Ubuntu)           Invalid       Medium      Unassigned
mir (Ubuntu)                 Fix Released  Medium      Unassigned

Bug Description

Mir servers crash with SIGSEGV in libhybris-common.so.1 on Nexus7, when using android's hw composer.

gdb ./mir_demo_server_basic
...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x494e2460 (LWP 7515)]
0x40c0f880 in ?? () from /usr/lib/arm-linux-gnueabihf/libhybris-common.so.1
(gdb) bt
#0 0x40c0f880 in ?? ()
   from /usr/lib/arm-linux-gnueabihf/libhybris-common.so.1
#1 0x41be1dbc in ?? ()
#2 0x41be1dbc in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Might this be the same issue that is crashing integration-tests?

Tags: nexus7

Revision history for this message
Kevin DuBois (kdub) wrote :

old nexus 7 (nvidia hardware) or new nexus 7 (qcom hardware)?

It might be that the hwc isn't working on the Nexus 7. You can try moving /system/lib/hw/hwcomposer.* out of that directory, which will force Mir into a backup composition mode.

Revision history for this message
Pete Woods (pete-woods) wrote :

Thanks for the suggestion, Kevin. Unfortunately this doesn't seem to help. :(

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libhybris (Ubuntu):
status: New → Confirmed
Revision history for this message
Pete Woods (pete-woods) wrote :

I should note that this is the old Nexus 7 hardware (i.e. Tegra 3).

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Yes, old Nexus 7 (Tegra 3).

Changed in mir:
status: New → Confirmed
importance: Undecided → Low
Changed in libhybris (Ubuntu):
importance: Undecided → Low
Revision history for this message
Ricardo Salveti (rsalveti) wrote :

I did a quick check, and noticed that it crashes in my_pthread_mutex_lock, a hybris hook for the original pthread_mutex_lock.

The problem is that the mutex used (__mutex) contains an address that's not part of the mapped memory region for the process, causing a segfault when we try to access the contents of that variable.

To make things even more complicated, the call originates from /system/lib/hw/gralloc.tegra3.so, and if you also refuse to handle the lock, it'll crash inside the library later on.

Revision history for this message
Ricardo Salveti (rsalveti) wrote :

Seems to be related to the hwcomposer, as after removing /system/lib/hw/hwcomposer.tegra3.so I'm able to start mir_demo and run a few examples.

Unity8 still seems to be crashing; still debugging.

Changed in mir:
importance: Low → Medium
Changed in libhybris (Ubuntu):
importance: Low → Medium
Revision history for this message
Ricardo Salveti (rsalveti) wrote :

OK, the other unity8-related crash was because we didn't have the permissions set right. Will be pushing a change to lxc-android-config to get that fixed.

Also uploading a new android package without the hwcomposer for grouper, until we're able to ignore it from Mir itself (Mir could use getprop to check whether the device is grouper, and not load it).
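
The getprop idea could look roughly like the sketch below. It assumes the device codename is exposed through the ro.product.device property and that the 2012 Nexus 7 reports "grouper" there. should_load_hwcomposer and the PropertyReader callback are hypothetical names, not Mir API; the actual property lookup is left to the caller.

```cpp
#include <functional>
#include <string>

// Hypothetical callback that returns the value of an Android system
// property, or an empty string when the property is not set.
using PropertyReader = std::function<std::string(const std::string&)>;

// Returns false for devices whose hwcomposer is known to crash, so the
// caller can skip loading it and use the fallback display path instead.
// Assumption: the device codename is stored in "ro.product.device".
bool should_load_hwcomposer(const PropertyReader& read_property)
{
    const std::string device = read_property("ro.product.device");
    return device != "grouper";
}
```

Passing the lookup in as a callback keeps the check independent of how the property is actually read on the device.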

Hopefully tomorrow we should have an image which is capable of running Mir. There's still one rendering issue when opening apps and moving back to the shell, but will report that properly once the image is out.

summary: - Mir servers crash with SIGSEGV in libhybris-common.so.1 on Nexus7
+ Mir servers crash with SIGSEGV in libhybris-common.so.1 on Nexus7 when
+ using the hwcomposer (tegra3)
Revision history for this message
Pete Woods (pete-woods) wrote :

Update: unity appears to no longer crash on startup when using Mir on my Nexus 7.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Seems to be fixed in:

libhybris (0.1.0+git20130606+c5d897a-0ubuntu33) saucy; urgency=low

Changed in mir:
status: Confirmed → Invalid
Changed in libhybris (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Ricardo Salveti (rsalveti) wrote :

Marking as still valid, as it happens when using the Android hwcomposer.

Changed in mir:
status: Invalid → Confirmed
Changed in libhybris (Ubuntu):
status: Fix Released → Confirmed
description: updated
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

OK, but why reopen for Mir? What needs fixing in Mir?

Changed in mir:
status: Confirmed → Incomplete
Revision history for this message
Ricardo Salveti (rsalveti) wrote :

After a bit more investigation, this actually seems to be a bug in the way Mir is using the hwcomposer interface (not using the 2 layers, and also with a different init/flags set).

The crash happens because it tries to use a pthread_mutex that is not initialized. From our current investigation nothing is calling/using this pthread_mutex before the crash, so it's probably due to a different code path used with Mir.

Changed in mir:
status: Incomplete → Confirmed
assignee: nobody → Kevin DuBois (kdub)
Revision history for this message
Ricardo Salveti (rsalveti) wrote :

Assigned to kdub as he's currently investigating the issue.

Revision history for this message
Kevin DuBois (kdub) wrote :

The issue was that the hwcomposer was relying on the HWC_GEOMETRY_CHANGED flag to do some initialization of its internal data structures (including some pthread mutexes). Mir was not setting this flag, leaving those mutexes uninitialized. When hwc tried to lock the uninitialized mutexes, we would segfault in hybris.

Fixed by:
1) setting HWC_GEOMETRY_CHANGED during posting
2) submitting a skipped gles layer to force posting every time.

The branch with the fix is linked.
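
The failure mode and the first part of the fix can be modelled with a small self-contained sketch. This is not the tegra3 hwcomposer or Mir code: FakeComposer, kGeometryChanged, and post_frame are hypothetical stand-ins, with kGeometryChanged playing the role of HWC_GEOMETRY_CHANGED. The fake composer creates its mutex only when it sees the flag, and returns false at the point where the real driver would have locked an uninitialized mutex and crashed. The skipped gles layer from step 2 is not modelled.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>

// Stand-in for HWC_GEOMETRY_CHANGED; the value here is illustrative only.
constexpr std::uint32_t kGeometryChanged = 1u << 0;

// Hypothetical composer that, like the driver described above, sets up its
// internal lock only when the caller signals a geometry change.
class FakeComposer
{
public:
    // Returns true if the frame was accepted. Returns false where the real
    // driver would have tried to lock a mutex that was never initialized.
    bool prepare(std::uint32_t flags)
    {
        if ((flags & kGeometryChanged) && !lock_)
            lock_ = std::make_unique<std::mutex>();

        if (!lock_)
            return false;

        std::lock_guard<std::mutex> guard(*lock_);
        ++frames_prepared_;
        return true;
    }

    int frames_prepared() const { return frames_prepared_; }

private:
    std::unique_ptr<std::mutex> lock_;  // empty until the flag is seen
    int frames_prepared_ = 0;
};

// Caller-side fix: always pass the geometry-changed flag when posting.
bool post_frame(FakeComposer& composer)
{
    return composer.prepare(kGeometryChanged);
}
```

In this model, a caller that never passes the flag always fails, while a caller that goes through post_frame succeeds from the first frame onward.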

Changed in mir:
status: Confirmed → In Progress
Revision history for this message
Kevin DuBois (kdub) wrote :

Regarding the reports that 'my nexus 7 works':

In Mir, we first try to load the HWC display mechanism, and then, if we can't load HWC, we load an alternative display mechanism (FB).

We saw that the HWC was segfaulting, so we removed hwcomposer.tegra3.so from the build. This forced FB composition.
After the fix lands, we can safely add back hwcomposer.tegra3.so to the build.
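
The load order described above amounts to a try-then-fall-back pattern, sketched below. Display, DisplayFactory, and create_display are hypothetical names rather than Mir's actual classes, and each factory is assumed to return a null pointer when its backend cannot be loaded.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical display abstraction; only the backend name matters here.
struct Display
{
    std::string backend;
};

// A factory returns nullptr when its backend is unavailable, for example
// when the hwcomposer module has been removed from the build.
using DisplayFactory = std::function<std::unique_ptr<Display>()>;

// Tries each factory in order and returns the first display that loads.
std::unique_ptr<Display> create_display(const std::vector<DisplayFactory>& factories)
{
    for (const auto& make : factories)
    {
        if (auto display = make())
            return display;
    }
    throw std::runtime_error("no display backend could be loaded");
}
```

With an HWC factory listed first and an FB factory second, removing hwcomposer.tegra3.so makes the first factory return nullptr, so the FB display is selected without touching the caller.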

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I assume "FB composition" is not desirable, and slower... ?

Changed in mir:
milestone: none → 0.1.2
Kevin DuBois (kdub)
Changed in mir:
status: In Progress → Fix Committed
Changed in mir:
status: Fix Committed → Fix Released
Changed in libhybris (Ubuntu):
status: Confirmed → Triaged
Changed in mir (Ubuntu):
status: New → Triaged
importance: Undecided → High
importance: High → Medium
Revision history for this message
Ricardo Salveti (rsalveti) wrote :

Adding bug task to android-src-vendor so we can add hwcomposer.tegra3.so back once latest mir is available in the archive.

Changed in libhybris (Ubuntu):
status: Triaged → Invalid
Changed in android-src-vendor (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Ricardo Salveti (rsalveti)
Changed in mir (Ubuntu):
status: Triaged → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mir - 0.1.2+14.04.20131128.1-0ubuntu1

---------------
mir (0.1.2+14.04.20131128.1-0ubuntu1) trusty; urgency=low

  [ Kevin Gunn ]
  * New upstream release 0.1.2
    - graphics: android: improve interface for mga::DisplayDevice so its
      just concerned with rendering and posting.
    - surfaces: rename "surfaces" component to "scene".
    - surfaces, shell: Migrate Session data model from shell to surfaces.
    - graphics: change fill_ipc_package() to use real pointers.
    - mir_client_library.h: Fix typo "do and locking" should be "do any
      locking".
    - API enumerations cleanup: Remove slightly misleading *_enum_max_
      values, and replace them with more accurate plural forms.
    - test_android_communication_package: Do not expect opened fd to be >0,
      we may have closed stdin making this a valid value (LP: #1247718).
    - Update docs about running Mir on the desktop to mention new package
      ubuntu-desktop-mir.
    - offscreen: Add a display that renders its output to offscreen buffers
    - graphics: android: fix regression for hwc1.0 devices introduced in r1228
      (LP: #1252433).
    - OffscreenPlatform provides the services that the offscreen display
      needs from the Platform.
    - graphics: android: consolidate the GLContexts classes in use.
    - Fix uninitialized variable causing random drm_auth_magic test
      failures. (LP: #1252144).
    - Add a fullyish functional Udev wrapper. This currently sits in
      graphics/gbm, but will be moved to the top-level when input device
      detection migrates.
    - Add resizing support to example code; demo-shell and clients.
    - eglapp: Clarify messages about pixel formats (LP: #1168304).
    - Adds support to the MirMotionEvent under pointer_coordinates called
      tool_type. This will allow clients to tell what type of tool is
      being used, from mouse/finger/etc. (LP: #1252498)
    - client,frontend: Report the real available surface pixel formats to
      clients. (LP: #1240833)
    - graphics: android: 1) change hwc1.1 to make use of sync fences during
      the compositor's gl renderloop. Note that we no longer wait for the
      render to complete, we pass this responsibility to the driver and the
      kernel. 2) support nexus 10. (LP: #1252173) (LP: #1203268)
    - shell: don't publish SurfacesContainer - it can be private to shell.
    - gbm: Don't mess up the VT mode on setup failure Only restore the
      previous VT mode during shutdown if it was VT_AUTO.
    - Fix a crash due to a failed eglMakeCurrent() call when in nested mode.
    - shell: unity-mir uses shell::FocusSetter - make the header public again
    - Add resize support to client surfaces (mir::client::MirSurface).
    - graphics: android: support 'old aka 2012' nexus 7 hwc (nvidia tegra3
      SoC) better. (LP: #1231917)
    - Add resize support to *ClientBuffer classes. Now always get dimensions
      from the latest buffer package.
    - android: support driver hooks for the Mali T604 (present in nexus 10)
    - Add width and height to the protocol Buffer messages, in preparation
      for resizable surfaces.
    - surfaces, shell, logging, te...


Changed in mir (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

android-src-vendor (5-0ubuntu1) trusty; urgency=medium

  * Adding back hwcompositor for grouper as it's now compatible with
    MIR

 -- Ricardo Salveti de Araujo <email address hidden> Tue, 14 Jan 2014 22:40:23 -0200

Changed in android-src-vendor (Ubuntu):
status: Triaged → Fix Released