Mir

[regression] mali, powervr locks up with around the introduction or removal of a third overlay

Bug #1413211 reported by Kevin DuBois
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mir
Fix Released
High
Kevin DuBois
0.12
Fix Released
High
Kevin DuBois
mir (Ubuntu)
Fix Released
High
Unassigned

Bug Description

To reproduce:
Run mir_demo_server with overlays enabled and 3 onscreen surfaces. It will lock up within a few iterations of introducing and removing the 3rd surface.

Should not affect the current stack (as we always have <= 2 surfaces)

Related branches

Revision history for this message
Kevin DuBois (kdub) wrote :

appears to be a regression. Rev 2100 is capable of this operation, still working on pinpointing exact revision

summary: - n10 locks up with around the introduction or removal of a third overlay
+ [regression] n10 locks up with around the introduction or removal of a
+ third overlay
Kevin DuBois (kdub)
Changed in mir:
status: New → In Progress
Revision history for this message
Kevin DuBois (kdub) wrote : Re: [regression] n10 locks up with around the introduction or removal of a third overlay

It appears that the introduction of the software cursor (rev 2225) is causing this to happen. Perhaps the usage bits on the buffer memory backing the cursor are causing hwc problems... have to dig a bit. In the meantime, might back out

summary: - [regression] n10 locks up with around the introduction or removal of a
- third overlay
+ [regression] mali, powervr locks up with around the introduction or
+ removal of a third overlay
Revision history for this message
Kevin DuBois (kdub) wrote :

Changed description, seems to affect mali and powervr (adreno is okay with the sw cursor)

Revision history for this message
Kevin DuBois (kdub) wrote :

My first guess is that the software buffer isn't tagged appropriately for HWC usage/mapping, and that adreno compensates for this while the others dont (or has a better mmu and doesn't care).

Shouldn't affect u8/touch, because we don't have a cursor, but might affect u8 desktop-preview folks if released.

tags: added: overlays
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

A complete lockup when using overlays on mali sounds much like bug 1391261. In both cases it's a matter of how much you stress the overlay logic so they could both be the same bug in the end.

Also, we knew the software cursor would cause some bugs with overlays, as touchspots do too. See: https://bugs.launchpad.net/mir/+bugs?field.tag=overlays

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

P.S. It seems likely that the current implementation of software cursor and touchspots could cause QtMir's compositor to spin and never sleep properly or even hang. This is due QtMir doing:
   src/modules/Unity/Application/mirsurfaceitem.cpp: while (renderable->buffers_ready_for_compositor() > 0) {
which needs to be rewritten quite soon.

I'm working on something that will help us get there:
   https://code.launchpad.net/~vanvugt/mir/fix-buffers_ready_for_compositor/+merge/247124

Revision history for this message
Kevin DuBois (kdub) wrote :

The buffers being submitted for software cursor were not allocated in a HWC-appropriate way, and that seems the likely culprit of this bug. fences aren't involved, it just seems that powervr and mali care more about the mapping bits on the buffer than adreno does.

lp: #1391261 is likely due to the BufferQueue and HwcDevice interaction handing out a buffer to the client that is fenced (the buffer is onscreen), and the client waiting on that fence.

Kevin DuBois (kdub)
tags: added: regression
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I don't yet understand the role of fences in the Android code... At a higher level, in a platform-independent way Mir already guarantees mutual exclusion of buffers across the architecture.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I was going to suggest simple renderables like cursors and touchspots only need to be static textures on the GPU (no buffers), but now realize that would prevent them from being overlayable...

Changed in mir:
milestone: 0.11.0 → 0.12.0
Revision history for this message
Kevin DuBois (kdub) wrote :

So it seems my initial theory was incorrect, the mali drivers (at least) do not choke if you submit an improperly-tagged buffer to the hwc, nor does adding the proper bits to the software buffers affect the bug. Rather something in the cursor and observer infrastructures seems to be hanging. Overlays (and in particular, the 2-way fencing where we don't necessarily wait on makecurrent/swapbuffers that mali & powervr has, but adreno doesn't) exacerbate the problem.

Revision history for this message
Kevin DuBois (kdub) wrote :

mmk, have a handle on what's going on. Probably won't wrap up the testing for the branch today, so fix forthcoming tomorrow (wed 2/4/15)

Revision history for this message
Kevin DuBois (kdub) wrote :

So, when adding the new surface, the code ends up in SoftwareCursor::create_renderable_for, which calls the mg::Buffer::write. On android while using overlays, the software buffer is almost-always locked, as it always onscreen. mg::Buffer::write appropriately blocks, waiting for the buffer to go offscreen, but the cursor doesn't really go offscreen, so it locks forever.

The attached fix changes from trying to reuse and rewrite an existing buffer to just allocating a new buffer every time.

The alternatives seemed worse (at least in the short run). I could have tagged the buffer internally as 'the software cursor' and rejected it on submission to HWC (effectively disabling overlay for no reason in the large majority of cases). I could have expanded the write() signature to fail if it would block, but that had strange semantics.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 0.12.0

Changed in mir:
status: In Progress → Fix Committed
Kevin DuBois (kdub)
Changed in mir:
milestone: 0.12.0 → 0.13.0
Changed in mir (Ubuntu):
importance: Undecided → High
status: New → Triaged
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mir - 0.12.1+15.04.20150324-0ubuntu1

---------------
mir (0.12.1+15.04.20150324-0ubuntu1) vivid; urgency=medium

  [ Alexandros Frantzis ]
  * New upstream release 0.12.1 (https://launchpad.net/mir/+milestone/0.12.1)
    - Bug fixes:
      . [regression] mali, powervr locks up with around the introduction or
        removal of a third overlay (LP: #1413211)
      . USC - mouse cursor on AMD graphics is drawing incorrectly
        (LP: #1417581)
      . mir_demo_server doesn't emit hover_exit events (LP: #1418569)
      . SessionMediator locks mutexes in one thread and unlocks them in
        another (LP: #1427976)
      . ProtobufResponder::send_response_result race (LP: #1428402)
      . Some protobuf Closure objects can access dead objects (LP: #1433330)
      . DisplayConfigurationOutput.physical_size_mm is undefined/zero
        (LP: #1430315)
      . vivid fails to build Mir as of 2015-03-22: error: #warning
        "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE"
        [-Werror=cpp] (LP: #1435127)
      . valgrind on armhf fails with with many errors (LP: #1435186)
 -- CI Train Bot <email address hidden> Tue, 24 Mar 2015 16:09:54 +0000

Changed in mir (Ubuntu):
status: Triaged → Fix Released
Changed in mir:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.