continual GUI freeze

Bug #1789059 reported by Be
This bug affects 3 people
Affects          Status         Importance   Assigned to   Milestone
Mixxx            Fix Released   Critical     Be
mixxx (Ubuntu)   Confirmed      Undecided    Unassigned

Bug Description

[Impact]

The GUI has completely frozen several times while I was using the master branch with Qt 5.10.1 on Fedora. At first I thought it might be related to my touchscreen, because it happened a few times within a minute or two of using the touchscreen with the overview waveforms. That seems to be a coincidence, though, because I just encountered the bug again without using the touchscreen at all. When this occurs, the audio keeps playing until the end of the track, which allows for a relatively graceful recovery if it happens during a performance (which has happened to me a few times now). I can tell the controller thread is still running because LEDs light up when I press buttons. However, manipulating my controller does not change the audio. Unfortunately, I have no idea how to reproduce this. My best idea for debugging it is to run Mixxx in a debugger until the bug happens by chance.

[Test Case]

 * Use Mixxx with the affected setup and verify that there is no GUI deadlock.

[Regression Potential]

Because the way the GUI elements are built has changed, there might be visual regressions, though none are known.

[Other Info]

The Cosmic 2.1.3 build is also affected, see: https://bugs.launchpad.net/ubuntu/+source/mixxx/+bug/1804513

Be (be.ing)
Changed in mixxx:
importance: Undecided → Critical
milestone: none → 2.2.0
Daniel Schürmann (daschuer) wrote :

I cannot confirm this with Ubuntu Trusty and Xenial.

Some ideas:
* Which waveforms and which skin do you use?
* Does this happen with the preview button column hidden?
* What is the CPU load during the issue?
* Do other applications respond normally?
* You may also try the branch which removes the Qt 4 scaling.
* Which OS and Qt version do you use?

Uwe Klotz (uklotzde-deactivatedaccount) wrote :

This sounds like a deadlock in the GUI thread. Hard to trace if there is no common pattern for triggering the issue. It might be caused by a signal loop of direct slot connections.

I've not seen this here on Fedora 28, yet. Disclaimer: I'm using the new multithreaded analysis code. That shouldn't make a difference if you don't re-analyze your tracks upon loading.

Be (be.ing) wrote :

I use GLSL waveforms with Intel UHD Graphics 620. I use Tango. I don't use the preview deck so that column is always hidden in the library. I haven't explicitly looked at the CPU load when this happens, but it seems to be fine considering the audio keeps playing until the end of the track and other applications respond fine.

I'm pretty sure the problem is not in the track analysis code because this happens randomly during playback, not when loading a track or doing a batch analysis.

Be (be.ing)
summary: - GUI freeze
+ continual GUI freeze
Be (be.ing)
Changed in mixxx:
status: New → Incomplete
RJ Skerry-Ryan (rryan) wrote :

Note that you can attach gdb to a running process with gdb/lldb -p `pidof mixxx`, which helps if you get a deadlock but aren't running under a debugger. It doesn't help with a crash though. For that, it's helpful to set "ulimit -c unlimited" so you get core dumps (which you can load with gdb after the process is already dead/gone).
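
A minimal sketch of both approaches, assuming a single running process named mixxx. The function names and the GDB override variable are hypothetical helpers for illustration, not part of Mixxx:

```shell
#!/bin/sh
# Hypothetical helpers. GDB defaults to gdb and can be overridden (e.g. GDB=lldb).

# Attach a debugger to an already running (possibly deadlocked) process.
# Usage: attach_debugger "$(pidof mixxx)"
attach_debugger() {
    "${GDB:-gdb}" -p "$1"
}

# Run a program with the core dump size limit raised, so that a crash can
# leave a core file behind. The limit is changed in a subshell so the
# calling shell keeps its own setting.
# Usage: run_with_core_dumps mixxx
run_with_core_dumps() {
    (
        ulimit -c unlimited
        "$@"
    )
}
```

For example, attach_debugger "$(pidof mixxx)" attaches to a frozen instance. Where a core file ends up depends on the system's core dump configuration, so it may not appear in the working directory.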

Be (be.ing) wrote :

I encountered this twice tonight when loading tracks. I am not certain this is the same issue I originally reported, because I do not think the earlier freezes were connected to loading tracks. Here are the backtraces. It seems that setting the window title of DlgCoverArtFullSize is somehow causing a deadlock in the GUI thread.

Be (be.ing) wrote :
Changed in mixxx:
assignee: nobody → Be (be.ing)
Be (be.ing) wrote :

Hmm, maybe there is a bug in Wayland causing this? Perhaps that is why no one else has encountered this...

Be (be.ing) wrote :
Be (be.ing) wrote :

Uwe, do you use X or Wayland?

Be (be.ing)
Changed in mixxx:
status: Incomplete → In Progress
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

I'm using Wayland/XWayland, but never experienced this issue in full screen mode.

Sometimes (first once a day, now very rarely) the whole desktop crashes, caused by an invalid X window handle according to the logs. Luckily this never happened while running Mixxx in full screen mode.

The behavior might also depend on the actual graphics driver, Intel HD Graphics 530 in my case.

Uwe Klotz (uklotzde-deactivatedaccount) wrote :

I wonder why DlgCoverArtFullSize is always created, connected, and updated even if it is never shown. Formatting the title string and updating the window title on every track load while the dialog is hidden does not make sense to me.

DlgCoverArtFullSize should be created lazily upon request and destroyed afterwards. We don't need to eagerly provision and keep unused instances for an extended period of time that may cause trouble (as seen here).

Be (be.ing) wrote :

This happened when running Mixxx in full screen mode. I do not know if that is relevant. My onboard graphics are Intel UHD Graphics 620.

Lazily creating DlgCoverArtFullSize may work around this. However, I am still wary of just working around this without fully understanding the problem. I am afraid there may be other code in Mixxx that triggers similar deadlocks and that may have caused the freezes I initially reported.

Be (be.ing) wrote :

Hopefully this has been worked around in https://github.com/mixxxdj/mixxx/pull/1849. Unfortunately, because this bug was difficult to reproduce, I'm not certain it is fixed, but I'll mark this as Fix Committed now.

Changed in mixxx:
status: In Progress → Won't Fix
status: Won't Fix → Fix Committed
Owen Williams (ywwg) wrote :

For the record, I didn't see this on my Intel 620 HD / Ubuntu Wayland laptop, but I'll let you know if anything like it happens after this fix.

Be (be.ing) wrote :

If you do encounter this, please get a backtrace when it happens by attaching a debugger to the running process as RJ described in comment #4. Note that I had to press Ctrl + C in the console I had mixxx running in to get a gdb prompt where I could type "set height 0" then "thread apply all bt". If there is a track playing when this happens, either let the track finish before pressing Ctrl + C or turn off/disconnect your speakers, otherwise you'll hear a horrible buzzing sound when you interrupt mixxx.
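
The same two gdb commands can also be run without an interactive prompt. A sketch, assuming gdb is installed and the process ID is known; dump_backtraces and the GDB override variable are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: write the backtraces of all threads of a process to a
# file. In batch mode gdb attaches, runs the given commands, and then exits.
# GDB defaults to gdb and can be overridden, e.g. for a dry run.
# Usage: dump_backtraces "$(pidof mixxx)" mixxx-backtrace.txt
dump_backtraces() {
    "${GDB:-gdb}" -p "$1" -batch \
        -ex "set height 0" \
        -ex "thread apply all bt" > "$2" 2>&1
}
```

The process is still paused while gdb is attached, so the warning about audio applies here too.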

Changed in mixxx:
status: Fix Committed → Fix Released
description: updated
Marc Ranolfi (marc.2377) wrote :

I've been getting this on Mixxx 2.2.4 (which somehow was released to Arch Linux much sooner than the official release which was just yesterday) for the past 5 days or so. Maybe the fact that I'm running JACK with tighter settings and two external USB soundcards, and streaming with OBS studio at the same time (loading my CPUs to ~35%) makes the problem more apparent.

It seems to happen somewhat randomly when loading tracks at times of high CPU load. I got it on video, since I was streaming live at the time (unlisted, thankfully): https://www.youtube.com/watch?v=RnhlpDaXFB8&t=17m14s (see the 17:24 mark). The music continues to play fine. I have JACK running at RTPRIO 85 (PR -86), which means the Mixxx client thread is PR -81. There's another thread at priority -2, and the remaining threads are not real-time (PR 20).

I get this on the stock Arch kernel, as well as linux-ck and linux-rt (with/without CONFIG_HZ=1000 and other optimizations), which is to say, it happens with all kernels and config sets.

The log file does not contain anything useful and running with --developer also did not help. Attached backtrace according to comments #4 and #15.

As I've been able to reproduce this consistently (with a bit of patience), do let me know if I can be of any more help.

Uwe Klotz (uklotzde-deactivatedaccount) wrote :

Version 2.2.4 was tagged much earlier but then not released officially due to internal build server issues. Nothing to worry about.

The issue might depend on the exact Qt version, the build settings used by Arch, and how Mixxx is started (i.e. with "-platform xcb" on Wayland?). I have never experienced such an issue on Fedora with Qt 5.12/5.13/5.14.

If you resume Mixxx after creating the backtrace and then suspend it again to create a new backtrace, does the call stack look similar, i.e. is it still blocking in QSemaphore::acquire(int)? If so, this could hint at a deadlock in some Qt UI code.
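
One way to make this comparison, sketched under the assumption that each backtrace was saved to its own text file (for example from gdb's "thread apply all bt" output). The helper names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers for comparing two saved backtrace files.

# Count the lines of a backtrace file that mention QSemaphore::acquire.
# Usage: count_semaphore_waits first.txt
count_semaphore_waits() {
    grep -c 'QSemaphore::acquire' "$1"
}

# Show what changed between two backtraces. No output and exit status 0
# mean the two files are identical.
# Usage: compare_backtraces first.txt second.txt
compare_backtraces() {
    diff -u "$1" "$2"
}
```

If the count stays the same and the diff shows little or no change between two captures, the blocked threads have most likely not made progress in between.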

Marc Ranolfi (marc.2377) wrote :

Got it, thanks.

I'm running cinnamon-4.6.5-1 with qt5-base-5.15.0-3 (https://git.archlinux.org/svntogit/packages.git/log/trunk?h=packages/qt5-base). I start Mixxx with '-platform xcb' by default, although that is unnecessary since Cinnamon runs on X (1.20.8-2). The only change I made to the official launcher was removing 'pasuspender'.

The build sources for Arch are at (https://git.archlinux.org/svntogit/community.git/log/trunk?h=packages/mixxx).

I did what you suggested and the call stack is similar; I even ran a diff between the two backtraces. The only change was in one thread that was not blocking on that call (see attached). This is the thread running at priority -2 that I mentioned previously; I don't know what it is or does.
The first time it was caught in an unknown function in 'libm.so.6', and now it stays in 'clock_nanosleep@GLIBC_2.2.5 ()' in 'libc.so.6'. It uses the majority of the process's CPU time, about 5%, even during the hung state.

'strace' of a hung thread shows FUTEX_WAIT_PRIVATE, predictably.

I suspended Mixxx multiple times with the debugger and there were no further changes, not even after the song had stopped.

Uwe Klotz (uklotzde-deactivatedaccount) wrote :

PKGBUILD: Why is Mixxx built with "perftools=1 perftools_profiler=1"? We use those options neither for CI builds nor for release builds.

Marc Ranolfi (marc.2377) wrote :

Don't know. Should I build locally without it? If you can tell me what the implications are (or point me to some documentation or code), I'd like to know more about it.

Uwe Klotz (uklotzde-deactivatedaccount) wrote :

https://github.com/mixxxdj/mixxx/blob/dc1b9a1e64ac48e9b926e2afd452a6651a20081d/build/features.py#L495

I have never used it myself and am unsure about the consequences or side-effects. In the best case it results in a minor performance regression.

Using a local build with debug info would hopefully show which function in Mixxx actually triggers the deadlock.

Uwe Klotz (uklotzde-deactivatedaccount) wrote :

tcmalloc could even be faster than the system allocator. But as already mentioned this is not used in any of the official builds.

Please first do a local build with the original Arch settings and try to reproduce the deadlock with a full stacktrace. Then rebuild without these options.

Marc Ranolfi (marc.2377) wrote :

Oh, so it's related to tcmalloc. Thanks.

Here is the full backtrace with debug symbols.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mixxx (Ubuntu):
status: New → Confirmed
Marc Ranolfi (marc.2377) wrote :

I just had it hang when exiting the program, at a time of low CPU load. Curiously enough, attaching the debugger in order to obtain the backtrace was sufficient to make it recover (when I detached the debugger the program closed successfully).

I know it sounds exotic, but I'm actually documenting all of this on video (if only to provide reproduction steps should anyone require them, although it should also serve to show off some of my backtracing skills on YouTube in the form of a timelapse).

Uwe Klotz (uklotzde-deactivatedaccount) wrote :

The hang on exit while closing the sound devices is unrelated. Probably not a deadlock, only a slow I/O operation and hardware interaction. Hard to trace without the actual device setup at hand.

Now we know that the deadlock is originating from WOverview.

Do you still experience any deadlocks after disabling tcmalloc and profiler from Google PerfTools?

Uwe Klotz (uklotzde-deactivatedaccount) wrote :

Since this deadlock is probably not related to the original issue that has already been fixed we should create a new issue. Otherwise tracking becomes difficult.

Marc Ranolfi (marc.2377) wrote :

OK, I spent some time looking at the functions indicated by the source files and line numbers from other threads, such as the one related to CoverArtCache, but you are right.

Have yet to test with a build without PerfTools actually. Gotta be a bit later though.

> we should create a new issue
Will do. I was unsure since the author of the bug was not entirely clear whether it was fixed. (If you prefer to create the new bug entry instead of me, please do go ahead).

Thanks.

Marc Ranolfi (marc.2377) wrote :

Reproduced in a build without perftools. Created report https://bugs.launchpad.net/mixxx/+bug/1885894.

Swiftb0y (swiftb0y) wrote :

Mixxx now uses GitHub for bug tracking. This bug has been migrated to:
https://github.com/mixxxdj/mixxx/issues/9415
