Xorg's Indirect GLX broken from upstream regression

Bug #1776447 reported by Steve Dodd
This bug affects 3 people
| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| X.Org X server | Fix Released | Unknown | | |
| xorg-server (Ubuntu) | Fix Released | Medium | Timo Aaltonen | |

Bug Description

[Impact]

 * IGLX (indirect GLX) lets an OpenGL application send its GLX 3D drawing commands to the Xorg server for rendering instead of using the DRI infrastructure. High-performance computing and scientific institutions commonly rely on this feature of the Xorg server. (ref: https://www.phoronix.com/scan.php?page=news_item&px=Xorg-IGLX-Potential-Bye-Bye)

 * The feature is currently completely broken: any attempt to use it crashes the Xorg server. It has been broken since an upstream commit in 2016.

 * The attached patch is the revert commit that landed in the upstream repository on 28 Jan 2020, after a git bisect of the bug identified the offending commit.

[Test Case]

 * Create a new file at /etc/X11/xorg.conf.d/99-IGLX.conf with the following contents:

```
Section "ServerFlags"
    Option "IndirectGLX" "on"
EndSection
```

 * Log out of your Xorg session
 * Switch to a VT (Ctrl+Alt+F2) and log in
 * Run `sudo systemctl restart gdm` to restart Xorg
 * Log in to the graphical session
 * If you do not have mesa-utils installed, then install it with the terminal command `sudo apt install mesa-utils`
 * Open a terminal and execute `LIBGL_ALWAYS_INDIRECT=1 glxgears`
 * If the fix is successful then glxgears should start and remain active without the Xorg server or glxgears crashing/segfaulting
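
The steps above can be partly scripted. The following is a minimal sketch and not part of the original test case: the function names `write_iglx_conf` and `survives_for` are hypothetical, and it assumes GNU coreutils `timeout` is installed. It writes the same configuration fragment and treats a client as passing if it is still running when the time limit expires.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from the bug report.

# Write the IndirectGLX snippet into the given xorg.conf.d directory.
write_iglx_conf() {
    conf_dir=$1
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/99-IGLX.conf" <<'EOF'
Section "ServerFlags"
    Option "IndirectGLX" "on"
EndSection
EOF
}

# Run a command for at most N seconds. Succeed only if it was still
# running when the limit expired (GNU timeout then exits with 124).
survives_for() {
    seconds=$1
    shift
    status=0
    timeout "$seconds" "$@" >/dev/null 2>&1 || status=$?
    [ "$status" -eq 124 ]
}

# Intended use: the first step as root, the second inside the
# restarted graphical session.
#   write_iglx_conf /etc/X11/xorg.conf.d
#   survives_for 30 env LIBGL_ALWAYS_INDIRECT=1 glxgears && echo "IGLX OK"
```

Passing the variable through `env` avoids relying on how a particular shell exports assignments placed in front of a function call. Restarting the display manager is left as a manual step because it ends the current session.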

[Regression Potential]

 * This change is within DRI-related functionality and so might break all 3D accelerated applications.

 * There is a potential that the change is insufficient to fix the IGLX feature and so attempts to use it may still have undesirable and undefined results.

 * Upstream has run the patch through their CI system, which passed with no errors. This does not necessarily mean the patch will achieve the desired results, but at least indicates that the code does compile cleanly and can start an Xvfb headless instance.

[Other Info]

 * This bug affects Ubuntu releases all the way back to, and including, Bionic 18.04.

---- Original Bug Report Follows ----

There's already an upstream bug report here:

https://bugs.freedesktop.org/show_bug.cgi?id=99555

Basically, with Option "IndirectGLX" "True" and LIBGL_ALWAYS_INDIRECT=1 I get immediate crashes, e.g. with glxgears:

steved@xubuntu:~$ LIBGL_ALWAYS_INDIRECT=true glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
X Error of failed request: 255
  Major opcode of failed request: 155 (GLX)
  Minor opcode of failed request: 1 (X_GLXRender)
  Serial number of failed request: 42
  Current serial number in output stream: 936
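
The failure signature above can be checked for mechanically. This is a minimal sketch, assuming the client's stderr has been captured to a file; the function name `has_glx_render_error` is hypothetical. It matches on the request name rather than the major opcode number, since extension opcodes are assigned at run time.

```shell
#!/bin/sh
# Sketch only: has_glx_render_error is a hypothetical helper name.

# Succeed if the given file contains an X error report whose failed
# request was a GLX render request, as in the output quoted above.
has_glx_render_error() {
    log_file=$1
    grep -q 'X Error of failed request' "$log_file" &&
        grep -q 'Minor opcode of failed request: .*(X_GLXRender)' "$log_file"
}

# Example:
#   LIBGL_ALWAYS_INDIRECT=1 glxgears 2> glxgears.err
#   has_glx_render_error glxgears.err && echo "hit the IGLX failure"
```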

I am trying to run what is effectively a traditional X Terminal, but as the hardware I am using has a 3D accelerated chipset, it seems sensible to try to use it.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: xserver-xorg-core 2:1.19.6-1ubuntu4
ProcVersionSignature: Ubuntu 4.15.0-22.24-generic 4.15.17
Uname: Linux 4.15.0-22-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.1
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
Date: Tue Jun 12 10:09:30 2018
DistUpgraded: Fresh install
DistroCodename: bionic
DistroVariant: ubuntu
DkmsStatus:
 rtl8812au, 4.3.8.12175.20140902+dfsg, 4.15.0-22-generic, x86_64: installed
 rtl8812au, 4.3.8.12175.20140902+dfsg, 4.15.0-23-generic, x86_64: installed
ExtraDebuggingInterest: Yes
GraphicsCard:
 Intel Corporation Device [8086:5a85] (rev 0b) (prog-if 00 [VGA controller])
   Subsystem: Intel Corporation Device [8086:2212]
InstallationDate: Installed on 2018-05-31 (11 days ago)
InstallationMedia: Xubuntu 18.04 LTS "Bionic Beaver" - Release amd64 (20180426)
MachineType: AZW S I
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-22-generic root=/dev/mapper/xubuntu--vg-root ro noresume
SourcePackage: xorg-server
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/18/2017
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 00.12
dmi.board.asset.tag: Default string
dmi.board.name: AB1
dmi.board.vendor: IP3 Tech
dmi.board.version: Default string
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 35
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr00.12:bd12/18/2017:svnAZW:pnSI:pvrDefaultstring:rvnIP3Tech:rnAB1:rvrDefaultstring:cvnDefaultstring:ct35:cvrDefaultstring:
dmi.product.family: Mini PC
dmi.product.name: S I
dmi.product.version: Default string
dmi.sys.vendor: AZW
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.91-2
version.libgl1-mesa-dri: libgl1-mesa-dri 18.0.0~rc5-1ubuntu1
version.libgl1-mesa-glx: libgl1-mesa-glx 18.0.0~rc5-1ubuntu1
version.xserver-xorg-core: xserver-xorg-core 2:1.19.6-1ubuntu4
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:18.0.1-1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20171229-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.15-2

Revision history for this message
In , Diplosarus (diplosarus) wrote :

LIBGL_ALWAYS_INDIRECT=1 glxgears
fails on latest git on r600 with:

X Error of failed request: 255
  Major opcode of failed request: 155 (GLX)
  Minor opcode of failed request: 1 (X_GLXRender)
  Serial number of failed request: 42
  Current serial number in output stream: 164

This happens after rendering one frame, on an ATI HD5850. It is 100% repeatable and occurs with every OpenGL app. Direct rendering works fine.

Indirect rendering works with the vesa X11 driver and llvmpipe. Both xf86-video-ati and xf86-video-modesetting fail.

Revision history for this message
In , Michel Dänzer (michel-daenzer) wrote :

This code in __glXForceCurrent:

    if (cx->wait && (*cx->wait) (cx, cl, error))
        return NULL;

calls DRI2WaitSwap, which returns TRUE because there is a pending swap. I suspect something to deal with this went missing at a higher level at some point.

Revision history for this message
In , Diplosarus (diplosarus) wrote :

Xorg 1.18.3 and below works fine, xorg 1.18.4 fails.

Revision history for this message
In , Diplosarus (diplosarus) wrote :

Looking at the commits between 1.18.3 and 1.18.4, this commit seems to pertain to the case: https://cgit.freedesktop.org/xorg/xserver/commit/?h=server-1.18-branch&id=54ba95861e5ae54051d3963e5e7ced7d69a6de7b

Revision history for this message
In , Michel Dänzer (michel-daenzer) wrote :

(In reply to diplosarus from comment #3)
> Looking at the commits between 1.18.3 and 1.18.4 this commit seems to pertain
> to the case:
> https://cgit.freedesktop.org/xorg/xserver/commit/?h=server-1.18-branch&id=54ba95861e5ae54051d3963e5e7ced7d69a6de7b

Does reverting that commit fix it?

Revision history for this message
In , Diplosarus (diplosarus) wrote :

(In reply to Michel Dänzer from comment #4)
> Does reverting that commit fix it?

No sadly not.

Also, this bug persists in 1.19.3.

Revision history for this message
In , Michel Dänzer (michel-daenzer) wrote :

Can you bisect between 1.18.3 and 1.18.4?

Revision history for this message
In , Kusanagi Kouichi (slash-ac) wrote :

I tested with xf86-video-ati-7.10.0-23-g733f606d. xorg-server-1.18.4 is good and xorg-server-1.19.0 is bad. Bisected and the result is:

7d33ab0f8c7958b205076f71e4b47c24aace77fd is the first bad commit
commit 7d33ab0f8c7958b205076f71e4b47c24aace77fd
Author: Adam Jackson <email address hidden>
Date: Tue Jun 28 15:54:44 2016 -0400

    dri2: Don't make reference to noClientException

    noClientException is now never filled in with a meaningful value, it's
    always -1. The sole caller of this function disregards the error value
    in any case.

    Reviewed-by: Eric Anholt <email address hidden>
    Signed-off-by: Adam Jackson <email address hidden>

:040000 040000 50ecf07fd43345192f937697193acbc91086e975 6d6494a0a06aed80c78491b758f7e35bf3059389 M glx
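
A bisect like the one above can be driven by `git bisect run`, which reads the test script's exit status: 0 marks a commit good, 125 skips it, and any other code from 1 to 127 marks it bad. The sketch below is an illustration under assumptions, not the procedure the reporter used: `bisect_verdict` and `check-iglx.sh` are hypothetical names, and the tag names are assumed to match the version strings quoted in this comment.

```shell
#!/bin/sh
# Sketch only: bisect_verdict is a hypothetical helper name.

# Map a build status and a test status to the exit code that
# `git bisect run` expects and print it:
#   125 = skip (this commit could not be built or tested)
#   0   = good
#   1   = bad
bisect_verdict() {
    build_status=$1
    test_status=$2
    if [ "$build_status" -ne 0 ]; then
        echo 125
    elif [ "$test_status" -eq 0 ]; then
        echo 0
    else
        echo 1
    fi
}

# Usage sketch (not executed here):
#   git bisect start xorg-server-1.19.0 xorg-server-1.18.4   # bad, then good
#   git bisect run ./check-iglx.sh
# where the hypothetical check-iglx.sh builds the server, runs the
# indirect-rendering test, and ends with:
#   exit "$(bisect_verdict "$build_status" "$test_status")"
```

Returning 125 for commits that fail to build keeps unrelated build breakage from being blamed as the first bad commit. In practice each step also needs the freshly built server to be started, which is why a bisect of this kind is often done by hand.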

Without 7d33ab0f8c7958b205076f71e4b47c24aace77fd, xorg-server-1.19.0 works but xorg-server-1.19.0-587-gbebcc8477 doesn't work. No X error but the following errors in dmesg:
[1654246.207734] radeon 0000:00:01.0: bo ffff90626cd73000 don't has a mapping in vm ffff90624f11d200
[1654246.212430] radeon 0000:00:01.0: bo ffff90626cd73000 don't has a mapping in vm ffff90624f11d200
[1654246.220569] radeon 0000:00:01.0: bo ffff90626cd73000 don't has a mapping in vm ffff90624f11d200
[1654246.220790] radeon 0000:00:01.0: bo ffff90626cd73000 don't has a mapping in vm ffff90624f11d200
[1654246.220880] radeon 0000:00:01.0: bo ffff90626cd73000 don't has a mapping in vm ffff90624f11d200
[1654246.236924] radeon 0000:00:01.0: bo ffff90626cd73000 don't has a mapping in vm ffff90624f11d200
[1654246.237176] radeon 0000:00:01.0: bo ffff90626cd73000 don't has a mapping in vm ffff90624f11d200
[1654246.237255] radeon 0000:00:01.0: bo ffff90626cd73000 don't has a mapping in vm ffff90624f11d200
[1654246.248485] radeon 0000:00:01.0: bo ffff90626cd73000 don't has a mapping in vm ffff90624f11d200
[1654246.248810] radeon 0000:00:01.0: bo ffff90626cd73000 don't has a mapping in vm ffff90624f11d200
[1654246.248960] radeon 0000:00:01.0: bo ffff90626cd73000 don't has a mapping in vm ffff90624f11d200
[1654246.265295] radeon 0000:00:01.0: bo ffff90626cd73000 don't has a mapping in vm ffff90624f11d200
[1654246.265866] radeon 0000:00:01.0: bo ffff90626cd73000 don't has a mapping in vm ffff90624f11d200
[1654246.266024] radeon 0000:00:01.0: bo ffff90626cd73000 don't has a mapping in vm ffff90624f11d200
[1654246.281986] radeon 0000:00:01.0: bo ffff90626cd73000 don't has a mapping in vm ffff90624f11d200
[1654246.992998] [drm:radeon_cs_parser_relocs [radeon]] *ERROR* gem object lookup failed 0x4c
[1654246.993025] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -2!

Revision history for this message
Steve Dodd (anarchetic) wrote :
Changed in xorg-server:
importance: Unknown → Medium
status: Unknown → Confirmed
Changed in xorg-server (Ubuntu):
status: New → Confirmed
Revision history for this message
In , Steve Dodd (anarchetic) wrote :

I've hit this on Ubuntu 18.04 (xorg 1.19.6), modesetting driver. I have opened a bug over there in case someone on that side has time to investigate (https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1776447)

Interestingly, if I use VirtualGL as a true GLX proxy..

VGL_READBACK=sync LIBGL_ALWAYS_INDIRECT=1 VGL_DISPLAY=192.168.128.163:0 VGL_LOGO=1 vglrun +v /opt/VirtualGL/bin/glxspheres64

.. which I'm pretty sure it was never designed to do, things run successfully. So IGLX obviously isn't catastrophically broken ..

Revision history for this message
In , Diplosarus (diplosarus) wrote :

Created attachment 141901
Xorg log from iglx crash

Revision history for this message
In , Diplosarus (diplosarus) wrote :

Somewhere between 1.19 and version 1.20.1 this has gotten worse. The Xorg server now crashes after running LIBGL_ALWAYS_INDIRECT=1 glxgears. Log attached.

Revision history for this message
In , Gitlab-migration (gitlab-migration) wrote :

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/xserver/issues/211.

Changed in xorg-server:
status: Confirmed → Unknown
Changed in xorg-server:
importance: Medium → Unknown
tags: added: focal
tags: added: disco eoan
Changed in xorg-server:
status: Unknown → New
Revision history for this message
Lucy Llewellyn (lucyllewy) wrote : Re: Indirect GLX (LIBGL_ALWAYS_INDIRECT=1) causes opengl programms to crash

A fix has now landed upstream. I think this could be a candidate for an SRU cherry-pick for every supported release back to Xenial. The upstream commit is https://gitlab.freedesktop.org/xorg/xserver/commit/e1fa3beb2fe2519e69f859f0acdc68e5a770de27. It's a very simple one-line change and is merely a revert of a prior upstream commit.

Revision history for this message
Lucy Llewellyn (lucyllewy) wrote :

This is the upstream patch. It changes the return from __glXDRIcontextWait from hardcoded -1 to noClientException which will either be -1 or nothing. This is the pre-2016 behaviour that is being restored.

Revision history for this message
Steve Dodd (anarchetic) wrote :

That would be excellent - thanks! IGLX is one of those things probably not many people use, but those of us who do kind of really need it. It also seems to be a thing in HPC / research circles: https://www.phoronix.com/scan.php?page=news_item&px=Xorg-IGLX-Potential-Bye-Bye

FWIW, I'm now (finally) using bionic everywhere..

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "xorg-server-iglx.diff" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
description: updated
description: updated
description: updated
summary: - Indirect GLX (LIBGL_ALWAYS_INDIRECT=1) causes opengl programms to crash
+ Xorg's Indirect GLX broken from upstream regression
tags: added: champagne
Changed in xorg-server (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
tags: added: fixed-upstream
Revision history for this message
Lucy Llewellyn (lucyllewy) wrote :

I've built this locally and tested it. While it improves matters, it is still incomplete to get IGLX working fully - there is a new crash at the point you close the IGLX-using application instead of immediately upon starting it. This fix is still warranted, however, because it does improve things. I'm going to go back to bisecting Xorg upstream to see if I can identify the new crash.

Revision history for this message
Lucy Llewellyn (lucyllewy) wrote :

There are, thankfully, seemingly no regressions in unrelated areas :-)

Revision history for this message
Lucy Llewellyn (lucyllewy) wrote :

The new crash backtrace:

(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fb60370c899 in __GI_abort () at abort.c:79
#2 0x0000560aacd068c0 in OsAbort () at ../../../../os/utils.c:1351
#3 0x0000560aacd0c4f9 in AbortServer () at ../../../../os/log.c:879
#4 0x0000560aacd0d35a in FatalError (f=f@entry=0x560aacd40230 "Caught signal %d (%s). Server aborting\n") at ../../../../os/log.c:1017
#5 0x0000560aacd03c09 in OsSigHandler (unused=<optimised out>, sip=0x7ffd89b28470, signo=11) at ../../../../os/osinit.c:156
#6 OsSigHandler (signo=11, sip=0x7ffd89b28470, unused=<optimised out>) at ../../../../os/osinit.c:110
#7 0x00007fb6038ef540 in <signal handler called> () at /lib/x86_64-linux-gnu/libpthread.so.0
#8 0x0000560aacd02cf3 in ResetCurrentRequest (client=client@entry=0x560aae67f0c0) at ../../../../os/io.c:560
#9 0x0000560aacccfa66 in DRI2WaitSwap (client=0x560aae67f0c0, pDrawable=<optimised out>) at ../../../../../../hw/xfree86/dri2/dri2.c:1082
#10 0x00007fb6033fa5f1 in __glXDRIcontextWait (baseContext=<optimised out>, cl=0x560aae67f1d8, error=0x7ffd89b28aa8) at ../../../../glx/glxdri2.c:291
#11 0x00007fb6033f29c7 in __glXForceCurrent (cl=0x560aae67f1d8, tag=tag@entry=1, error=error@entry=0x7ffd89b28aa8) at ../../../../glx/glxext.c:621
#12 0x00007fb6033ee900 in xorgGlxMakeCurrent (client=0x560aae67f0c0, tag=1, drawId=<optimised out>, readId=0, contextId=<optimised out>, newContextTag=0) at ../../../../glx/glxcmds.c:616
#13 0x0000560aaccd378a in GlxFreeClientData (client=0x560aae67f0c0) at ../../../../glx/vndext.c:168
#14 0x0000560aacba838c in _CallCallbacks (pcbl=pcbl@entry=0x560aacdb2538 <ClientStateCallback>, call_data=call_data@entry=0x7ffd89b28b50) at ../../../../dix/dixutils.c:743
#15 0x0000560aacba2273 in CallCallbacks (call_data=0x7ffd89b28b50, pcbl=0x560aacdb2538 <ClientStateCallback>) at ../../../../include/callback.h:83
#16 CloseDownClient (client=0x560aae67f0c0) at ../../../../dix/dispatch.c:3473
#17 0x0000560aacd045f1 in ospoll_wait (ospoll=0x560aad403a10, timeout=<optimised out>) at ../../../../os/ospoll.c:651
#18 0x0000560aaccfd3b3 in WaitForSomething (are_ready=0) at ../../../../os/WaitFor.c:208
#19 0x0000560aacba2ca7 in Dispatch () at ../../../../include/list.h:220
#20 0x0000560aacba6f94 in dix_main (argc=12, argv=0x7ffd89b299b8, envp=<optimised out>) at ../../../../dix/main.c:276
#21 0x00007fb60370e1e3 in __libc_start_main (main=0x560aacb90a00 <main>, argc=12, argv=0x7ffd89b299b8, init=<optimised out>, fini=<optimised out>, rtld_fini=<optimised out>, stack_end=0x7ffd89b299a8)
    at ../csu/libc-start.c:308
#22 0x0000560aacb90a3e in _start ()

Changed in xorg-server:
status: New → Fix Released
Revision history for this message
Lucy Llewellyn (lucyllewy) wrote :

This debdiff for Bionic fixes the follow-on crash (on client close) that was unearthed by fixing the first crash (on client start).

Revision history for this message
Lucy Llewellyn (lucyllewy) wrote :

This debdiff for Eoan fixes the follow-on crash (on client close) that was unearthed by fixing the first crash (on client start).

Revision history for this message
Lucy Llewellyn (lucyllewy) wrote :

This debdiff for Focal fixes the follow-on crash (on client close) that was unearthed by fixing the first crash (on client start).

Revision history for this message
Lucy Llewellyn (lucyllewy) wrote :

The refreshed patches fix both crashes. The first change is upstream commit e1fa3be [1] and the second change is from upstream PR 388 [2].

@Sponsors, this is now good to go from my point of view.

[1] https://gitlab.freedesktop.org/xorg/xserver/commit/e1fa3beb2fe2519e69f859f0acdc68e5a770de27
[2] https://gitlab.freedesktop.org/xorg/xserver/merge_requests/388

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I can't find any existing reports of that secondary crash, but ideally it would not usually be part of the fix here...

I guess it's OK if the secondary crash was impossible without the first fix. If it wasn't then ideally a new bug should be opened to describe that and both bugs mentioned in the patch.

Revision history for this message
Lucy Llewellyn (lucyllewy) wrote :

The secondary crash doesn't occur without the first fix. You cannot reach it until the first fix is applied. However, I suppose it is in a different bit of code, but it is only reached when you use indirect GLX and prevents you using indirect GLX which is what this issue is attempting to fix.

Revision history for this message
Lucy Llewellyn (lucyllewy) wrote :

Both changes are now in upstream Xorg's repository, as PR 388 was merged a few hours ago. So the changes in this debdiff are a backport of those to Bionic, Eoan and Focal (although if Focal were to sync with the upstream code, the delta would no longer be necessary going forward).

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Great to hear.

I don't have any power to help get this released sooner. It looks like you've done everything right for an SRU, and Timo (who handles Xorg) is already aware of it.

Changed in xorg-server (Ubuntu):
assignee: nobody → Timo Aaltonen (tjaalton)
Revision history for this message
Sebastien Bacher (seb128) wrote :

Tagging rls-ff-notfixing; the bug is not important enough to be rls tracked. Still, it's in the sponsoring queue and will hopefully get fixed before focal is out.

tags: added: rls-ff-notfixing
removed: champagne
Mathew Hodson (mhodson)
tags: added: patch-accepted-upstream
removed: fixed-upstream
Mathew Hodson (mhodson)
tags: added: regression-release
Revision history for this message
Mathew Hodson (mhodson) wrote :

Fixed upstream in xorg-server 1.20.8

---
xorg-server (2:1.20.8-2ubuntu2) focal; urgency=medium

  * randr-auto-bind-of-gpu-is-a-config-change.diff: Backport GPU hotplug
    RandR fix. (LP: #1862753)

xorg-server (2:1.20.8-2ubuntu1) focal; urgency=medium

  * Merge from Debian.
  * modesetting-Disable-atomic-support-by-default.patch: Dropped,
    upstream.

xorg-server (2:1.20.8-2) unstable; urgency=medium

  * rules: Exclude udeb/ from indep dh_missing. (Closes: #955399)

xorg-server (2:1.20.8-1) unstable; urgency=medium

  * New upstream release.
  * patches: Dropped patches applied upstream:
    - fix-modesetting-build.diff
    - add-EGL_QUERY_DRIVER-check.diff
    - fix-rotate-crash.diff
  * control: Use debhelper-compat, bump to 12.
  * rules: Migrate to dh_missing.

Changed in xorg-server (Ubuntu):
status: Triaged → Fix Released
Changed in xorg-server (Ubuntu Bionic):
importance: Undecided → Medium
no longer affects: xorg-server (Ubuntu Bionic)