Xserver crash in radeon_frame_event_handler

Bug #617201 reported by Chris Halse Rogers
This bug affects 5 people
Affects                          Status        Importance  Assigned to         Milestone
xserver-xorg-driver-ati          Fix Released  Medium
xserver-xorg-video-ati (Ubuntu)  Fix Released  High        Chris Halse Rogers

Bug Description

Binary package hint: xserver-xorg-video-ati

There's an easily reproducible X server crash in Radeon's vblank code. It's most easily triggered by switching between screensavers in the screensaver preferences.

The problem appears to be that a client can quit with a pending frame event. When the drm callback gets triggered, the Client's resources have been freed, leaving a garbage Pixmap that causes the crash.
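The failure mode described above can be sketched in isolation. The following is a minimal, self-contained C illustration of one possible guard, not the driver's actual code: every type and function name here (`Client`, `FrameEvent`, `queue_event`, `client_gone`, `frame_event_handler`) is hypothetical. Each pending event keeps a back-pointer to its client; when the client goes away, its pending events are detached so that a late callback can see that the resources are gone and skip the work.

```c
#include <assert.h>
#include <stddef.h>

struct Client;

/* Hypothetical pending frame event, linked into its client's list. */
typedef struct FrameEvent {
    struct FrameEvent *next;
    struct Client     *client;   /* NULL once the client has gone away */
} FrameEvent;

/* Hypothetical client record holding its not-yet-delivered events. */
typedef struct Client {
    FrameEvent *pending;
} Client;

/* Remember that `ev` is waiting for a vblank on behalf of `c`. */
static void queue_event(Client *c, FrameEvent *ev)
{
    ev->client = c;
    ev->next = c->pending;
    c->pending = ev;
}

/* Client is quitting: detach every pending event from it. */
static void client_gone(Client *c)
{
    for (FrameEvent *ev = c->pending; ev != NULL; ev = ev->next)
        ev->client = NULL;
    c->pending = NULL;
}

/* Late callback: returns 1 if the event was handled, 0 if it was skipped. */
static int frame_event_handler(FrameEvent *ev)
{
    if (ev->client == NULL)
        return 0;                /* client already gone, touch nothing */

    /* Unlink the event from its client's pending list. */
    for (FrameEvent **p = &ev->client->pending; *p != NULL; p = &(*p)->next) {
        if (*p == ev) {
            *p = ev->next;
            break;
        }
    }
    /* A real handler would do the copy or swap here. */
    return 1;
}
```

In this sketch the handler never dereferences anything owned by a departed client; it only inspects the event itself, which must therefore outlive the client.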

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: xserver-xorg-video-ati 1:6.13.1-1ubuntu1
ProcVersionSignature: Ubuntu 2.6.35-13.18-generic 2.6.35-rc6
Uname: Linux 2.6.35-13-generic x86_64
Architecture: amd64
DRM.card0.HDMI_Type_A.1:
 status: disconnected
 enabled: disabled
 dpms: On
 modes:
 edid-base64:
DRM.card0.VGA.1:
 status: disconnected
 enabled: disabled
 dpms: On
 modes:
 edid-base64:
Date: Fri Aug 13 13:59:47 2010
DkmsStatus:

Lsusb:
 Bus 002 Device 004: ID 05a4:9860 Ortek Technology, Inc.
 Bus 002 Device 003: ID 05a4:9835 Ortek Technology, Inc.
 Bus 002 Device 002: ID 045e:0095 Microsoft Corp. IntelliMouse Explorer 4.0 (IntelliPoint)
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.35-13-generic root=/dev/mapper/Primary--Storage-Ubuntu--Root ro crashkernel=384M-2G:64M,2G-:128M quiet splash
ProcEnviron:
 PATH=(custom, user)
 LANG=en_AU.UTF-8
 SHELL=/bin/zsh
SourcePackage: xserver-xorg-video-ati
dmi.bios.date: 12/29/2004
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F1
dmi.board.name: NF-CK804
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.type: 3
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF1:bd12/29/2004:svn:pn:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnNF-CK804:rvrx.x:cvn:ct3:cvr:
system:
 distro: Ubuntu
 codename: maverick
 architecture: x86_64
 kernel: 2.6.35-13-generic

Revision history for this message
In , Alban Browaeys (prahal) wrote :

Created an attachment (id=37055)
xorg log of a crash (happens way later than the actual issue).

This is where the xserver ends up crashing.

Revision history for this message
In , Alban Browaeys (prahal) wrote :

Stack details:
kernel: 2.6.35-rc5+
xserver: 3209b094a3b1466b579e8020e12a4f3fa78a5f3f with the entervt fix from bug 27114.
libdrm: b803918f3f77c62edf22e78cb2095be399753423
xf86-video-ati: 06691376b1ee963c711420edaf5a03eab6f5658f
mesa: c4066b78c0aad41c199eb27157538c2ec9ab5bfd

Revision history for this message
In , Oldřich Jedlička (oldium) wrote :

It looks like I have the same problem (latest git versions of libdrm, mesa, xf86-video-ati, kernel drm-radeon-testing, xorg-server 1.8.2). The crash is often reproducible when I unlock my screen (I'm using a GL screensaver). The core file's backtrace looks like this:

#0 radeon_dri2_copy_region (drawable=0xa25b4b8, region=0xbff5abe4, dest_buffer=0xa467308, src_buffer=0xa49ae38) at radeon_dri2.c:306
#1 0xb7332733 in radeon_dri2_frame_event_handler (frame=350152, tv_sec=1281125129, tv_usec=640400, event_data=0xa7a8e20) at radeon_dri2.c:405
#2 0xb7335f03 in drmmode_vblank_handler (fd=8, frame=350152, tv_sec=1281125129, tv_usec=640400, event_data=0xa7a8e20)
    at drmmode_display.c:1189
#3 0xb7364b02 in drmHandleEvent (fd=8, evctx=0x9412bf0) at xf86drmMode.c:781
#4 0xb7335ec9 in drm_wakeup_handler (data=0x9412bcc, err=2, p=0x8211d60) at drmmode_display.c:1199
#5 0x080752d1 in WakeupHandler (result=2, pReadmask=0x8211d60) at dixutils.c:403
#6 0x0809dfbe in WaitForSomething (pClientsReady=0x98d00d8) at WaitFor.c:232
#7 0x0806fe36 in Dispatch () at dispatch.c:375
#8 0x0806593d in main (argc=-1074416620, argv=0x8065540, envp=Cannot access memory at address 0x28
) at main.c:286

Which corresponds perfectly to the described scenario.

Revision history for this message
In , Oldřich Jedlička (oldium) wrote :

I think this problem of the freed ClientPtr could be solved by using a unique identifier (an increasing number), client_id (which does not currently exist; client_index isn't enough). The event handler would then ask the server for the ClientPtr before continuing. That way the ClientPtr could be freed safely: the event handler would not get a ClientPtr back, so it could ignore the request.
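A rough, self-contained C sketch of that idea follows. It is only an illustration under stated assumptions: the names (`Client`, `client_connect`, `client_lookup`, `FrameEvent`) are hypothetical and not part of the X server, and the identifier is a 64-bit counter chosen purely for the example. The event stores the slot index plus the identifier, and the lookup returns NULL when the slot is empty or has been reused by a different client.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CLIENTS 4                 /* hypothetical table size */

typedef struct {
    int      in_use;
    uint64_t client_id;               /* hypothetical unique, increasing id */
} Client;

static Client   clients[MAX_CLIENTS];
static uint64_t next_client_id = 1;

/* A new client takes slot `index`; returns its unique id. */
static uint64_t client_connect(int index)
{
    assert(index >= 0 && index < MAX_CLIENTS);
    clients[index].in_use = 1;
    clients[index].client_id = next_client_id++;
    return clients[index].client_id;
}

/* The client in slot `index` quits; the slot may be reused later. */
static void client_disconnect(int index)
{
    assert(index >= 0 && index < MAX_CLIENTS);
    clients[index].in_use = 0;
}

/* Return the client only if the slot still holds the same client. */
static Client *client_lookup(int index, uint64_t client_id)
{
    Client *c = &clients[index];
    if (!c->in_use || c->client_id != client_id)
        return NULL;
    return c;
}

/* The event remembers (index, id) instead of a raw client pointer. */
typedef struct {
    int      index;
    uint64_t client_id;
} FrameEvent;

/* Returns 1 if the event was handled, 0 if it was ignored. */
static int frame_event_handler(const FrameEvent *ev)
{
    Client *c = client_lookup(ev->index, ev->client_id);
    if (c == NULL)
        return 0;                     /* client gone or slot reused */
    /* A real handler would use `c` and its resources here. */
    return 1;
}
```

The index alone is not enough in this sketch because a new client can land in the same slot; comparing the identifier is what distinguishes the old client from the new one.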

Revision history for this message
In , Chris Halse Rogers (raof) wrote :

*** Bug 29310 has been marked as a duplicate of this bug. ***

Revision history for this message
Chris Halse Rogers (raof) wrote :
Bryce Harrington (bryce)
tags: added: crash
Changed in xserver-xorg-video-ati (Ubuntu):
status: New → Confirmed
Changed in xserver-xorg-video-ati (Ubuntu):
importance: Undecided → High
assignee: nobody → Chris Halse Rogers (raof)
status: Confirmed → In Progress
Revision history for this message
In , Alban Browaeys (prahal) wrote :

Created an attachment (id=38119)
a way to fix this vblank and clientgone issue

This is not perfect. I still had issues with it, such as lattice and matrixview staying assigned to init (running as zombies) after gnome-screensaver-preferences was closed. Still, I fixed as much as I could at this point. Please test and report issues, or improve it.
In the end it should go into the DRI2 layer, as discussed on the ML, though I believe moving step by step (while keeping a wide view) is easier.

Revision history for this message
In , Alban Browaeys (prahal) wrote :

Note that I did not use a unique, ever-increasing identifier, because it would have had to increase indefinitely, which is not easy to handle. If we made it wrap around, we would have ended up with the same issue as with client->index: it being reused.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xserver-xorg-video-ati - 1:6.13.1-1ubuntu3

---------------
xserver-xorg-video-ati (1:6.13.1-1ubuntu3) maverick; urgency=low

  * debian/rules:
    + Drop the upstream ChangeLog from the packages, saving precious CD
      space.
  * debian/patches/101_ref-count-dri2-buffers.patch:
    + Add reference-counting to DRI2 buffers, and take a reference in
      ScheduleSwap. Prevents the buffers from being destroyed on client
      quit between calling ScheduleSwap and the associated vblank event.
      Fixes Xserver segfault when a GL client quits (LP: #617201).
 -- Christopher James Halse Rogers <email address hidden> Tue, 24 Aug 2010 16:56:47 +1000
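The reference-counting approach named in this changelog entry can be sketched as follows. This is a minimal standalone C illustration, not the contents of 101_ref-count-dri2-buffers.patch: the types and functions (`Buffer`, `buffer_ref`, `buffer_unref`, `schedule_swap`, `vblank_handler`) are hypothetical stand-ins. The scheduling step takes its own reference on each buffer, so the buffers stay alive until the vblank handler drops that reference, even if the client releases its own reference in between.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference-counted buffer. */
typedef struct {
    int refcnt;
} Buffer;

static int live_buffers = 0;          /* number of buffers not yet freed */

/* Create a buffer owned by the caller (one reference). */
static Buffer *buffer_create(void)
{
    Buffer *b = calloc(1, sizeof *b);
    if (b == NULL)
        return NULL;
    b->refcnt = 1;
    live_buffers++;
    return b;
}

static void buffer_ref(Buffer *b)
{
    assert(b->refcnt > 0);
    b->refcnt++;
}

/* Drop one reference; free the buffer when the last one goes. */
static void buffer_unref(Buffer *b)
{
    assert(b->refcnt > 0);
    if (--b->refcnt == 0) {
        live_buffers--;
        free(b);
    }
}

/* Hypothetical pending swap, waiting for a vblank event. */
typedef struct {
    Buffer *front;
    Buffer *back;
} SwapEvent;

/* Take a reference on both buffers for the lifetime of the event. */
static void schedule_swap(SwapEvent *ev, Buffer *front, Buffer *back)
{
    buffer_ref(front);
    buffer_ref(back);
    ev->front = front;
    ev->back = back;
}

/* The buffers are guaranteed alive here; release them afterwards. */
static void vblank_handler(SwapEvent *ev)
{
    /* A real handler would copy or exchange the buffers here. */
    buffer_unref(ev->front);
    buffer_unref(ev->back);
    ev->front = NULL;
    ev->back = NULL;
}
```

If the client quits after `schedule_swap`, its `buffer_unref` calls only lower the count to one; the memory is freed when `vblank_handler` drops the event's references.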

Changed in xserver-xorg-video-ati (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
In , Oldřich Jedlička (oldium) wrote :

Just note that Christopher James Halse Rogers is also working on a solution on xorg ATI mailing list (<email address hidden>).

If you want, have a look at the xorg-driver-ati mailing list - 17.8.2010 and further, subject "[PATCH] dri2: Reference-count DRI2 buffers.".

Revision history for this message
In , Alban Browaeys (prahal) wrote :

(In reply to comment #8)
> Just note that Christopher James Halse Rogers is also working on a solution on
> xorg ATI mailing list (<email address hidden>).
>
> If you want, have a look at the xorg-driver-ati mailing list - 17.8.2010 and
> further, subject "[PATCH] dri2: Reference-count DRI2 buffers.".

The reference-count one, or a newer one? In the next patch I implemented option 2 from this thread, and I would like it to move towards option 3. I talked with him on IRC and he explained option 2 to me. I really like the option 3 that both of you talked about, though I do not know whether James is working on it, so I will start on it anyway.

Revision history for this message
In , Alban Browaeys (prahal) wrote :

Created an attachment (id=38139)
the option 2 version of the vblank when client gone fix.

Revision history for this message
Marcos Magalhães (marcos-daekdroom) wrote :

This bug still occurs on maverick after 1:6.13.1-1ubuntu3, and affects xorg-edgers' radeon driver as well.

Revision history for this message
In , Alban Browaeys (prahal) wrote :

(In reply to comment #8)
> Just note that Christopher James Halse Rogers is also working on a solution on
> xorg ATI mailing list (<email address hidden>).
>
> If you want, have a look at the xorg-driver-ati mailing list - 17.8.2010 and
> further, subject "[PATCH] dri2: Reference-count DRI2 buffers.".

I tried the "dri2: Reference-count DRI2 buffers" patch yesterday, and it leads to random X crashes in radeon_drv, after the DRI2 layer. The patch is in xorg-edgers, and that is where I reproduced the crashes. Replacing it in xorg-edgers with the option 2 patch fixes all of them.
I tried to report this on Launchpad, but it looks like bug reports cannot be filed without the ubuntu-bug tool, at least for that project.

Revision history for this message
In , Oldřich Jedlička (oldium) wrote :

(In reply to comment #10)
> Created an attachment (id=38139) [details]
> the option 2 version of the vblank when client gone fix.

The patch (option 2) looks good and works for me; I like the used idea ;-)

I have one question: I'm not sure that radeon_dri2_screen_init is the right place to call AddCallback, because it should be called only once per server instance. dixRegisterPrivateKey can be called multiple times; it doesn't reserve the space twice. But I have to admit that I don't know how many times radeon_dri2_screen_init could be called (on a multi-head setup, for example), or how many screens one xorg-server instance can have...
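If screen initialisation can indeed run once per screen, one common way to keep a server-wide registration single is to count the screens that need it. The sketch below is a standalone C illustration of that pattern only; `add_client_state_callback` and `remove_client_state_callback` are hypothetical stand-ins for the real registration calls, and whether the driver needs this at all is exactly the open question above.

```c
#include <assert.h>

static int registrations = 0;   /* entries in the server-wide callback list */
static int screens_using = 0;   /* screens that currently rely on the callback */

/* Stand-ins for the real (server-wide) registration calls. */
static void add_client_state_callback(void)    { registrations++; }
static void remove_client_state_callback(void) { registrations--; }

/* Called once per screen; registers only for the first screen. */
static void screen_init(void)
{
    if (screens_using++ == 0)
        add_client_state_callback();
}

/* Called once per screen; unregisters only when the last screen closes. */
static void screen_close(void)
{
    assert(screens_using > 0);
    if (--screens_using == 0)
        remove_client_state_callback();
}
```

With two screens, `screen_init` runs twice but the callback list gains a single entry, and that entry is removed only after both screens have been closed.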

Revision history for this message
Anton Anikin (anton-anikin) wrote :

I have a similar problem.

Revision history for this message
Anton Anikin (anton-anikin) wrote :

1:6.13.1-1ubuntu2 works fine for me.
With 1:6.13.1-1ubuntu4 I get this crash right after logging in:

[ 4947.741]
Backtrace:
[ 4947.741] 0: /usr/bin/X (xorg_backtrace+0x28) [0x462758]
[ 4947.742] 1: /usr/bin/X (0x400000+0x5d87d) [0x45d87d]
[ 4947.742] 2: /lib/libpthread.so.0 (0x7f3952268000+0xfb40) [0x7f3952277b40]
[ 4947.742] 3: /usr/lib/xorg/modules/drivers/radeon_drv.so (0x7f394f12a000+0xd70b4) [0x7f394f2010b4]
[ 4947.742] 4: /lib/libdrm.so.2 (drmHandleEvent+0x103) [0x7f394f841533]
[ 4947.742] 5: /usr/bin/X (WakeupHandler+0x4b) [0x43f7bb]
[ 4947.742] 6: /usr/bin/X (WaitForSomething+0x1d7) [0x45cac7]
[ 4947.742] 7: /usr/bin/X (0x400000+0x35062) [0x435062]
[ 4947.742] 8: /usr/bin/X (0x400000+0x2184b) [0x42184b]
[ 4947.742] 9: /lib/libc.so.6 (__libc_start_main+0xfe) [0x7f39511cfd8e]
[ 4947.742] 10: /usr/bin/X (0x400000+0x213d9) [0x4213d9]
[ 4947.742] Segmentation fault at address 0x18
[ 4947.742]
Caught signal 11 (Segmentation fault). Server aborting
[ 4947.742]
Please consult the The X.Org Foundation support
  at http://wiki.x.org
 for help.
[ 4947.742] Please also check the log file at "/var/log/Xorg.1.log" for additional information.

Revision history for this message
In , Alban Browaeys (prahal) wrote :

Created an attachment (id=38209)
this is option 3 , ie it relies on another patch for dri2 in xserver

I agree that the issue you mentioned with option 2 should be dealt with. However, I have completed this partial option 3, which is my favorite (although it is not as polished as option 2). It has the advantage of providing a mechanism for all drivers and, without the event normalization, it does not remove any flexibility in the driver event structure.

I will attach the xserver dri2 patch to this bug report (it seems I cannot add more than one patch at a time).

Revision history for this message
In , Alban Browaeys (prahal) wrote :

Created an attachment (id=38210)
xserver part of the option 3 patch

Revision history for this message
In , Alban Browaeys (prahal) wrote :

Created an attachment (id=38211)
this is option 3 , ie it relies on another patch for dri2 in xserver (v2)

Obviously, when cleaning up a cherry-pick from one branch to another, one has to take care not to delete more than intended. This version of the patch is complete.

Revision history for this message
Chris Halse Rogers (raof) wrote : Re: [Bug 617201] Re: Xserver crash in radeon_frame_event_handler

This crash should be resolved in xserver-xorg-video-ati 6.13.1-1ubuntu4.
If it is not, please reopen this bug.

Revision history for this message
In , Michel Dänzer (michel-daenzer) wrote :

I'm leaning towards option 2 - option 3 seems to incur basically all of the versioning pain of moving things into the server while not providing all of the benefits of handling it in the server.

Revision history for this message
In , Oldřich Jedlička (oldium) wrote :

Created an attachment (id=38484)
the option 2, updated fix: vblank when client gone event version (v2)

Revision history for this message
In , Michel Dänzer (michel-daenzer) wrote :

(In reply to comment #17)
> Created an attachment (id=38484) [details]
> the option 2, updated fix: vblank when client gone event version (v2)

Please make the commit log more self-explanatory and submit to the mailing list for review, ideally with git send-email but at least generated by git format-patch.

Changed in xserver-xorg-driver-ati:
importance: Unknown → Medium
status: Unknown → Confirmed
Revision history for this message
In , Oldřich Jedlička (oldium) wrote :

Created an attachment (id=38703)
the option 2: updated fix sent to xorg-driver-ati mailing list (version 3)

Revision history for this message
In , Michel Dänzer (michel-daenzer) wrote :

*** Bug 30206 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Oldřich Jedlička (oldium) wrote :

Created an attachment (id=38943)
the option 2: updated fix sent to xorg-driver-ati mailing list (version 4)

This is the latest patch (V4), rebased onto the current git HEAD. The V4 patch was also sent to the mailing list on 19.9.2010 and covers the review comments.

Revision history for this message
In , Oldřich Jedlička (oldium) wrote :

Created an attachment (id=39128)
the option 2: updated fix sent to xorg-driver-ati mailing list (version 5)

v5: Distribute list.h as xorg_list.h, remove xorg-server version check. Use the version from xorg-server when available (checked in configure.ac).

Changed in xserver-xorg-driver-ati:
importance: Medium → Unknown
status: Confirmed → Fix Released
Changed in xserver-xorg-driver-ati:
importance: Unknown → Medium