X locks up when scrolling on Radeon Mobility M6 LY

Bug #404331 reported by Cedders
This bug affects 1 person
Affects                          Status   Importance  Assigned to  Milestone
xserver-xorg-video-ati (Ubuntu)  Invalid  High        Unassigned

Bug Description

Binary package hint: xserver-xorg-video-ati

This is a fresh install of Ubuntu 8.04 on a Viglen Dossier LX. Every few hours, apparently at random, everything freezes apart from the mouse pointer. Ctrl-Alt-F1 or Ctrl-Alt-Del does nothing, but I can ping the laptop. Typically this occurs when scrolling up or down in Firefox using the touchpad or keyboard, and there is some screen corruption at the bottom of the scroll bar (as if a blit failed or was interrupted?).

The symptoms aren't quite the same as bug #36596 or bug #248438. The closest match seems to be bug #184430, which was attributed to a faulty adapter, but that doesn't sound identical either, nor does Debian bug #459428, which sounded unresolved. Visual effects are not enabled, because the system claims they are not available; before the hang I can switch to a console. I've tried Option "RenderAccel" "off" and Option "AGPMode" "2" for "Configured Video Device", but neither seems to help, nor does reducing the AGP aperture in the BIOS.

Xorg.0.log shows the uninformative:
mieqEnequeue: out-of-order valuator event; dropping.
tossed event which came in late
mieqEnequeue: out-of-order valuator event; dropping.
tossed event which came in late
tossed event which came in late
tossed event which came in late

(These messages have also been reported elsewhere.) An strace shows EBUSY errors:
22:30:05 [b7f64410] read(37, "\233\7\2\0E\"\203\2\233\4\5\0F\"\203\2:\"\203\0028\0\0"..., 4096) = 208
22:30:05 [b7f64410] ioctl(7, 0xc0286429, 0xbff07634) = -1 EBUSY (Device or resource busy)

(I'll upload fuller strace and try getting more info via SSH as described at https://wiki.ubuntu.com/X/Backtracing )
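
As a rough cross-check of the strace output above, the request number 0xc0286429 can be split into its fields. The sketch below assumes the generic Linux _IOC bit layout used on i386 (8 bits of command number, 8 bits of type, 14 bits of size, 2 bits of direction); the function name decode_ioctl is purely illustrative and not part of any tool.

```python
# Split a Linux ioctl request number into its fields.
# Assumption: the generic _IOC layout (as on i386):
#   bits 0-7 command number, bits 8-15 type character,
#   bits 16-29 argument size, bits 30-31 direction.
def decode_ioctl(request):
    direction = {0: "none", 1: "write", 2: "read", 3: "read/write"}
    return {
        "nr": request & 0xFF,
        "type": chr((request >> 8) & 0xFF),
        "size": (request >> 16) & 0x3FFF,
        "dir": direction[(request >> 30) & 0x3],
    }

fields = decode_ioctl(0xC0286429)  # value taken from the strace line above
print(fields)
# {'nr': 41, 'type': 'd', 'size': 40, 'dir': 'read/write'}
```

Type 'd' with command number 0x29 (41) appears to correspond to DRM_IOCTL_DMA in libdrm's drm.h. That would suggest the EBUSY comes from a DMA buffer request to the DRM, but the mapping should be checked against the installed headers before relying on it.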

ProblemType: Bug
Architecture: i386
Date: Fri Jul 24 20:03:45 2009
DistroRelease: Ubuntu 8.04
Package: xserver-xorg-video-ati 1:6.8.0-1ubuntu1
PackageArchitecture: i386
ProcEnviron:
 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/opt/real/RealPlayer
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
SourcePackage: xserver-xorg-video-ati
Uname: Linux 2.6.24-24-generic i686

[lspci]
00:00.0 Host bridge: Intel Corporation 82830 830 Chipset Host Bridge (rev 02)
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility M6 LY

Cedders (cedric-gn) wrote :
description: updated
Rolf Leggewie (r0lf) wrote :

Does this happen with a release later than hardy? Have you ever tried one?

Cedders (cedric-gn) wrote :

I've not tried, but am about to try installing xserver-xorg-core 2:1.6.0-0ubuntu14 and xserver-xorg-video-ati 1:6.12.1-0ubuntu2 from Jaunty, and will report back.

3 backtraces attached, occurring at 2 different places where it seems to get stuck and use all available CPU:

#2 0xb7b2896d in drmDMA () from /usr/lib/libdrm.so.2
#3 0xb7ac8b64 in RADEONCPGetBuffer (pScrn=0x821bd90)
    at ../../src/radeon_accel.c:521
#4 0xb7ad008b in RADEONSetupForSolidFillCP (pScrn=0x821bd90, color=16777215,
    rop=6, planemask=4294967295) at ../../src/radeon_accelfuncs.c:137

#2 0xb7b6f96d in drmDMA () from /usr/lib/libdrm.so.2
#3 0xb7b0fb64 in RADEONCPGetBuffer (pScrn=0x821bd90)
    at ../../src/radeon_accel.c:521
#4 0xb7b0fcfb in RADEONCPFlushIndirect (pScrn=0x821bd90, discard=1)
    at ../../src/radeon_accel.c:575
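
Both excerpts above pass through the same two frames before diverging. A minimal Python sketch for comparing gdb backtraces this way follows. The regular expression and the helper name frame_functions are assumptions made for illustration, and the pattern only handles frame lines shaped like the ones quoted here.

```python
import re

# Matches gdb frame lines such as "#3 0xb7ac8b64 in RADEONCPGetBuffer (".
# Hypothetical pattern, written for the excerpts in this report only.
FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\w+)\s*\(")

def frame_functions(trace):
    """Return the function names of a backtrace, innermost frame first."""
    names = []
    for line in trace.splitlines():
        match = FRAME.match(line.strip())
        if match:
            names.append(match.group(2))
    return names

trace_a = """\
#2 0xb7b2896d in drmDMA () from /usr/lib/libdrm.so.2
#3 0xb7ac8b64 in RADEONCPGetBuffer (pScrn=0x821bd90)
    at ../../src/radeon_accel.c:521
#4 0xb7ad008b in RADEONSetupForSolidFillCP (pScrn=0x821bd90, color=16777215,
    rop=6, planemask=4294967295) at ../../src/radeon_accelfuncs.c:137
"""
trace_b = """\
#2 0xb7b6f96d in drmDMA () from /usr/lib/libdrm.so.2
#3 0xb7b0fb64 in RADEONCPGetBuffer (pScrn=0x821bd90)
    at ../../src/radeon_accel.c:521
#4 0xb7b0fcfb in RADEONCPFlushIndirect (pScrn=0x821bd90, discard=1)
    at ../../src/radeon_accel.c:575
"""

in_b = frame_functions(trace_b)
common = [name for name in frame_functions(trace_a) if name in in_b]
print(common)  # ['drmDMA', 'RADEONCPGetBuffer']
```

In both traces the shared frames are drmDMA and RADEONCPGetBuffer, so the callers above them differ but the wait itself is in the same place.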

Cedders (cedric-gn) wrote :

I couldn't install the upgrades with build-dep or aptitude, so I upgraded to 8.10 Intrepid (xserver-xorg-core 2:1.5.2-2ubuntu3.1, xserver-xorg-video-ati 1:6.9.0+git20081003.f9826a56-0ubuntu2.1). A very similar effect has occurred once, although cursor movement was slow/jerky after the lockup this time.

Cedders (cedric-gn) wrote :

Backtrace for Intrepid attached - it looks like essentially the same problem in the same function (RADEONCPGetBuffer). It may be slightly less frequent now, twice in three days. Might as well upgrade to Jaunty now...

Cedders (cedric-gn) wrote :

If intrepid was somewhat better, jaunty has been considerably worse, often freezing within an hour of using Firefox (I think it's particularly common when scrolling immediately after switching into the Firefox window). I did have visual effects working for a while, and there also seemed to be a reliable way to produce a lockup using compiz 'Paint fire on screen' with high values for particle size and number. In intrepid and jaunty the symptoms differ from the original report: the cursor becomes jerky (longer timeout?), I've not seen the screen corruption, and the slowdown is gradual over about 2 seconds, as if whatever resource is getting locked is slowly approaching saturation. A few seconds after that the laptop fan comes on as CPU usage hits 100%. The difference in jaunty is that it seems to happen more readily, whereas I got the impression that intrepid sometimes approached this slowdown and halt but recovered. There's nothing special in Xorg.0.log: the last lines are about monitor detection.

xserver-xorg-video-radeon/jaunty uptodate 1:6.12.1-0ubuntu2
xserver-xorg-video-ati/jaunty uptodate 1:6.12.1-0ubuntu2
xserver-xorg-core/jaunty uptodate 2:1.6.0-0ubuntu14
libgl1-mesa-dri/jaunty-updates uptodate 7.4-0ubuntu3.1
libgl1-mesa-glx/jaunty-updates uptodate 7.4-0ubuntu3.1
libdrm2/jaunty uptodate 2.4.5-0ubuntu4

Nothing similar to this happens on the laptop under Windows XP (although occasionally there is a driver BSOD on boot), so if it's a hardware timing issue, it's one that could be worked around, as a last resort, by resetting the card and driver. Anyway, my current workaround is
  Option "DRI" "false"

And that seems to be working. Have I filed this against the wrong package?
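
For reference, the workaround would sit in the Device section of /etc/X11/xorg.conf roughly as follows. This is only a sketch: the identifier is the "Configured Video Device" name mentioned in the description, and any other lines already present in that section would stay as they are.

```
# Sketch of the Device section with the workaround applied.
# Assumption: the section uses the identifier quoted in the description.
Section "Device"
    Identifier "Configured Video Device"
    Option     "DRI" "false"
EndSection
```

X has to be restarted (for example by logging out and back in) before the change takes effect.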

Bryce Harrington (bryce)
Changed in xserver-xorg-video-ati (Ubuntu):
status: New → Confirmed
Bryce Harrington (bryce)
tags: added: hardy
Bryce Harrington (bryce)
tags: added: intrepid jaunty
Bryce Harrington (bryce) wrote :

[This is an automatic notification.]

A new version of the -ati driver is now available in Karmic.

This is a significant update to -ati which brings in kernel mode-setting
(currently disabled) and scores of fixes for DRI2, EXA, etc.

I've posted the new version of this driver to the following PPA,
would you mind testing it and seeing if it resolves the bug you
reported?

  https://edge.launchpad.net/~bryceharrington/+archive/ppa/+sourcepub/709908/+listing-archive-extra

If you're not running this release of Ubuntu, you can try booting the Karmic
LiveCD and loading the PPA onto it, and then log out/in to restart X.
ISOs are available at http://cdimages.ubuntu.com/releases/

After testing Karmic, report back here whether it's still an issue or not,
and if it is please post a fresh Xorg.0.log and 'dmesg' output.

Note there could be new bugs... please file these as new reports using
the command 'ubuntu-bug xorg'.

Changed in xserver-xorg-video-ati (Ubuntu):
status: Confirmed → Incomplete
Cedders (cedric-gn) wrote :

Thanks for that, but it looks like whatever improvements have been made in the PPA, they don't affect this bug (or the new bug 421842).

I hadn't actually reinstalled the OS or run a live CD since reporting this, just upgraded using Update Manager. The Karmic live CD (alpha-4) seems to cause none of these problems. The most likely differences I can think of are that the hard-drive installation has ndisgtk (with a Broadcom 43xx/Linksys PCI card), that I had installed and uninstalled the proprietary RealPlayer on it, or that the memory usage is different (I've run memtest86+ for a few hours with no errors). DRI and Compiz are enabled when running the live CD, and there is no xorg.conf to edit that I can see, but scrolling in Firefox is very slow. At least this rules out a simple hardware fault. I can't see how it could be down to a corrupt binary, since each package has probably been freshly downloaded since I first reported this bug, so it must be something in the configuration. I'll attach files from the live and installed versions for comparison.

DRI-only issues mentioned in bug 421842 include segfaults and screen corruption on the window losing focus.

Bryce Harrington (bryce)
Changed in xserver-xorg-video-ati (Ubuntu):
status: Incomplete → Confirmed
importance: Undecided → High
Bryce Harrington (bryce)
tags: added: karmic
Bryce Harrington (bryce)
description: updated
Bryce Harrington (bryce) wrote :

Given that in bug 421842 you mentioned there had been a rough upgrade with possibly corrupted packages, please first follow the directions to repro this bug on a livecd updated to ~a-5; if you did have a faulty upgrade I could imagine that it could cause both of these bugs, so we should rule that out before going further.

Changed in xserver-xorg-video-ati (Ubuntu):
status: Confirmed → Incomplete
Cedders (cedric-gn) wrote :

I've also just reproduced this problem (Firefox scrolling) on a fresh Jaunty 9.04 live CD, using a 3COM wireless card rather than ndisgtk. I think this means we can rule out persistent configuration errors, corrupted packages, realplayer and ndiswrapper (and it probably has nothing to do with an IRQ conflict with the PCMCIA card either).

I think we may be dealing with at least 3 separate bugs. This one (404331) has been present since the HDD was formatted and Windows XP and Hardy were installed. It only hangs when DRI is enabled, and then Xorg takes 99% of CPU and cursor movement is jerky. It was definitely present in Jaunty, but I'm *not* 100% convinced it's in Karmic, because the other, newer bugs mask it, and also because I have DRI turned off just in case on the HD installation. The newer bugs include one in the alpha-5 live CD which causes distorted screen corruption, a different freeze involving dd and rsyslogd with a stationary cursor, and the reproducible effect with 'tartan' window corruption that I've only seen on the latest karmic on the installed version.

I'll continue to test, but I'm not sure which bug to concentrate on (I'm about to try upgrading individual packages on the live CD to alpha-5). This one (404331) is annoyingly infrequent, because I usually have to turn DRI on and work for an hour or two before it happens, which means losing some work...

Rolf Leggewie (r0lf) wrote :

could this be bug 363238?

Cedders (cedric-gn) wrote :

I don't think it's bug 363238, since those are slowdowns rather than lockups and the logs don't seem to match.

However, it hasn't happened with the hacked radeon_drv.so I've been using (without compiz), so it has probably been fixed in 6.12.2. I'm leaving this open for the moment just because I'm not sure what the situation will be in the final karmic release.

Bryce Harrington (bryce)
tags: added: freeze
Bryce Harrington (bryce) wrote :

I'll be optimistic and hope the lack of recent response means the issue has gone away. :-)

In any case, the next step is to have you re-test this on lucid. If it is not reproducible there we'll consider the problem resolved now.

Changed in xserver-xorg-video-ati (Ubuntu):
status: Incomplete → Fix Released
Cedders (cedric-gn) wrote :

Unfortunately, I think this was never resolved. On upgrading karmic to lucid, the problem re-emerged, even though I was using the Option "DRI" "false" workaround.

I've been suffering lockups/crashes at about the same frequency, say every 2-3 hours, for several months (except when I use Windows, which is very tempting; there's an additional and probably separate problem that lucid never resumes successfully from hibernate or suspend).

Again, the lockups are most common when scrolling in Firefox or switching between windows (as if it's a DMA/timing thing?). Again, there's often a corruption of the scroll bar visible either side of the slider, and the bottom 10% of the window hasn't scrolled. However, what happens next differs from the problem as in hardy-jaunty with DRI true. At approximately equal frequency it may be one of:

a) The cursor is still moving but the screen not updating (eg I have system monitor in the top panel and it stops). CPU usage is not particularly high. The only thing I can do in Gnome is click on the buttons on the taskbar to minimise applications, and Gnome will redraw the desktop background correctly. I can use Ctrl+Alt+F1 to switch to a text console and reboot cleanly. I've tried taking a screenshot of this state, but it has been saved as a distorted PNG.

b) There is a complete freeze: the corruption in the Firefox window may be visible, but the cursor won't move. Even Alt+SysRq+E keys won't work and I have to manually power-cycle.

c) As b, but the screen blanks after a fraction of a second, and remains blank. I know in this situation Alt+SysRq won't do anything.

With a different Xorg configuration (I may have changed AccelMethod, plus there have been some xorg driver upgrades) I've also seen X/gdm crash and restart after a):

(II) Sleep Button: Close
(II) UnloadModule: "evdev"
(II) Power Button: Close
(II) UnloadModule: "evdev"
 ddxSigGiveUp: Closing log

I got a backtrace from a) but never uploaded it, as it wasn't very revealing:

#0 0x00daa422 in __kernel_vsyscall ()
#1 0x0040b93d in ___newselect_nocancel () at ../sysdeps/unix/syscall-template.S:82
#2 0x080a3f77 in WaitForSomething (pClientsReady=0x9b74aa0) at ../../os/WaitFor.c:229
#3 0x080721b0 in Dispatch () at ../../dix/dispatch.c:375
#4 0x08066d7a in main (argc=8, argv=0xbfc3a6d4, envp=0xbfc3a6f8) at ../../dix/main.c:285

Cedders (cedric-gn) wrote :

delayed reopening

Changed in xserver-xorg-video-ati (Ubuntu):
status: Fix Released → New
Bryce Harrington (bryce)
Changed in xserver-xorg-video-ati (Ubuntu):
status: New → Confirmed
Cedders (cedric-gn) wrote :

A little more information in case it gives any clues:

As well as the 3 scenarios above (triggering under identical conditions as far as I can see), there is a fourth after all:

d) As a) but the screen blanks. That is, there is a momentary pause, the screen goes entirely black, but I am able to (say) press Ctrl+Alt+F1, log in and sudo reboot.

So the screen blanking is a random event happening in about 25% of cases. Type a) is probably more common than the others.

I'm attaching a drawing of what might happen to the scrollbar in the application window when the freeze happens. There are black horizontal bars across the scrollbar, with a further few white pixels immediately below, and sometimes the status line is corrupted. As I mentioned, any screenshot I can produce is corrupted.

If I restart gdm (and/or X) from the console after the freeze, I get the fade into the greeter background (the brown ubuntu spotlight) but the login dialogue is not drawn (usually that space is occupied by a faded version of whatever was on the screen before the gdm restart). So it's similar to the situation after the freeze in a) before restarting gdm: the background is drawn correctly, but any other windows cannot be written. I'm pretty sure that if I could change the video mode (to text, say, and back again), gdm would display correctly. Is there any way to do this - resizecons or xvidtune don't seem to help?

If I comment out the Option "DRI" "false" workaround (which seemed to work for lucid) and enable Compiz, visual effects seem to work correctly until the freeze. It's now fairly easy to reproduce a freeze: turn visual effects to normal or high, visit a page like this in Epiphany and scroll down. I can scroll down a page or two before the freeze. Also, when this happens it seems Compiz crashes, because the title bars disappear.

Very little is logged about the problem in Xorg.0.log (I've just added -logverbose to /etc/X11/xinit/xserverrc in case that helps).

Once it's got into this state (a), switching with Ctrl+Alt and back again to the frozen Xorg gives:

(II) AIGLX: Suspending AIGLX clients for VT switch
[dix] couldn't enable device 9
[dix] couldn't enable device 9
[dix] couldn't enable device 9
[dix] couldn't enable device 9
[dix] couldn't enable device 9
(II) Open ACPI successful (/var/run/acpid.socket)
(II) AIGLX: Resuming AIGLX clients after VT switch

but of course I'm still in the state of having X frozen with only the cursor working.

Mörgæs (moergaes) wrote :

Closed due to age.
If the problem appears in 13.04 please open a new report.

Changed in xserver-xorg-video-ati (Ubuntu):
status: Confirmed → Invalid