Solid hangs in VIA glx code

Bug #29586 reported by John Moser
Affects: linux-source-2.6.15 (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

Recognized something! My screensaver is back on! I had set it to a blank screen earlier because the GLX code in the partial VIA driver for my video chipset causes hard kernel freezes. Something turned it back on; apparently upgrading the screensaver package re-enabled it.

I have disabled random screen savers.

==Original description follows==
If I run Rhythmbox for half an hour or so listening to music, the system eventually freezes: the keyboard can no longer toggle its LEDs, the machine doesn't answer ping, the mouse doesn't work, disk activity stops, and the last frame of audio loops, so I hear the same 20-50 ms of sound over and over.

I'm on an emu10k1-based SB Audigy.

Machine is Socket 754 Athlon 64 Newcastle 2800+, not overclocked.

1024 MB of RAM.

Running in 32-bit protected mode on the i686 2.6.15-13 kernel.
Linux icebox 2.6.15-13-686 #1 SMP PREEMPT Thu Jan 19 17:12:14 UTC 2006 i686 GNU/Linux

Revision history for this message
Phil Bull (philbull) wrote :

Thanks for the report.

Can you provide any more information?

 * Do you get the same problem in non-gstreamer-based audio apps, like mplayer?
 * What driver is being loaded for your sound card?

Try following some of the debugging procedures here to see if you can find out anything more:

https://wiki.ubuntu.com/DebuggingProcedures

Note: I'm also changing the severity of this bug down to 'normal'. Please read the following for an explanation why:

https://wiki.ubuntu.com/HelpingWithBugsOld

Thanks again

Changed in desktop-base:
status: Unconfirmed → Needs Info
John Moser (nigelenki)
description: updated
Sebastien Bacher (seb128) wrote :

Seems to be a Linux kernel issue; reassigning.

John Moser (nigelenki) wrote :

I dug some Oopses out of my kernel logs; they all look like this one, from the AMD64 kernel:

Feb 23 07:52:06 localhost kernel: [33790.181014] Oops: 0000 [1] PREEMPT SMP
Feb 23 07:52:06 localhost kernel: [33790.181017] CPU 0
Feb 23 07:52:06 localhost kernel: [33790.181020] Modules linked in: af_packet nls_utf8 usb_storage rfcomm l2cap bluetooth powernow_k8 cpufreq_userspace cpufreq_stats freq_table cpufreq_powersave cpufreq_ondemand cpufreq_conservative via drm video tc1100_wmi sony_acpi pcc_acpi hotkey dev_acpi container button acpi_sbs battery i2c_acpi_ec ac nls_iso8859_1 nls_cp437 vfat fat ext2 dm_mod md_mod sr_mod sbp2 parport_pc lp parport ipv6 psmouse serio_raw snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul tsdev snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_event snd_seq i2c_viapro snd_emu10k1 snd_rawmidi snd_ac97_codec
snd_ac97_bus usblp snd_pcm_oss snd_mixer_oss i2c_core pcspkr rtc ohci1394 ieee1394 snd_pcm snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep snd 8139cp 8139too mii soundcore shpchp usbhid pci_hotplug sg evdev xfs exportfs ehci_hcd uhci_hcd usbcore ide_generic ide_cd cdrom generic via82cxxx sd_mod sata_via
libata scsi_mod thermal processor fan capability commoncap vga16fb cf
Feb 23 07:52:06 localhost kernel: copyarea vgastate cfbimgblt cfbfillrect fbcon
tileblit font bitblit softcursor
Feb 23 07:52:06 localhost kernel: [33790.181069] Pid: 4204, comm: Xorg Not tainted 2.6.15-15-amd64-k8 #1
Feb 23 07:52:06 localhost kernel: [33790.181072] RIP: 0010:[_end+134037098/2132357120] <ffffffff88440e6a>{:via:via_mmFreeMem+10}
Feb 23 07:52:06 localhost kernel: [33790.181081] RSP: 0018:ffff8100348fddf8 EFLAGS: 00010202
Feb 23 07:52:06 localhost kernel: [33790.181085] RAX: 0000000000000001 RBX: ffff8100382e0000 RCX: 0000000000000000
Feb 23 07:52:06 localhost kernel: [33790.181088] RDX: 0000000000000001 RSI: ffff8100348fde18 RDI: 0000000005fdc280
Feb 23 07:52:06 localhost kernel: [33790.181091] RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000001
Feb 23 07:52:06 localhost kernel: [33790.181094] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
Feb 23 07:52:06 localhost kernel: [33790.181099] R13: ffff8100348fde18 R14: ffff810035238800 R15: ffff810031bc0000
Feb 23 07:52:06 localhost kernel: [33790.181103] FS: 00002aaaab7acce0(0000) GS:ffffffff80426800(0000) knlGS:0000000000000000
Feb 23 07:52:06 localhost kernel: [33790.181107] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 23 07:52:06 localhost kernel: [33790.181110] CR2: 0000000005fdc288 CR3: 0000000035625000 CR4: 00000000000006e0
Feb 23 07:52:06 localhost kernel: [33790.181115] Process Xorg (pid: 4204, threadinfo ffff8100348fc000, task ffff81003ab688e0)
Feb 23 07:52:06 localhost kernel: [33790.181117] Stack: ffff8100382e0000 ffffffff8844151a 0000000100000000 ffff81002b346688
Feb 23 07:52:06 localhost kernel: [33790.181126] 0000000005fdc280 ffff81003bcdb4c0 ffff810035238800 0000000000000021
Feb 23 07:52:06 localhost kernel: [33790.181132] ffff81003bcdb4c0 ffff81003523884c
Feb 23 07:52:06 localhost kernel: [33790.181136] Call Trace:<ffffffff8844151a>{:via:via_final_context+202} <ffffffff8842a5a0>{:drm:drm_rmctx+...
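For anyone sifting through similar logs, here is a minimal Python sketch that pulls the `<address>{:module:symbol+offset}` frames out of a log line, which makes it quick to see that the faulting code is in the via module. The function name `parse_frames` and the regular expression are hypothetical helpers written for this log format only, and reading the offset as a decimal number is an assumption.

```python
import re

# Matches frames such as <ffffffff88440e6a>{:via:via_mmFreeMem+10}
# and <ffffffff80165aa5>{__report_bad_irq+53} (no module prefix).
FRAME_RE = re.compile(
    r"<(?P<addr>[0-9a-f]{16})>"
    r"\{(?::(?P<module>\w+):)?(?P<symbol>\w+)\+(?P<offset>\d+)\}"
)


def parse_frames(line):
    """Return a list of (module, symbol, offset) tuples found in one log line.

    module is None for symbols that are built into the kernel image.
    The offset is parsed as decimal, which is an assumption about this format.
    """
    return [
        (m.group("module"), m.group("symbol"), int(m.group("offset")))
        for m in FRAME_RE.finditer(line)
    ]


if __name__ == "__main__":
    rip_line = (
        "RIP: 0010:[_end+134037098/2132357120] "
        "<ffffffff88440e6a>{:via:via_mmFreeMem+10}"
    )
    for module, symbol, offset in parse_frames(rip_line):
        print(module or "(kernel)", symbol, offset)
```

Truncated frames (for example a line ending in `+...`) simply do not match and are skipped rather than guessed at.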


John Moser (nigelenki) wrote :

Increasing severity to critical and milestoning to Dapper. This is a kernel bug; data loss happens when your system crashes a lot.

The current workaround is switching to the VESA driver, but that makes any video playback extremely slow.

Changed in linux:
status: Needs Info → Confirmed
Changed in linux:
assignee: nobody → kernel-bugs
John Moser (nigelenki) wrote :

[ 1378.035582] agpgart: Found an AGP 3.0 compliant device at 0000:00:00.0.
[ 1378.035596] agpgart: Xorg tried to set rate=x12. Setting to AGP3 x8 mode.
[ 1378.035601] agpgart: Putting AGP V3 device at 0000:00:00.0 into 8x mode
[ 1378.035640] agpgart: Putting AGP V3 device at 0000:01:00.0 into 8x mode
[ 1378.252992] irq 201: nobody cared (try booting with the "irqpoll" option)
[ 1378.252998]
[ 1378.252999] Call Trace: <IRQ> <ffffffff80165aa5>{__report_bad_irq+53} <ffffffff80165d1a>{note_interrupt+538}
[ 1378.253030] <ffffffff80165407>{__do_IRQ+215} <ffffffff8011300f>{do_IRQ+47}
[ 1378.253052] <ffffffff80142af0>{ksoftirqd+0} <ffffffff80110480>{ret_from_intr+0}
[ 1378.253061] <EOI> <ffffffff8030fd62>{thread_return+82} <ffffffff8010e720>{default_idle+0}
[ 1378.253078] <ffffffff8010e758>{default_idle+56} <ffffffff8010e846>{cpu_idle+150}
[ 1378.253090] <ffffffff80439895>{start_kernel+485} <ffffffff80439286>{_sinittext+646}
[ 1378.253102]
[ 1378.253107] handlers:
[ 1378.253109] [<ffffffff8846c430>] (via_driver_irq_handler+0x0/0x170 [via])
[ 1378.253120] Disabling IRQ #201

This also happens while switched to the VESA driver. Yikes.

John Moser (nigelenki) wrote :

Matthew Garrett asked me to test the patch linked below, which appears to solve the problem. I can't manually trigger the bug, so I had to test based on whether the system crashes within a reasonable time or not.

To this end: I typically crash in about an hour**, give or take 45 minutes; picking a few (6) approximate variations, I come up with a standard deviation of 27.386 minutes*.

I've been up for 17h35m (1055 minutes) now, so I'm about 36.3 standard deviations above the 60-minute mean. Under a normal distribution, 5 standard deviations already gives better than a 99.9999% probability of the result being non-chance; 36.3 means the sun should burn out before this happens.

In short, I'm pretty certain the below patch fixes the problem:

http://webcvs.freedesktop.org/dri/drm/shared-core/via_mm.c?r1=1.21&r2=1.22&makepatch=1&diff_format=u

In case anyone doesn't get it: sizeof(int) != sizeof(long) on 64-bit Linux, where int is 4 bytes and long is 8.
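To illustrate the int/long point, here is a minimal Python sketch (not the DRM code itself) of what happens when a 64-bit value is squeezed through a 4-byte integer: the upper 32 bits are lost. The helper names are hypothetical, and the sample value is simply the RBX register from the Oops above, used only as an example of a 64-bit kernel-style address.

```python
import ctypes


def truncate_to_32(value):
    """Keep only the low 32 bits, as storing into a 4-byte int would."""
    return value & 0xFFFFFFFF


def survives_32bit_roundtrip(value):
    """True if the value is unchanged after passing through 32 bits."""
    return truncate_to_32(value) == value


if __name__ == "__main__":
    # Sizes on the machine running this script; on 64-bit Linux these
    # are expected to differ (4 versus 8 bytes).
    print("sizeof(int)  =", ctypes.sizeof(ctypes.c_int))
    print("sizeof(long) =", ctypes.sizeof(ctypes.c_long))

    sample = 0xFFFF8100382E0000  # sample 64-bit value, taken from RBX above
    print(hex(sample), "->", hex(truncate_to_32(sample)))
    print("round-trips intact:", survives_32bit_roundtrip(sample))
```

On a 32-bit kernel both types are 4 bytes wide, so the same code path would not lose anything, which would fit the problem only showing up on the 64-bit build.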

*My variations are {25,30,15,45,10,25} from 60, in minutes. It doesn't matter whether they're negative or positive; standard deviation is calculated by (sum(variation^2)/n)^(1/2), so the sign is lost.
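The arithmetic can be checked with a short Python sketch. It assumes the six variations and the 60-minute mean quoted above, treats them as a full population (dividing by n), and assumes crash times are normally distributed, which is a rough simplification. All function names are hypothetical.

```python
import math

VARIATIONS = [25, 30, 15, 45, 10, 25]  # minutes away from the mean
MEAN_MINUTES = 60
UPTIME_MINUTES = 17 * 60 + 35  # 17h35m


def population_std(variations):
    """Square root of the mean squared variation (divides by n)."""
    return math.sqrt(sum(v * v for v in variations) / len(variations))


def sigmas_above_mean(observed, mean, sigma):
    """How many standard deviations the observation lies above the mean."""
    return (observed - mean) / sigma


def normal_upper_tail(z):
    """One-sided probability of exceeding z under a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    sigma = population_std(VARIATIONS)
    print("standard deviation:", round(sigma, 3))  # about 27.386
    print("sigmas above mean:",
          round(sigmas_above_mean(UPTIME_MINUTES, MEAN_MINUTES, sigma), 2))
    print("uptime / sigma:", round(UPTIME_MINUTES / sigma, 2))
    print("chance of exceeding 5 sigma:", normal_upper_tail(5))
```

Dividing the uptime by the standard deviation gives about 38.5, while the distance above the 60-minute mean is about 36.3 standard deviations; either way the conclusion is the same.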

**My mean and variations are highly approximated. I have not made 2 hours, but I've made an hour, and also had failures within 10 minutes. Under these assumptions it is always provable to better than 99.999% that the increased system stability is related by non-chance to the changes made.

Changed in linux-source-2.6.15:
status: Confirmed → Fix Committed
Changed in linux-source-2.6.15:
status: Fix Committed → Fix Released
Curtis Hovey (sinzui)
Changed in linux-source-2.6.15 (Ubuntu):
assignee: Registry Administrators (registry) → nobody