2.6.35 hangs at boot due to regression in i915 or intel_agp

Bug #597075 reported by Kees Cook
This bug affects 2 people
Affects          Status         Importance   Assigned to   Milestone
Linux            Fix Released   High
linux (Ubuntu)   Fix Released   High         Tim Gardner
Maverick         Fix Released   High         Tim Gardner

Bug Description

All of the 2.6.35 kernels from maverick (through -4) have failed to boot for me. The hang is very early, and appears related to whatever the kernel is bringing up on its own. Booting with "break=top" drops me to an initramfs prompt, at which point the system hangs anyway. Since the e1000e driver isn't up yet, I can't even do netconsole to see if something is being hidden during the hang. The 2.6.34 kernels all work fine. I don't know what to do next to debug this besides bisecting, which I think will end up being extremely time-consuming. :(
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.22.1.
AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: kees 4513 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xe0420000 irq 32'
   Mixer name : 'Realtek ALC268'
   Components : 'HDA:10ec0268,80860000,00100003'
   Controls : 17
   Simple ctrls : 11
DistroRelease: Ubuntu 10.10
HibernationDevice: RESUME=/dev/md1
Package: linux (not installed)
ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.34-5-generic root=/dev/mapper/systemvg-root2lv ro quiet splash
ProcEnviron:
 LANGUAGE=en_US:en
 PATH=(custom, user)
 LANG=en_US.utf8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.34-5.14-generic 2.6.34
Regression: Yes
RelatedPackageVersions: linux-firmware 1.36
Reproducible: Yes
RfKill:

Tags: maverick regression-potential needs-upstream-testing
Uname: Linux 2.6.34-5-generic x86_64
UserGroups: adm admin audio cdrom dialout dip floppy fuse libvirtd lpadmin mythtv plugdev sambashare sbuild scanner video
WpaSupplicantLog:

dmi.bios.date: 09/22/2008
dmi.bios.vendor: Intel Corp.
dmi.bios.version: JOQ3510J.86A.0954.2008.0922.2331
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: DQ35JO
dmi.board.vendor: Intel Corporation
dmi.board.version: AAD82085-800
dmi.chassis.type: 3
dmi.modalias: dmi:bvnIntelCorp.:bvrJOQ3510J.86A.0954.2008.0922.2331:bd09/22/2008:svn:pn:pvr:rvnIntelCorporation:rnDQ35JO:rvrAAD82085-800:cvn:ct3:cvr:

tags: added: kernel-candidate maverick
Kees Cook (kees) wrote : apport information

Attached files: AlsaDevices.txt, ArecordDevices.txt, BootDmesg.txt, Card0.Amixer.values.txt, Card0.Codecs.codec.2.txt, CurrentDmesg.txt, IwConfig.txt, Lspci.txt, Lsusb.txt, PciMultimedia.txt, ProcCpuinfo.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, UdevLog.txt, WifiSyslog.txt

tags: added: apport-collected
description: updated

Kees Cook (kees) wrote : Re: 2.6.35 hangs at boot

I've attached all the apport-collect details for the 2.6.34 kernel.

Changed in linux (Ubuntu):
status: New → Triaged
Kees Cook (kees) wrote :

This appears to be related to either i915 or intel_agp. I have to blacklist both or my system will hang during boot.
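A minimal sketch of the blacklisting workaround described above. It writes the fragment to a scratch directory so it can be run safely. On a real system the file would normally live under /etc/modprobe.d/ (root required), and if the modules are loaded from the initramfs, that image may also need regenerating. The file name is hypothetical.

```shell
# Sketch only: build the blacklist fragment in a scratch directory.
# The file name is made up; any *.conf name under /etc/modprobe.d/ would do.
conf_dir="$(mktemp -d)"
conf_file="$conf_dir/blacklist-lp597075.conf"

cat > "$conf_file" <<'EOF'
# Keep both suspect drivers from loading while debugging the boot hang.
blacklist i915
blacklist intel_agp
EOF

# Show what would be installed.
cat "$conf_file"
```

Blacklisting both modules only narrows the problem to this pair of drivers. It does not say which of the two carries the regression.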

summary: - 2.6.35 hangs at boot
+ 2.6.35 hangs at boot due to regression in i915 or intel_agp
Kees Cook (kees) wrote :

Bug 597862 makes this very frustrating to debug.

Leann Ogasawara (leannogasawara) wrote :

~/linux-2.6$ git log --pretty=oneline v2.6.34..v2.6.35-rc1 -- drivers/gpu/drm/i915/ | wc -l
71

It looks like i915 saw 71 commits between 2.6.34 and 2.6.35-rc1, so let's start with i915 and try to bisect...

~/linux-2.6$ git bisect start v2.6.35-rc1 v2.6.34 -- drivers/gpu/drm/i915
Bisecting: 35 revisions left to test after this (roughly 5 steps)
[461ed3caee9b615393eb5beb9a8148d230354b41] drm/i915: Add support of SDVO on Ibexpeak PCH

I've built the resulting bisected kernel and placed it at the following URL. Please test and let me know your results, and I'll then kick off the next build.

http://people.canonical.com/~ogasawara/lp597075/461ed3c

Changed in linux (Ubuntu):
importance: Undecided → High
Kees Cook (kees) wrote :

Argh, sorry, I got this wrong. I've now retested all the published maverick kernels (I wanted to be _really_ sure before going down the bisect route here...)

Anyway:

2.6.34-5.14: ok
2.6.35-1.1: ok
2.6.35-2.2: FAIL
(and all the rest fail too)

So, sorry, the bisection should be between .35-1.1 and .35-2.2.

Leann Ogasawara (leannogasawara) wrote :

Thanks for verifying. 2.6.35-1.1 was based on upstream 2.6.35-rc1 and 2.6.35-2.2 was based on upstream 2.6.35-rc2. It looks like i915 saw 43 commits between 2.6.35-rc1 and 2.6.35-rc2. intel-agp looks like it had 0 changes:

~/linux-2.6$ git log --pretty=oneline v2.6.35-rc1..v2.6.35-rc2 -- drivers/gpu/drm/i915/ | wc -l
43
~/linux-2.6$ git log --pretty=oneline v2.6.35-rc1..v2.6.35-rc2 -- drivers/char/agp/intel-agp.c | wc -l
0

So let's stick with the bisect of the i915 changes.

~/linux-2.6$ git bisect start v2.6.35-rc2 v2.6.35-rc1 -- drivers/gpu/drm/i915/
Bisecting: 21 revisions left to test after this (roughly 5 steps)
[9553426372eef71c849499fb1d232f4b0577c0f9] drm/i915: Add CxSR support on Pineview DDR3

I've built the resulting bisected kernel and placed it at the following URL. I realize the .deb has an odd version name, but please test it anyway and let me know your results. Thanks.

http://people.canonical.com/~ogasawara/lp597075/9553426/

Leann Ogasawara (leannogasawara) wrote :

Kees verified that http://people.canonical.com/~ogasawara/lp597075/9553426/ failed. Marking it as bad and building next kernel.

~/linux-2.6$ git bisect bad
Bisecting: 10 revisions left to test after this (roughly 3 steps)
[734b4157b367d66405f7dab80085d17c9c8dd3b5] drm/i915: Add support for interlaced display.

Next test kernel at:

http://people.canonical.com/~ogasawara/lp597075/734b415/

Leann Ogasawara (leannogasawara) wrote :

Sweet, I have verification that 734b415 boots successfully. Marking it as good and building the next.

ogasawara@tyler:~/linux-2.6$ git bisect good
Bisecting: 5 revisions left to test after this (roughly 3 steps)
[9962c9252e46eda7058067cbe73bdf1ed74b0d37] drm/i915/dp: Only enable enhanced framing if the sink supports it

Next test kernel at:

http://people.canonical.com/~ogasawara/lp597075/9962c92/

Leann Ogasawara (leannogasawara) wrote :

9962c92 fails to boot. Marking it bad and building the next.

~/linux-2.6$ git bisect bad
Bisecting: 2 revisions left to test after this (roughly 1 step)
[7648fa99eb77a2e1a90b7beaa420e07d819b9c11] drm/i915: add power monitoring support

Next test kernel at:

http://people.canonical.com/~ogasawara/lp597075/7648fa9/

Leann Ogasawara (leannogasawara) wrote :

7648fa9 boots successfully. Marking it good and building the next.

~/linux-2.6$ git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[9908ff736adf261e749b4887486a32ffa209304c] drm/i915: Kill dangerous pending-flip debugging

Next test kernel at:

http://people.canonical.com/~ogasawara/lp597075/9908ff7/

Leann Ogasawara (leannogasawara) wrote :

9908ff7 fails to boot. Marking it bad and building the next.

~/linux-2.6$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[9a7e8492d17394a81d5534abf90b5b2ada7ea3c0] drm/i915: Storage class should be before const qualifier

Next test kernel at:

http://people.canonical.com/~ogasawara/lp597075/9a7e849/

Leann Ogasawara (leannogasawara) wrote :

9a7e849 boots successfully. Marking it good, which points to 9908ff7 as the culprit. I've built two final test kernels. The first is a mainline 2.6.35-rc3 kernel with just the offending commit reverted:

http://people.canonical.com/~ogasawara/lp597075/2.6.35-rc3-9908ff7/

The second is the latest Maverick 2.6.35-6.7 kernel (which is based on mainline 2.6.35-rc3) with the offending commit reverted:

http://people.canonical.com/~ogasawara/lp597075/2.6.35-6.7-9908ff7/

Please test both and let me know your results. Thanks.

~/linux-2.6$ git bisect good
9908ff736adf261e749b4887486a32ffa209304c is the first bad commit
commit 9908ff736adf261e749b4887486a32ffa209304c
Author: Chris Wilson <email address hidden>
Date: Sat May 15 09:57:03 2010 +0100

    drm/i915: Kill dangerous pending-flip debugging

    We can, by virtue of a vblank interrupt firing in the middle of setting
    up the unpin work (i.e. after we set the unpin_work field and before we
    write to the ringbuffer) enter intel_finish_page_flip() prior to
    receiving the pending flip notification. Therefore we can expect to hit
    intel_finish_page_flip() under normal circumstances without a pending flip
    and even without installing the pending_flip_obj. This is exacerbated by
    aperture thrashing whilst binding the framebuffer

    References:

      Bug 28079 - "glresize" causes kernel panic in intel_finish_page_flip.
      https://bugs.freedesktop.org/show_bug.cgi?id=28079

    Reported-by: Nick Bowler <email address hidden>
    Signed-off-by: Chris Wilson <email address hidden>
    Cc: Jesse Barnes <email address hidden>
    Cc: <email address hidden>
    Reviewed-by: Jesse Barnes <email address hidden>
    Signed-off-by: Eric Anholt <email address hidden>

:040000 040000 8472a2b84c676e3c714e8c1f8392255b1959ae83 bd59f09180a36d398f7d0e63060851582593c14d M drivers
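The two final test kernels apply the bisect result by reverting the single suspect commit on top of a newer tree. A minimal sketch of that step on a throwaway repository follows. File names and commit messages are hypothetical, and the real case would use the kernel tree and the commit ID reported by the bisect.

```shell
# Toy repository standing in for the kernel tree.
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q .
git config user.name "Revert Demo"
git config user.email "demo@example.invalid"

echo ok > state
git add state
git commit -q -m "base"

# The commit the bisect pointed at.
echo broken > state
git commit -q -am "suspect change"
suspect="$(git rev-parse HEAD)"

# Unrelated later work, so the revert lands on top of a newer tree.
echo extra > other
git add other
git commit -q -m "later work"

# Revert only the suspect commit; later work stays in place.
git revert --no-edit "$suspect" >/dev/null

cat state
```

If the reverted tree still fails the test, as happened later in this report, the bisect result itself is suspect and the search range or path restriction is worth revisiting.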

Kees Cook (kees) wrote :

So, the test kernels with 9908ff7 reverted didn't work either, so I went back and did an unrestricted bisect across the entire rc1 to rc2 span. It resulted in:

kees@tyler:~/lp597075/bisect/linux-2.6$ git bisect start v2.6.35-rc2 v2.6.35-rc1
Bisecting: 373 revisions left to test after this (roughly 9 steps)
[b1413357d924792e2e332dcb6b712a7fb2a5fb25] fbdev: fix frame buffer devices menu
kees@tyler:~/lp597075/bisect/linux-2.6$ git bisect bad
Bisecting: 179 revisions left to test after this (roughly 8 steps)
[aef4b9aaae1decc775778903922bd0075cce7a88] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
kees@tyler:~/lp597075/bisect/linux-2.6$ git bisect bad
Bisecting: 111 revisions left to test after this (roughly 7 steps)
[64ffc9ff424c65adcffe7d590018cc75e2d5d42a] kbuild: Revert part of e8d400a to res
kees@tyler:~/lp597075/bisect/linux-2.6$ git bisect good
Bisecting: 63 revisions left to test after this (roughly 6 steps)
[08a66859e69264f3223560d06b88e80c1a6a6387] FS-Cache: Remove unneeded null checks
kees@tyler:~/lp597075/bisect/linux-2.6$ git bisect good
Bisecting: 31 revisions left to test after this (roughly 5 steps)
[85cd4612fdab4e837d7eea048a697c75d0477d3b] drm/i915: Check error code whilst moving buffer to GTT domain.
kees@tyler:~/lp597075/bisect/linux-2.6$ git bisect bad
Bisecting: 15 revisions left to test after this (roughly 4 steps)
[9962c9252e46eda7058067cbe73bdf1ed74b0d37] drm/i915/dp: Only enable enhanced framing if the sink supports it
kees@tyler:~/lp597075/bisect/linux-2.6$ git bisect bad
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[f41275e893191eeb7a88e431d594e167adbd5234] drm/i915: Convert more trace events to DEFINE_EVENT
kees@tyler:~/lp597075/bisect/linux-2.6$ git bisect good
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[7648fa99eb77a2e1a90b7beaa420e07d819b9c11] drm/i915: add power monitoring support
kees@tyler:~/lp597075/bisect/linux-2.6$ git bisect good
Bisecting: 1 revision left to test after this (roughly 1 step)
[f1befe71fa7a79ab733011b045639d8d809924ad] agp/intel: Restrict GTT mapping to valid range on i915 and i945
kees@tyler:~/lp597075/bisect/linux-2.6$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[9a7e8492d17394a81d5534abf90b5b2ada7ea3c0] drm/i915: Storage class should be before const qualifier
kees@tyler:~/lp597075/bisect/linux-2.6$ git bisect good
f1befe71fa7a79ab733011b045639d8d809924ad is the first bad commit
commit f1befe71fa7a79ab733011b045639d8d809924ad
Author: Chris Wilson <email address hidden>
Date: Tue May 18 12:24:51 2010 +0100

    agp/intel: Restrict GTT mapping to valid range on i915 and i945

    References:

      Bug 15733 - Crash when accessing nonexistent GTT entries in i915
      https://bugzilla.kernel.org/show_bug.cgi?id=15733

    On G33 and above, the size of the GTT space is determined by the GMCH
    control register. Prior to this revision, the size is determined by the
    size of the aperture. So we must careful to map and fill the appropriate
    range depending on chipset.

    Signed-off-by: Chris Wilson <email address hidden>
    Signed-off-by: Eric Anholt <email address hidden>

:040000 040000 a640ccd942ba...


Kees Cook (kees) wrote :

I can confirm that reverting f1befe71fa7a79ab733011b045639d8d809924ad allows me to boot the -6 maverick kernel correctly.

Kees Cook (kees)
Changed in linux (Ubuntu):
milestone: none → maverick-alpha-2
Kees Cook (kees) wrote :
Changed in linux (Ubuntu):
milestone: maverick-alpha-2 → maverick-alpha-3
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Maverick):
assignee: Canonical Kernel Team (canonical-kernel-team) → Tim Gardner (timg-tpi)
status: Triaged → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.35-7.12

---------------
linux (2.6.35-7.12) maverick; urgency=low

  [ Tim Gardner ]

  * [Upstream] i915: Use the correct mask to detect i830 aperture size.
    - LP: #597075

  [ Upstream Kernel Changes ]

  * (drop after 2.6.35) drm/radeon/kms: add ioport register access
    (squashed)
 -- Tim Gardner <email address hidden> Thu, 08 Jul 2010 09:53:13 -0600

Changed in linux (Ubuntu Maverick):
status: In Progress → Fix Released
Changed in linux:
status: Unknown → Fix Released
Changed in linux:
importance: Unknown → High