Panda Boot hang w/ linaro 2.6.39 kernel

Bug #802693 reported by John Stultz
This bug affects 1 person
Affects         Status        Importance  Assigned to  Milestone
Linaro Android  Fix Released  Undecided   Unassigned
Linaro Linux    Fix Released  High        Avik Sil

Bug Description

As reported on linaro-kernel:

Trying to boot pandaboard w/ linaro-2.6.39 kernel hangs in early boot.
Enabling EARLY_PRINTK showed the hang happening after:

[ 0.420776] print_constraints: dummy:
[ 0.425170] NET: Registered protocol family 16
[ 0.430206] GPMC revision 6.0
[ 0.437591] OMAP GPIO hardware version 0.1
[ 0.442474] OMAP GPIO hardware version 0.1
[ 0.446807] omap_device: omap_gpio.2: new worst case activate latency 0: 3057
[ 0.454833] OMAP GPIO hardware version 0.1
[ 0.459686] OMAP GPIO hardware version 0.1
[ 0.464569] OMAP GPIO hardware version 0.1
[ 0.469390] OMAP GPIO hardware version 0.1
[ 0.476440] omap_mux_init: Add partition: #1: core, flags: 2
[ 0.483764] omap_mux_init: Add partition: #2: wkup, flags: 2
[ 0.489807] error setting wl12xx data
[ 0.496032] omap_device: omap_uart.0: new worst case deactivate latency 0: 37
[ 0.504150] omap_device: omap_uart.1: new worst case activate latency 0: 3057
[ 0.517211] hw-breakpoint: found 6 breakpoint and 1 watchpoint registers.
[ 0.524322] hw-breakpoint: 1 breakpoint(s) reserved for watchpoint single-st.
[ 0.531951] hw-breakpoint: maximum watchpoint size is 4 bytes.
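
When comparing captures from several boots, it can help to pull out the last timestamped message automatically rather than eyeballing each log. The following is a minimal sketch, assuming the serial console output was saved as plain text in the `[ seconds.micros] message` form shown above; `last_message` and `sample` are hypothetical names used for illustration and are not part of any kernel tooling.

```python
import re

# Matches kernel log lines such as "[ 0.531951] hw-breakpoint: ..."
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def last_message(log_text):
    """Return (timestamp, message) for the final timestamped line, or None."""
    last = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            last = (float(match.group(1)), match.group(2))
    return last


# Sample input: the tail of the log from this report.
sample = """\
[ 0.504150] omap_device: omap_uart.1: new worst case activate latency 0: 3057
[ 0.517211] hw-breakpoint: found 6 breakpoint and 1 watchpoint registers.
[ 0.531951] hw-breakpoint: maximum watchpoint size is 4 bytes.
"""

if __name__ == "__main__":
    print(last_message(sample))
```

The last printed message only bounds where the hang occurs: the code that actually hangs runs after it and may belong to a different subsystem.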

I bisected the issue down to:
arm: omap4: support pmu - 4fefbe94a77c3d3b5e75d386628c36298c72c57f

However, after reverting that patch against linaro-2.6.39, it then hangs
a bit later after:

[ 2.630554] omapdss HDMI: cannot lock PLL
[ 2.630554] omapdss HDMI: CFG1 0xc00
[ 2.630554] omapdss HDMI: CFG2 0x2004
[ 2.630584] omapdss HDMI: CFG4 0x23955
[ 2.630584] omapdss HDMI error: failed to power on device
[ 2.630645] omapdss error: failed to power on
[ 2.630645] omapfb omapfb: Failed to enable display 'hdmi'
[ 2.630645] omapfb omapfb: failed to initialize default display
[ 2.632202] omapfb omapfb: failed to setup omapfb
[ 2.632232] omapfb: probe of omapfb failed with error -5
[ 2.632720] regulator_init_complete: CLK32KG: incomplete constraints, leavinn
[ 2.632873] regulator_init_complete: VAUX3_6030: incomplete constraints, lean
[ 2.633056] regulator_init_complete: VAUX2_6030: incomplete constraints, lean
[ 2.633209] regulator_init_complete: VDAC: incomplete constraints, leaving on
[ 2.633392] regulator_init_complete: VCXIO: incomplete constraints, leaving n
[ 2.633544] regulator_init_complete: VANA: incomplete constraints, leaving on
[ 2.633697] regulator_init_complete: VPP: incomplete constraints, leaving on
[ 2.633880] regulator_init_complete: VUSB: incomplete constraints, leaving on

This second hang, however, seems to be config related (likely something changed during the bisection), as it disappeared after I went back to my original config.

John Stultz (jstultz) wrote :

Config for sha 54dab05c329c5fb6121d89fddf9b82e0fed35efb from linaro-2.6.39 tree that can be used to reproduce the hang.

John Stultz (jstultz) wrote :

Interestingly, the second, config-specific hang (after "regulator_init_complete: VUSB: incomplete constraints, leaving on") seems to be linked to early printk.

Nicolas Pitre (npitre) wrote :

Avik, since the PMU patch appears to make a difference, could you have
a look and confirm if you get the same problem?

Changed in linux-linaro:
assignee: nobody → Avik Sil (aviksil)
Avik Sil (aviksil) wrote :

With the PMU patch I could boot to a shell prompt with the config file attached in comment #1 (see attached log), though sometimes I do get a hang at "regulator_init_complete: VUSB: incomplete constraints, leaving on".

John Stultz (jstultz) wrote :

Yeah, the second hang seems to be more sporadic.

Hrm. Maybe it's a gcc versioning issue? My gcc is 4.4.0. Maybe that's too old? Or maybe there's an actual board difference?

John Stultz (jstultz) wrote :

Updating my cross-compiler to 4.6.0 to check whether it's a gcc issue or not.

John Stultz (jstultz) wrote :

Reproduced the same thing w/ gcc 4.6.0. Maybe something specific to the board then?

Deepak Saxena (dsaxena-linaro) wrote :

John, do you think this is likely related to #803142?

John Stultz (jstultz) wrote :

I don't think it's directly related (as that bug deals with a 2.6.38-based tree), but bug #803142 does point out that uboot differences need to be taken into account as well.

Avik Sil (aviksil) wrote :

FYI, I've been using "U-Boot 2011.03 (Apr 20 2011 - 07:37:43)" and my Pandaboard is Rev A1.

Zach Pfeffer (pfefferz) wrote :

I see this with this build:

https://android-build.linaro.org/builds/~linaro-android/panda-11.06-release/#build=3

after I have replaced the uImage with jstultz's tip:

git://git.linaro.org/people/jstultz/android

commit 666bc7df26c1510701b3edb6d7e07b484b0081fe
Merge: 9072700 54dab05
Author: John Stultz <email address hidden>
Date: Fri Jun 24 15:59:23 2011 -0700

    Merge branch 'upstream/linaro.39' into linaro-android.39

    Conflicts:
        drivers/mmc/card/block.c
        drivers/mmc/core/core.c
        drivers/mmc/core/mmc_ops.c
        drivers/mmc/core/quirks.c
        include/linux/mmc/card.h
        include/linux/mmc/core.h
        include/linux/mmc/host.h

John Stultz (jstultz) wrote :

Zach seems to be seeing the same issue with "U-Boot 2011.06 (Jun 29 2011 - 07:59:26)":
http://pastebin.ubuntu.com/635883/

Zach Pfeffer (pfefferz) wrote :

Info from #pandaboard

<nhg> pfefferz: sounds like the function in the display driver that reads the resolution info from your monitor is failing... see some similar queries: http://www.spinics.net/lists/linux-omap/msg48636.html
<pfefferz> hey cool
<pfefferz> thanks

John Stultz (jstultz)
Changed in linux-linaro:
importance: Undecided → High
John Stultz (jstultz) wrote :

Avik: I realize you don't see the issue, and that makes it really hard to debug. Could you maybe send out a debug patch to me that would help narrow down where the issue is?

Otherwise, I might push pretty hard to get the patch reverted until the hang can be figured out. And I'd be happy to test patches from you so we could later get it back in.

It's just that these sorts of breakage are way too common in the ARM tree these days (see the linux-3.0 ehci boot hang on panda), and it's holding back our ability to make solid releases. So I think we really have to be diligent about quickly reverting code that breaks users if a fix isn't immediately found.

Zach Pfeffer (pfefferz) wrote :

Yeah... we're tracking tip, and if it's broken then people won't be able to develop. Going forward we won't be able to take commits that cause tip not to boot.

Avik Sil (aviksil) wrote :

John, I don't have any debug patch as such. The commit 4fefbe94a77c3d3b5e75d386628c36298c72c57f was taken from http://lists.infradead.org/pipermail/linux-arm-kernel/2011-March/045283.html, written by Ming Lei.

John Stultz (jstultz) wrote :

If we don't have any steps to resolve this, I'm going to go ahead and push to revert.

warmcat (andy-warmcat) wrote :

The tilt-tracking-android tree has some kind of boot race in it at the moment, with similar symptoms on ~50% of boots. It doesn't have this patch in.

It also hangs after reporting UART latency stuff.

I can also change the probability of seeing it with loglevel=8 and earlyprintk, so it seems like some kind of race.

3.0 doesn't have any of the CTI stuff from 4fefbe94a77c3d3b5e75d386628c36298c72c57f in it, so if these issues have a common cause, I am not sure that patch is it.

John Stultz (jstultz) wrote :

Nico has applied the revert. Upstream (3.0) doesn't seem to be affected, so marking this fixed.

Changed in linux-linaro:
status: New → Fix Committed
John Stultz (jstultz)
Changed in linaro-android:
status: New → Fix Committed
John Stultz (jstultz)
Changed in linux-linaro:
status: Fix Committed → Fix Released
Changed in linaro-android:
status: Fix Committed → Fix Released