kernel 2.6.31-2.16 is crashing on boot

Bug #396780 reported by Christophe Dumez
Affects          Status          Importance   Assigned to    Milestone
Linux            Fix Released    Medium
linux (Ubuntu)   Fix Released    High         Stefan Bader

Bug Description

Binary package hint: linux-image-2.6.31-2-generic

Kernel 2.6.31 is crashing on boot on my computer. If I boot in normal mode, I simply see a lot of Input/Output errors and boot stops before launching X. If I run in safe mode, I can get a backtrace which I will post here.

I'm using a DELL XPS M1330 laptop with up-to-date Karmic. At the moment, I have to run the Jaunty kernel, which works fine.

Kernel 2.6.30 boots fine on my laptop. The problems started with v2.6.31-rc1, but I was also affected by bug 392709 at that time.

The backtrace is:

rcu_do_batch+0x27/0x90
__rcu_process_callbacks+0xc8/0x100
tick_handle_oneshot_broadcast+0xdd/0x100
rcu_process_callbacks+0x20/0x40
timer_interrupt+0x21/0x70
handle_IRQ_event+0x56/0x120
do_softirq+0x3c/0x40
irq_exit+0x5c/0x70
do_IRQ+0x4f/0xc0
common_interrupt+0x29/0x30
sys_getresuid+0x3b/0x70
acpi_idle_enter_bm+0x19a/0x1c9
cpuidle_idle_call+0x6f/0xc0
cpu_idle+0x42/0x80
start_secondary+0xae/0cd0

Christophe Dumez (hydr0g3n) wrote :

And here is the output in normal boot mode. It stops on this screen.

If I type "sudo startx" it will display "input/output error" too (and it won't launch).

Leann Ogasawara (leannogasawara) wrote :

Hi Christophe,

Thanks for all the debug info. If you get a chance, care to also quickly test with the latest mainline 2.6.31-rc2 kernel - https://wiki.ubuntu.com/KernelTeam/MainlineBuilds . Just want to make sure this is not an Ubuntu specific regression that's been introduced. Thanks!

tags: added: regression-potential
Changed in linux (Ubuntu):
importance: Undecided → High
status: New → Triaged
Christophe Dumez (hydr0g3n) wrote :

I could but I don't see any i386 image in this folder:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.31-rc2/

According to the doc (https://wiki.ubuntu.com/KernelTeam/MainlineBuilds), there should be one but I can't find it. There is only an image for the amd64 architecture.

Christophe Dumez (hydr0g3n) wrote :

looks like rc2 did not build correctly:
ERROR: "__udivdi3" [drivers/gpu/drm/i915/i915.ko] undefined!
make[3]: *** [__modpost] Error 1
make[2]: *** [modules] Error 2
make[1]: *** [sub-make] Error 2
make[1]: Leaving directory `/home/kernel-ppa/mainline/build'
make: *** [/home/kernel-ppa/mainline/build/debian/stamps/stamp-build-generic] Error 2

Christophe Dumez (hydr0g3n) wrote :

Well, I'm compiling the 2.6.31-rc2 vanilla kernel myself. I'm using the same .config as the Ubuntu package, though, to rule out misconfiguration. I'll report back when I'm done.

Christophe Dumez (hydr0g3n) wrote :

I can confirm that I experience exactly the same problem with 2.6.31-rc2 vanilla kernel. I may try a git snapshot later just to make sure it has not already been fixed upstream.

Steve Beattie (sbeattie)
Changed in linux (Ubuntu):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Christophe Dumez (hydr0g3n) wrote :

I can also confirm that the problem is still present in vanilla kernel 2.6.31-rc2-git4.

Andy Whitcroft (apw) wrote :

Yes, that is a bug in the kernel that stops it compiling with older compilers. I have had this fixed in mainline, and the daily build below has an i386 kernel. The v2.6.31-rc3 mainline build will also have the fix when it builds today:

    http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2009-07-13/

Andy Whitcroft (apw)
Changed in linux (Ubuntu):
status: Triaged → Incomplete
Christophe Dumez (hydr0g3n) wrote :

I grabbed rc3 from here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.31-rc3/

I'm sorry to say that there is still no change. I still get a lot of input/output errors.

Christophe Dumez (hydr0g3n) wrote :

I replaced "splash quiet" with "debug" in the boot line to get the crash backtrace. You're probably missing the beginning of the backtrace, but when this happens I have no control over the laptop (LEDs are flashing and the keyboard is unresponsive).

Christophe Dumez (hydr0g3n) wrote :

I set the screen resolution as high as I could and I took the picture with a better camera. I got more of the backtrace and I hope it will be more helpful.

I still cannot get the whole backtrace but if anyone has a way to do so, please tell me.

Christophe Dumez (hydr0g3n) wrote :

The backtrace does not seem to be the same with mainline kernel (I tried today's build) so I decided to post it too.

description: updated
Changed in linux:
status: Unknown → Confirmed
description: updated
Stefan Bader (smb)
Changed in linux (Ubuntu):
assignee: Canonical Kernel Team (canonical-kernel-team) → Stefan Bader (stefan-bader-canonical)
Changed in linux:
status: Confirmed → Incomplete
Stefan Bader (smb) wrote :

This is just a wild guess, but would booting with "idle=halt" or "idle=poll" succeed?

Christophe Dumez (hydr0g3n) wrote :

In the other bug report, they also asked for idle=poll. It did not fix the problem. I will try idle=halt later.

Christophe Dumez (hydr0g3n) wrote :

"idle=halt" did not change anything either. The call trace is not exactly the same but it looks similar.

Stefan Bader (smb) wrote :

Ok, thanks for trying. There was one change between 2.6.30 and 2.6.31 which touched the area that is present in the backtrace. I have test kernels at http://people.canonical.com/~smb/bug387161/ which revert that change. If that still crashes, the best approach will be to provide precompiled bisection kernels between 2.6.30 and 2.6.31-rc1.
When you said 2.6.30 worked, was that an Ubuntu kernel or a Mainline kernel?

Christophe Dumez (hydr0g3n) wrote :

Thanks for the test kernel, I'll try it.

I did not try 2.6.30 from Ubuntu (I switched to Karmic when 2.6.31rc1 was already in the repositories). I used 2.6.30 mainline.

I'm currently using git bisect to identify the bad commit. I hope I can get it in a few hours.

Christophe Dumez (hydr0g3n) wrote :

Your test kernel is still crashing with the same backtrace but I believe I get some new output that can be interesting:
kernel BUG at /home/stefan/builds/karmic-i386/ubuntu-2.6/kernel/cred.c:60!

Christophe Dumez (hydr0g3n) wrote :

Thanks to git-bisect, I identified the following commit as the problem:
commit 936940a9c7e3d99b25859bf1ff140d8c2480183a
Merge: 09ce42d 1cbd20d
Author: Linus Torvalds <email address hidden>
Date: Wed Jun 24 10:03:12 2009 -0700

    Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits)
      switch xfs to generic acl caching helpers
      helpers for acl caching + switch to those
      switch shmem to inode->i_acl
      switch reiserfs to inode->i_acl
      switch reiserfs to usual conventions for caching ACLs
      reiserfs: minimal fix for ACL caching
      switch nilfs2 to inode->i_acl
      switch btrfs to inode->i_acl
      switch jffs2 to inode->i_acl
      switch jfs to inode->i_acl
      switch ext4 to inode->i_acl
      switch ext3 to inode->i_acl
      switch ext2 to inode->i_acl
      add caching of ACLs in struct inode
      fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls
      cleanup __writeback_single_inode
      ... and the same for vfsmount id/mount group id
      Make allocation of anon devices cheaper
      update Documentation/filesystems/Locking
      devpts: remove module-related code
      ...

Note that I'm using jfs so it could be related.

Christophe Dumez (hydr0g3n) wrote :

I'm attaching the mount output since the problem seems related to the filesystem.

Christophe Dumez (hydr0g3n) wrote :

I got a bit more precise now, still using git-bisect.

The problem occurred after :
[6582a0e6f6bc7bf64817b9e1a424782855292ab0] switch ext3 to inode->i_acl

and of course before:
[936940a9c7e3d99b25859bf1ff140d8c2480183a] Merge branch 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

Christophe Dumez (hydr0g3n) wrote :

still using git-bisect.

The problem occurred after :
[290c263bf83cd78e53b1aa3b42165f588163f2be] switch jffs2 to inode->i_acl

and of course before:
[936940a9c7e3d99b25859bf1ff140d8c2480183a] Merge branch 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

Christophe Dumez (hydr0g3n) wrote :

still using git-bisect.

The problem occurred after :
[281eede0328c84a8f20e0e85b807d5b51c3de4f2] switch reiserfs to inode->i_acl

and of course before:
[936940a9c7e3d99b25859bf1ff140d8c2480183a] Merge branch 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

Stefan Bader (smb) wrote : Re: [Bug 396780] Re: kernel 2.6.31-2.16 is crashing on boot

Christophe Dumez wrote:
> still using git-bisect.

Hi Christophe,

perfect. Just the thing I would have proposed to do. So it is narrowed down
quite a bit. And with a bit more trials, this should lead to the offender.

> The problem occurred after :
> [290c263bf83cd78e53b1aa3b42165f588163f2be] switch jffs2 to inode->i_acl

Just to make sure not to misunderstand: this one is still good?

> and of course before:
> [936940a9c7e3d99b25859bf1ff140d8c2480183a] Merge branch 'for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
>

Assuming your thought about jfs is true (and this would make a lot of sense as it
must be something not in the usual install, otherwise we should see much more
problems), there is one commit between your current place and the merge that
touches the jfs code. And that would be "helpers for acl caching + switch to
those"...

Christophe Dumez (hydr0g3n) wrote :

[290c263bf83cd78e53b1aa3b42165f588163f2be] switch jffs2 to inode->i_acl
and all commits before that are GOOD.

I'm compiling
[073aaa1b142461d91f83da66db1184d7c1b1edea] helpers for acl caching + switch to those

as we speak.

Stefan Bader (smb) wrote :

Christophe Dumez wrote:
> [290c263bf83cd78e53b1aa3b42165f588163f2be] switch jffs2 to inode->i_acl
> and all commits before that are GOOD.
>
> I'm compiling
> [073aaa1b142461d91f83da66db1184d7c1b1edea] helpers for acl caching + switch to those
>
> as we speak.
>

If that is bad, can you try this change? It looks like the code is doing the wrong thing here.

diff --git a/fs/jfs/acl.c b/fs/jfs/acl.c
index f272bf0..3c88d1b 100644
--- a/fs/jfs/acl.c
+++ b/fs/jfs/acl.c
@@ -67,10 +67,8 @@ static struct posix_acl *jfs_get_acl(struct inode *inode, int
                 acl = posix_acl_from_xattr(value, size);
         }
         kfree(value);
-        if (!IS_ERR(acl)) {
+        if (!IS_ERR(acl))
                 set_cached_acl(inode, type, acl);
-                posix_acl_release(acl);
-        }
         return acl;
  }
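
To illustrate why the extra posix_acl_release() would matter, here is a minimal user-space sketch of the reference counting involved. It is not kernel code. It assumes that set_cached_acl() takes its own reference on the ACL it stores in the inode, and that the caller of jfs_get_acl() drops one reference when it is done with the result. All model_* names, get_acl_buggy(), get_acl_fixed() and refs_left_for_cache() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct posix_acl: only the refcount is modelled. */
struct model_acl {
    int refcount;               /* number of owners holding the object */
};

/* Hypothetical stand-in for the inode's ACL cache slot. */
struct model_inode {
    struct model_acl *cached;
};

/* Models posix_acl_dup(): take one more reference. */
static struct model_acl *model_dup(struct model_acl *acl)
{
    if (acl)
        acl->refcount++;
    return acl;
}

/* Models posix_acl_release(): drop one reference. The kernel would free the
 * object at zero; this model only counts so the result can be inspected. */
static void model_release(struct model_acl *acl)
{
    if (acl)
        acl->refcount--;
}

/* Models set_cached_acl(). ASSUMPTION: the cache takes its own reference
 * and drops the reference on whatever was cached before. */
static void model_set_cached(struct model_inode *inode, struct model_acl *acl)
{
    struct model_acl *old = inode->cached;

    inode->cached = model_dup(acl);
    model_release(old);
}

/* Flow before the patch: cache the ACL, release it, then still return it. */
static struct model_acl *get_acl_buggy(struct model_inode *inode,
                                       struct model_acl *fresh)
{
    model_set_cached(inode, fresh);
    model_release(fresh);       /* gives away the reference the caller expects */
    return fresh;
}

/* Flow after the patch: cache the ACL and hand the reference to the caller. */
static struct model_acl *get_acl_fixed(struct model_inode *inode,
                                       struct model_acl *fresh)
{
    model_set_cached(inode, fresh);
    return fresh;
}

/* Returns the refcount left once the caller has dropped its reference.
 * Anything below 1 means the cache points at an object nobody owns. */
static int refs_left_for_cache(int use_fixed)
{
    struct model_acl acl = { 1 };           /* fresh ACL: one reference */
    struct model_inode inode = { NULL };
    struct model_acl *got;

    got = use_fixed ? get_acl_fixed(&inode, &acl)
                    : get_acl_buggy(&inode, &acl);
    model_release(got);                     /* the caller is done with it */
    return acl.refcount;
}
```

In this model, refs_left_for_cache(0) yields 0 for the pre-patch flow: the inode still caches a pointer to an object with no remaining references, which in the kernel would be a use-after-free that can surface in unrelated code paths. With the patched flow, refs_left_for_cache(1) yields 1, the reference held by the cache.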

Christophe Dumez (hydr0g3n) wrote :

Thanks, I sure will, but it takes a long time to compile here.

Stefan Bader (smb) wrote :

If this finally turns out to be the solution, I prepared the following patch, which I would submit to upstream (If you agree to mention your name and email in there). Many thanks for the great work! Have you ever thought of joining us at #ubuntu-kernel on Freenode IRC? We are always looking for skilled people to build a better kernel community.

Christophe Dumez (hydr0g3n) wrote :

I have great news :)

Your patch does work. I also tested it with rc4 and it boots just fine with your patch. Of course, I also tested rc4 with the same config but without your patch to make sure the bug had not been fixed in the meantime.

Thanks a lot for your work.

Of course, I agree to mention my name upstream.
Surname: Dumez
First name: Christophe
e-mail: <email address hidden>

Christophe Dumez (hydr0g3n) wrote :

Ok. I'll stay connected to #ubuntu-kernel channel then. My nickname is chris-qBT. Thanks for the info.

Stefan Bader (smb)
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.31-4.20

---------------
linux (2.6.31-4.20) karmic; urgency=low

  [ Andy Whitcroft ]

  * SAUCE: iscsitarget -- update to SVN revision r214
  * SAUCE: iscsitarget -- renable driver
  * [Config] consolidate lpia/lpia and i386/generic configs
  * [Config] enable CRYPTO modules for all architectures
  * [Config] enable cryptoloop
  * [Config] enable various filesystems for armel
  * [Config] sync i386 generic and generic-pae
  * [Config] add the 386 (486 processors and above) flavour
  * [Config] re-set DEFAULT_MMAP_MIN_ADDR
    - LP: #399914
  * add genconfigs/genportsconfigs to extract the built configs
  * updateconfigs -- alter concatenation order allow easier updates
  * intelfb -- INTELFB now conflicts with DRM_I915
  * printchanges -- rebase tree does not have stable tags use changelog
  * AppArmor: fix argument size missmatch on 64 bit builds

  [ Ike Panhc ]

  * Ship bnx2x firmware in nic-modules udeb
    - LP: #360966

  [ Jeff Mahoney ]

  * AppArmor: fix build failure on ia64

  [ John Johansen ]

  * AppArmour: ensure apparmor enabled parmater is off if AppArmor fails to
    initialize.
  * AppArmour: fix auditing of domain transitions to include target profile
    information
  * AppArmor: fix C99 violation
  * AppArmor: revert reporting of create to write permission.
  * SAUCE: Add config option to set a default LSM
  * [Config] enable AppArmor by default
  * AppArmor: Fix NULL pointer dereference oops in profile attachment.

  [ Keith Packard ]

  * SAUCE: drm/i915: Allow frame buffers up to 4096x4096 on 915/945 class
    hardware
    - LP: #351756

  [ Luke Yelavich ]

  * [Config] add .o files found in arch/powerpc/lib to all powerpc kernel
    header packages
    - LP: #355344

  [ Michael Casadevall ]

  * [Config] update SPARC config files to allow success build

  [ Scott James Remnant ]

  * SAUCE: trace: add trace_event for the open() syscall

  [ Stefan Bader ]

  * SAUCE: jfs: Fix early release of acl in jfs_get_acl
    - LP: #396780

  [ Tim Gardner ]

  * [Upstream] Fix Soltech TA12 volume hotkeys not sending key release
    - LP: #397499
  * [Upstream] USB Option driver - Add USB ID for Novatel MC727/U727/USB727
    refresh
    - LP: #365291
  * [Config] SSB/B44 are common across all arches/flavours.

  [ Upstream ]

  * Rebased to 2.6.31-rc4

 -- Andy Whitcroft <email address hidden> Thu, 23 Jul 2009 08:41:39 +0100

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Changed in linux:
importance: Unknown → Medium
Changed in linux:
status: Incomplete → Fix Released