Random crashes with 8.04 beta (512+128 MB RAM with interleave disabled)

Bug #213747 reported by André Pirard
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Random crashes occur some time after Ubuntu 8.04 beta has booted.
As if a badly processed interrupt crashed anything that's currently running.
Rather frequent at first (like 0 to 4 in 10 minutes), they get rare afterwards.
Could someone analyze the crash and tell me how to investigate?
How do I have the bug reporter add more data to the same bug?
Hoping to add a brick to this wonderfully great building.
Thanks.

ProblemType: Crash
Architecture: i386
Date: Tue Apr 8 05:37:44 2008
Disassembly: 0xb7d041c8:
DistroRelease: Ubuntu 8.04
ExecutablePath: /bin/chown
Package: coreutils 6.10-3ubuntu2
PackageArchitecture: i386
ProcCmdline: chown --reference=/etc/resolv.conf /etc/resolv.conf.dhclient-new
ProcEnviron: PATH=/usr/ucb:/usr/bin:/usr/sbin:/bin:/sbin
Signal: 11
SourcePackage: coreutils
Stacktrace:
 #0 0xb7d041c8 in ?? ()
 #1 0xb7e7d912 in getpwuid () from /lib/libc.so.6
 #2 0x0804aa6d in ?? ()
 #3 0x00000000 in ?? ()
StacktraceTop:
 ?? ()
 getpwuid () from /lib/libc.so.6
 ?? ()
 ?? ()
Title: chown crashed with SIGSEGV in getpwuid()
Uname: Linux 2.6.24-15-generic i686
UserGroups:
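
The report above is a list of "Key: value" fields, with multi-line values (such as Stacktrace) continued on lines indented by one space. As a rough sketch of how such a report could be read programmatically, the following parses a shortened copy of it. The function name `parse_report` and the `SAMPLE` excerpt are illustrative assumptions, not part of Apport itself, and the continuation rule is inferred from the layout shown here.

```python
import signal


def parse_report(text):
    """Parse 'Key: value' lines; lines starting with a space continue the previous key."""
    fields = {}
    key = None
    for line in text.splitlines():
        if line.startswith(" ") and key is not None:
            # Continuation line: append it to the value of the current key.
            fields[key] = (fields[key] + "\n" + line.strip()).strip()
        elif ":" in line:
            # New field: split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


# Shortened excerpt of the report in this bug (hypothetical variable name).
SAMPLE = """\
ProblemType: Crash
ExecutablePath: /bin/chown
Signal: 11
Stacktrace:
 #0 0xb7d041c8 in ?? ()
 #1 0xb7e7d912 in getpwuid () from /lib/libc.so.6
Title: chown crashed with SIGSEGV in getpwuid()
"""

report = parse_report(SAMPLE)
sig = signal.Signals(int(report["Signal"]))
print(report["ExecutablePath"], "died from", sig.name)
```

Signal 11 is SIGSEGV, which matches the Title field: the process touched memory it should not have, here inside a libc call.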

Revision history for this message
André Pirard (a.pirard) wrote :
Apport retracing service (apport) wrote : Symbolic stack trace

StacktraceTop:
 ?? ()
 getpwuid () from /lib/libc.so.6
 uid_to_name (uid=0) at chown-core.c:100
 main (argc=) at chown.c:302

Apport retracing service (apport) wrote : Symbolic threaded stack trace
Daniel Hahler (blueyed) wrote : Re: Random crashes with 8.04 beta

That appears to be a very strange bug: "chown" segfaulting.

Can you please submit another crash report, so we can see if it's related at all?

Note: with kernel 2.6.24-16-generic boot fails completely: bug 217639
As I said there, please test your hardware using memtest86.

André Pirard (a.pirard) wrote : Re: [Bug 213747] More reports

(sorry for delay : mountable devices playing tricks on me again)
Two more crash samples from the harvest below (I said "random"!!!)
I may upload the rest to bugs.launchpad or send them privately.

I had considered a RAM problem too.
But the RAM was tested very thoroughly last year with a Live CD.
I will test it again overnight (35 min done OK already).

Crashes show up soon after login, then calm down, but may reappear later, several in a row.
I mean the visible ones, at least.
The system may also be damaged in various ways and behave unpredictably.
(Lately, WiFi would no longer start until I rebooted.)
I took camera shots of the AMI BIOS setup.
Would you like them, and if so, where should I send them?

Please say what you've seen in reports and your feelings too.
I'll know what to be watching then.

Thanks Daniel, and Wenzhuo.
Guys like you make chasing bugs almost a pleasure ;-)

André.

_sbin_dhclient-script.0.crash
_usr_bin_gnome-panel.1000.crash
_usr_bin_nautilus.1000.crash
_usr_bin_nm-applet.1000.crash
_usr_bin_update-notifier.1000.crash
_usr_bin_Xorg.0.crash
_usr_lib_firefox-3.0b5_firefox.1000.crash
_usr_lib_gnome-applets_modem_applet.1000.crash
_usr_lib_gvfs_gvfsd-trash.1000.crash
_usr_lib_hal_hald-runner.0.crash
_usr_sbin_avahi-autoipd.0.crash
_usr_sbin_console-kit-daemon.0.crash
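
To see how widely the crashes are spread, the file names above can be grouped by owner. This is a minimal sketch that assumes the usual Apport naming, in which the number before ".crash" is the UID of the crashing process (0 for root, 1000 for the first desktop user). The helper name `owner_uid` is hypothetical.

```python
from collections import Counter

# Crash files as listed in the comment above.
CRASH_FILES = [
    "_sbin_dhclient-script.0.crash",
    "_usr_bin_gnome-panel.1000.crash",
    "_usr_bin_nautilus.1000.crash",
    "_usr_bin_nm-applet.1000.crash",
    "_usr_bin_update-notifier.1000.crash",
    "_usr_bin_Xorg.0.crash",
    "_usr_lib_firefox-3.0b5_firefox.1000.crash",
    "_usr_lib_gnome-applets_modem_applet.1000.crash",
    "_usr_lib_gvfs_gvfsd-trash.1000.crash",
    "_usr_lib_hal_hald-runner.0.crash",
    "_usr_sbin_avahi-autoipd.0.crash",
    "_usr_sbin_console-kit-daemon.0.crash",
]


def owner_uid(filename):
    """Return the UID encoded in '<mangled path>.<uid>.crash' (assumed layout)."""
    # Split from the right so dots inside the path part (e.g. 'firefox-3.0b5') are kept.
    _stem, uid, suffix = filename.rsplit(".", 2)
    if suffix != "crash":
        raise ValueError(f"not a crash file: {filename}")
    return int(uid)


by_uid = Counter(owner_uid(name) for name in CRASH_FILES)
for uid, count in sorted(by_uid.items()):
    print(f"uid {uid}: {count} crash file(s)")
```

Both root-owned system daemons and ordinary desktop programs appear in the list, which fits the "random" description better than a fault in any single package.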

André Pirard (a.pirard) wrote : Memory tests results

Memory test results attached.
As I expected, the test ran for more than 9 hours without a fault.
Note that when a technician once added 256 MB to the existing 128 MB, he didn't notice there was a problem.
Being suspicious, I ran a memory test and indeed it failed, but only after some time.
Memory tests must always be run for a full cycle.
What I suspected was right: I had to disable RAM interleaving in the BIOS setup.
I did and that's written on a note I left inside the case.
This all should be transparent to the software, though.

André Pirard (a.pirard) wrote : Bingo

The reaction of the one in the know was to try pulling the 128 MB module out.
And this is what happened when running on the 512 MB alone.
- 2.6.24-16-generic booted !!!
- it could have been just luck, but there was no crash in about 10 min.
- while I was testing a reboot, my dinner was disturbed by resets that made the boot loop
- in fact, the reason was that fsck was silently rebooting the machine
- the naive user I could have been would have needed to boot in recovery mode, to guess that fsck /dev/sda6 was what to type in the maintenance shell that fsck started by itself, and to reply yes to a series of questions that even the less naive me didn't completely understand.
- after that, *** boot and run are now quite normal *** , yeah, wow, 'rrrray.

So, the bottom lines are :

- what on earth makes Linux fail with a properly configured RAM (PC tech + my correction) in which both Linux memory test and Windows XP are happy?
- a memory mapping issue I should say, but what?
- Does one want to investigate? Practically speaking, I'd prefer to move on to other urgent problems like making a linmodem work, and I would buy a second 512 MB to increase memory and speed (interleaving speeds up). But I will help those who help.

- Am I a fool to strive to set up a 100 EUR Ubuntu laptop to offer to a little Belorussian girl so she can communicate with us by e-mail, play and learn? Will she be able to do what I did above to get rid of the fsck problem? I can manage some Russian by e-mail, but it's impossible to describe or solve such problems on the phone, let alone by SMS. Won't she dump Linux and use Windows instead?

Thanks so far for the dialog that led me to what had to be thought.

Daniel Hahler (blueyed) wrote :

Thank you very much for tracking this down.

If you still feel adventurous, you could track it down to a code change in the kernel by using git-bisect. I've not used it myself, but generally you would pull the Ubuntu kernel sources and then use "git bisect" to find the particular commit that caused this.

See e.g. http://kerneltrap.org/node/11753 - it's just one of the results that turned up when googling for "git-bisect" though.

You can find the Git sources of the Ubuntu kernel at: http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=summary and you can then build the kernel using e.g.
AUTOBUILD=1 NOEXTRAS=1 fakeroot debian/rules custom-binary-generic

See also https://wiki.ubuntu.com/KernelTeam/GitKernelBuild.

It would be great, if you could even track it further down, but for now it's great that you've found a workaround.
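
The idea behind bisecting can be sketched in a few lines: given an ordered history with a known-good start and a known-bad end, test the midpoint and halve the range each time. This is only an illustration of the binary search, not git's implementation. The commit names and the `is_bad` check are hypothetical stand-ins for "build this kernel, boot it, and see whether it crashes".

```python
def first_bad(commits, is_bad):
    """Return (first bad commit, number of tests run).

    Assumes commits[0] is good, commits[-1] is bad, and that every commit
    after the first bad one is bad too.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        steps += 1
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi], steps


# Hypothetical history of 100 commits where the regression landed in "c37".
commits = [f"c{i}" for i in range(100)]


def is_bad(commit):
    # Stand-in for a real build-and-boot test of that commit.
    return int(commit[1:]) >= 37


culprit, steps = first_bad(commits, is_bad)
print(f"first bad commit: {culprit}, found after {steps} tests")
```

For 100 commits this needs at most 7 build-and-boot cycles. The approach relies on each test giving a reliable answer, so with crashes that only show up now and then, a "good" verdict would need several reboots before it could be trusted.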

André Pirard (a.pirard) wrote :

Daniel, Ubuntu is very happy with memory reduced to 256 MB.
I don't think this problem must be tackled statistically.
Why does 256 MB work and not 256 + 128 MB?
(Correctly configured without interleaving).
Many would call that a hardware problem.
But the memory test passes OK and Windows runs happily.
But I may be the only one on earth with this problem.
So, I'll probably buy another 256 MB.
And if someone wants to have me make tests, I'm here.
But that would be after August '08, please.
Thanks for your cooperation. André.

Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

André Pirard (a.pirard) wrote :

Fix released : I see no way to indicate another kind of solution to a valid problem, in this case a hardware change.

Changed in linux (Ubuntu):
status: Triaged → Fix Released