ruby1.8: fails to build on ppc under 2.6.15 kernel

Bug #61861 reported by Jonathan Riddell
Affects            Status         Importance   Assigned to      Milestone
linux (Ubuntu)     Invalid        Low          Unassigned
ruby1.8 (Ubuntu)   Fix Released   Medium       Matthias Klose

Bug Description

Build segfaults on ppc, but so far only on the machines in the datacentre (which are running 64-bit kernels).

Colin Watson (cjwatson)
description: updated
Colin Watson (cjwatson) wrote :

This is a threading bug. While applying gdb to the problem, I noticed that (a) the thing that was NULL was prot_tag and (b) prot_tag was fiddled with in some rb_thread_* functions, so I tried building with threading disabled, and it built fine (at least 'debian/rules build'; 'debian/rules binary' failed because Ruby/Tk doesn't work when Ruby is built without pthreads but Tk is built with pthreads).

Obviously this isn't actually a viable workaround due to the Ruby/Tk problem and the fact that building without pthreads almost certainly changes libruby's ABI, but it does indicate where to start looking. mono had a similar problem, and we worked around it for a while with something along the lines of:

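#ifdef __powerpc__
#define _GNU_SOURCE  /* must come before any system header; glibc needs it for cpu_set_t and CPU_* */
#include <sched.h>
#endif

#ifdef __powerpc__
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);                              /* start with an empty CPU mask */
    CPU_SET(0, &cpuset);                            /* allow CPU 0 only */
    sched_setaffinity(0, sizeof(cpuset), &cpuset);  /* pid 0 means the caller */
#endif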

Changed in ruby1.8:
importance: Untriaged → Medium
status: Unconfirmed → Confirmed
Colin Watson (cjwatson) wrote :

(The sched_setaffinity rune above effectively disables concurrent execution of threads without ABI problems by binding the process to run on only a single CPU. It is definitely not an optimal solution.)
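
For reference, the affinity trick described above can be sketched as a pair of small helpers. This is only an illustration under stated assumptions, not the code that was used in mono or tried in ruby1.8: the function names pin_to_cpu0 and allowed_cpu_count are made up here, and the sketch assumes Linux with glibc. With a pid of 0, sched_setaffinity acts on the calling thread, and threads created afterwards inherit the mask.

```c
#define _GNU_SOURCE  /* must precede all system headers; exposes cpu_set_t and CPU_* in glibc */
#include <sched.h>

/* Hypothetical helper: restrict the caller to CPU 0.
 * Returns 0 on success, -1 on error (errno is set by sched_setaffinity). */
static int pin_to_cpu0(void)
{
    cpu_set_t cpuset;

    CPU_ZERO(&cpuset);       /* empty mask */
    CPU_SET(0, &cpuset);     /* allow CPU 0 only */
    return sched_setaffinity(0, sizeof(cpuset), &cpuset);
}

/* Hypothetical helper: number of CPUs the caller may currently run on,
 * or -1 if the affinity mask cannot be read. */
static int allowed_cpu_count(void)
{
    cpu_set_t cpuset;

    if (sched_getaffinity(0, sizeof(cpuset), &cpuset) != 0)
        return -1;
    return CPU_COUNT(&cpuset);
}
```

Calling pin_to_cpu0() early in startup and then checking that allowed_cpu_count() returns 1 is one way to confirm the mask actually took effect. The call can fail, for example when CPU 0 is not among the CPUs the process is allowed to use, so the return value should be checked.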

Colin Watson (cjwatson) wrote :

Unfortunately, my sched_setaffinity workaround doesn't actually appear to work.

Fabio Massimo Di Nitto (fabbione) wrote :

Ben,

Do you happen to have any idea what the problem could be? The kernel on these machines is 2.6.15 from dapper ppc64.

Thanks
Fabio

description: updated
Benjamin Herrenschmidt (benh-kernel) wrote :

Can I get a quick step-by-step on how to reproduce the build environment? I have a quad g5 running dapper here that should be able to reproduce. Or at least how to get the source of the bits that are segfaulting...

There is something that comes to mind right away though: rb_thread_* . I don't know what that is, but if something is trying to invent its own threading/locking primitives without using glibc, then it's most likely to get them wrong (incorrect barriers especially).

Fabio Massimo Di Nitto (fabbione) wrote :

The easiest way to reproduce it is to upgrade to edgy while still running the dapper kernel, and then:

apt-get source ruby1.8
sudo apt-get build-dep ruby1.8
sudo apt-get install fakeroot
cd ruby1.8-$version
dpkg-buildpackage -rfakeroot -uc -us -b

If upgrading to edgy is not an option you can easily use debootstrap to create a chroot.

Thanks!
Fabio

Benjamin Herrenschmidt (benh-kernel) wrote :

Well, it didn't reproduce on my quad g5, at least not right away. However, looking at the source is a bit scary ... that thing is just a steaming pile of poo, to re-use paulus' expression... especially the way it "synchronizes" with the timer thread without using any synchronisation primitives, not even the (generally bogus) volatile, etc... it's scary how bad that code is and I would definitely expect random behaviour in races with its timer thread thingy. I don't know if it also takes signals but if it does, it's racy too, things like rb_disable_interrupt++ to mask interrupts just don't work....

Fabio Massimo Di Nitto (fabbione) wrote :

I have asked our sysadmins to upgrade/install an edgy kernel on davis (where we can reproduce the problem) to see if a new kernel fixes this issue. ETA is within 24/48 hours. davis also needs some hw love and that's why they can't reboot it right away.

Fabio

Benjamin Herrenschmidt (benh-kernel) wrote : Re: [Bug 61861] Re: fails to build on ppc

On Tue, 2006-09-26 at 05:46 +0000, Fabio Massimo Di Nitto wrote:
> I have asked our sysadmins to upgrade/install an edgy kernel on davis
> (where we can reproduce the problem) to see if a new kernel fixes this
> issue. ETA is within 24/48 hours. davis also needs some hw love and
> that's why they can't reboot it right away.

Note that I still reckon the code is a pile of crap and it's just
problems waiting to happen ... it's totally racy.

Ben.

Fabio Massimo Di Nitto (fabbione) wrote :

I can't agree more on that, but it's still worth digging into why a kernel can make a difference IMO.

Fabio

Benjamin Herrenschmidt (benh-kernel) wrote : Re: [Bug 61861] Re: ruby1.8: fails to build on ppc under 2.6.15 kernel

On Tue, 2006-09-26 at 17:03 +0000, Fabio Massimo Di Nitto wrote:
> I can't agree more on that but it's still worth to dig into why a kernel
> can make a difference IMO.

Could be signal & get/setcontext issues we fixed, I think, in 2.6.16 or
17. Ruby seems to use them.

Ben.

Fabio Massimo Di Nitto (fabbione) wrote :

Confirmed that it does build with .17.

Ben, do you have the patch to fix these signal & get/setcontext issues somewhere? Is it worth backporting it to .15 (if that is possible at all)?

Fabio

Changed in linux-source-2.6.15:
importance: Undecided → Low
status: Unconfirmed → Confirmed
Benjamin Herrenschmidt (benh-kernel) wrote :

On Wed, 2006-09-27 at 03:25 +0000, Fabio Massimo Di Nitto wrote:
> confirmed that it does build with .17.
>
> Ben do you have the patch to fix these signal & get/setcontext issues
> somewhere? is it worth to backport it to .15 (if possible at all?)

No patch at hand. From memory, there have been a load of signal related
fixes around 2.6.16 or so, backporting might not be simple. You can try
to diff signal_32.c but it might have dependencies on changes in the asm
code...

I'll try to have a look maybe next week or later this week.

Ben

Fabio Massimo Di Nitto (fabbione) wrote :

This is no longer a blocker for the beta release.

Zhuq! (viruzhuqi) wrote :

I'm not sure if this should be placed in this same category, but the package "Alexandria" sometimes quits unexpectedly. Please refer to the attached file for further investigation. Thank you.

Matthias Klose (doko) wrote :

closing the ruby1.8 task; builds ok in feisty and gutsy

Changed in ruby1.8:
assignee: nobody → doko
status: Confirmed → Fix Released
Launchpad Janitor (janitor) wrote : This bug is now reported against the 'linux' package

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this linux-source-2.6.15 kernel bug to the new "linux" package. We appreciate your patience and understanding as we make this transition. Also, if you would be interested in testing the upcoming Intrepid Ibex 8.10 release, it is available at http://www.ubuntu.com/testing . Please let us know your results. Thanks!

Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

MillenniumBug (millenniumbug) wrote :

This bug report concerns only obsolete releases of Ubuntu. Is it still relevant?

MillenniumBug (millenniumbug) wrote :

Closing this bug report due to obsolete versions and lack of activity.

If a similar bug appears in Ubuntu 10.10, please open a new report.

Changed in linux (Ubuntu):
status: Confirmed → Invalid