Intermittently fails to load on amd64

Bug #72639 reported by Troy James Sobotka
Affects          Status         Importance   Assigned to      Milestone
GLibC            Fix Released   Medium
glibc (Ubuntu)   Fix Released   Undecided    Matthias Klose

Bug Description

Binary package hint: evolution

Occasionally, when attempting to launch Evolution, the main window does not load after displaying the Evolution Starting window.

Subsequent loads might or might not work.

Eventually, you will manage to get it to load.

Revision history for this message
In , Suzuki-in (suzuki-in) wrote :

While running some stress tests on one of our applications, we encountered an
assert() failure in ld.so as follows:

"Inconsistency detected by ld.so: dl-open.c: 610: _dl_open: Assertion
`_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!"

This was with glibc-2.4.31. From code inspection, the race also seems to be
present in the libc I got from CVS. We were able to reproduce it consistently
within 4-5 hours of running.

Upon debugging we found that it is due to a race between two threads doing a
_dl_open().

The scenario is something like this :

In elf/dl-open.c, _dl_open:

  /* Make sure we are alone. */
  __rtld_lock_lock_recursive (GL(dl_load_lock));

[...]

  int errcode = _dl_catch_error (&objname, &errstring, &malloced,
                                 dl_open_worker, &args);
#ifndef MAP_COPY
  /* We must munmap() the cache file. */
  _dl_unload_cache ();
#endif

  /* Release the lock. */
  __rtld_lock_unlock_recursive (GL(dl_load_lock));

^^^^^ This would kick any other thread waiting on the lock.

  if (__builtin_expect (errstring != NULL, 0))
    {
      [...]
      assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
    }

  assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);

If the thread that gets woken up is working on the same namespace and sets
r_state to RT_ADD in _dl_map_object_from_fd before we reach this point (quite
possible on an SMP system, or if we get scheduled out), we hit the assert!

So, it is not safe to believe that the r_state won't get changed once we release
the lock.
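
The point above can be illustrated with a minimal sketch in plain pthreads. It is not glibc code: every name below (sketch_lock, sketch_r_state, sketch_open and so on) is hypothetical, and a plain mutex stands in for dl_load_lock. It only shows the shape of the safe ordering, where the state is checked while the lock is still held.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for r_state values; not the loader's definitions. */
enum sketch_state { SKETCH_CONSISTENT, SKETCH_ADD };

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
static enum sketch_state sketch_r_state = SKETCH_CONSISTENT;

/* Work done while holding the lock: the state is briefly SKETCH_ADD. */
static void sketch_load(void)
{
  sketch_r_state = SKETCH_ADD;
  /* ... the object would be mapped here ... */
  sketch_r_state = SKETCH_CONSISTENT;
}

/* Returns 1 if the state was consistent when checked under the lock. */
int sketch_open(void)
{
  int consistent;

  pthread_mutex_lock(&sketch_lock);
  sketch_load();
  /* Check before unlocking: no other thread can have changed the state. */
  consistent = (sketch_r_state == SKETCH_CONSISTENT);
  pthread_mutex_unlock(&sketch_lock);

  /* Reading sketch_r_state here instead would race with the next opener. */
  return consistent;
}

/* Thread body: opens repeatedly and counts failed checks. */
static void *sketch_thread(void *arg)
{
  int *failures = arg;

  for (int i = 0; i < 10000; i++)
    if (!sketch_open())
      ++*failures;
  return NULL;
}

/* Runs up to 8 concurrent openers and returns how many checks failed. */
int sketch_run(int nthreads)
{
  pthread_t tids[8];
  int failures[8] = { 0 };
  int started = 0;
  int total = 0;

  if (nthreads > 8)
    nthreads = 8;
  for (; started < nthreads; started++)
    if (pthread_create(&tids[started], NULL, sketch_thread,
                       &failures[started]) != 0)
      break;
  for (int i = 0; i < started; i++)
    {
      pthread_join(tids[i], NULL);
      total += failures[i];
    }
  return total;
}
```

Calling sketch_run with a few threads should report no failed checks, because each check happens before the lock is handed to the next thread. Moving the check after the unlock would reintroduce the window described above.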

Revision history for this message
In , Suzuki-in (suzuki-in) wrote :

Created attachment 1391
patch to fix the race

This patch has been tested to fix the issue.

Comments ?

Thanks

Revision history for this message
In , Drepper-fsp (drepper-fsp) wrote :

You're addressing a real problem. The asserts are unimportant but the _dl_close
call must be protected. This is fixed now.

Revision history for this message
In , Suzuki-in (suzuki-in) wrote :

(In reply to comment #2)
> You're addressing a real problem. The asserts are unimportant but the _dl_close
> call must be protected. This is fixed now.

So could you please let us know if there is already a patch existing for the
issue ? Or can we use this patch as the final fix ?

Thanks.

Revision history for this message
Sebastien Bacher (seb128) wrote :

Thank you for your bug. What version of Ubuntu do you use? Do you have anything in .xsession-errors about that when you get the bug?

Changed in evolution:
assignee: nobody → desktop-bugs
status: Unconfirmed → Needs Info
Revision history for this message
Troy James Sobotka (troy-sobotka) wrote :

Ubuntu 6.10 amd64

Attached some .xsession errors related to Evolution.

Revision history for this message
Troy James Sobotka (troy-sobotka) wrote :

Managed to duplicate bug in the following fashion:

1) First load Evolution and select a folder other than Inbox. Select a mail message.
2) Load another folder. Shut down.
3) Upon attempting to restart Evolution, the application fails to load and simply dies. Attachment includes fresh error report.

Hope this helps.

Revision history for this message
Troy James Sobotka (troy-sobotka) wrote :

Note that Evolution will also fail to load even _without_ following the technique described above to trigger the event.

Revision history for this message
Sebastien Bacher (seb128) wrote :

The log contains "Inconsistency detected by ld.so: dl-open.c: 604: _dl_open: Assertion `_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!", a weird message; maybe a libc problem?

Changed in evolution:
status: Needs Info → Unconfirmed
Revision history for this message
Sebastien Bacher (seb128) wrote :

Apparently that's a libc race condition in _dl_open: http://sourceware.org/bugzilla/show_bug.cgi?id=3429

Changed in evolution:
assignee: desktop-bugs → nobody
Revision history for this message
Vytas (vytas) wrote :

Thanks for pointing it out, Seb. Yes, I think it is the same bug. I use i386 though.

Revision history for this message
Sebastien Bacher (seb128) wrote :

The title mentions amd64 because the submitter of that bug is using that arch, not because the bug is specific to it.

Changed in glibc:
status: Unknown → Fix Released
Revision history for this message
In , Cvs-commit (cvs-commit) wrote :

Subject: Bug 3429

CVSROOT: /cvs/glibc
Module name: libc
Branch: glibc-2_5-branch
Changes by: <email address hidden> 2007-01-12 15:21:33

Modified files:
 . : ChangeLog
 elf : Makefile dl-close.c dl-open.c
Added files:
 elf : tst-thrlock.c

Log message:
 * elf/dl-close.c (_dl_close_worker): Renamed from _dl_close and
 split out locking and parameter checking.
 (_dl_close): Call _dl_close_worker after locking and checking.
 * elf/dl-open.c (_dl_open): Call _dl_close_worker instead of
 _dl_close.
 * elf/Makefile: Add rules to build and run tst-thrlock.
 * elf/tst-thrlock.c: New file.

 [BZ #3429]
 * elf/dl-open.c (dl_open_worker): Keep holding dl_load_lock until
 we are sure we do not need it anymore for _dl_close. Also move
 the asserts inside the lock region.
 Patch mostly by Suzuki <email address hidden>.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/ChangeLog.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.10362.2.7&r2=1.10362.2.8
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/elf/tst-thrlock.c.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=NONE&r2=1.2.4.1
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/elf/Makefile.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.315&r2=1.315.2.1
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/elf/dl-close.c.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.117&r2=1.117.2.1
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/elf/dl-open.c.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.128&r2=1.128.2.1
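
The locking change described in the log message above (a close worker that assumes the lock is held, a public close that takes the lock first, and an open path that cleans up without dropping the lock) can be sketched roughly as follows. This is an illustration only, not the committed patch: the names sketch2_open, sketch2_close and sketch2_close_worker are hypothetical, the counter stands in for real loader state, and a plain mutex is used where the snippet quoted earlier in this report uses a recursive lock.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t sketch2_lock = PTHREAD_MUTEX_INITIALIZER;
static int sketch2_open_count;   /* stand-in for loader state */

/* Does the actual close work. The caller must already hold sketch2_lock. */
static void sketch2_close_worker(void)
{
  --sketch2_open_count;
}

/* Public entry point: takes the lock, then delegates to the worker. */
void sketch2_close(void)
{
  pthread_mutex_lock(&sketch2_lock);
  sketch2_close_worker();
  pthread_mutex_unlock(&sketch2_lock);
}

/* Opens an object. If 'fail' is nonzero the open is rolled back, and the
   rollback happens before the lock is released.  Returns the open count
   as observed under the lock. */
int sketch2_open(int fail)
{
  int count;

  pthread_mutex_lock(&sketch2_lock);
  ++sketch2_open_count;
  if (fail)
    /* Call the worker, not sketch2_close: the lock is already held. */
    sketch2_close_worker();
  count = sketch2_open_count;
  pthread_mutex_unlock(&sketch2_lock);
  return count;
}
```

With the plain mutex used here, calling sketch2_close from inside sketch2_open would deadlock, which is one reason to split out a worker in this sketch. The design point it shares with the log message is that the error-path cleanup and any consistency checks stay inside the locked region.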

Revision history for this message
Simon Hausmann (shausman) wrote :

Is there any chance of getting the upstream fix into feisty?
I hit that assertion quite frequently (not in evolution though).

Matthias Klose (doko)
Changed in glibc:
assignee: nobody → doko
status: Unconfirmed → In Progress
status: In Progress → Fix Committed
Revision history for this message
Matthias Klose (doko) wrote :

fixed in glibc (2.5-0ubuntu12)

Changed in glibc:
status: Fix Committed → Fix Released
Revision history for this message
Thomas Smith (tgs-resc) wrote :

Some people are still getting similar assertion failures on gutsy. Bug #146512 has a report.

Revision history for this message
In , Radford (radford) wrote :

I noticed this same message with glibc-2.10.1-2.x86_64. It happened after a
suspend when my disk was churning, so I suspect there's another race.

Revision history for this message
In , Drepper-fsp (drepper-fsp) wrote :

Stop reopening bugs. If you have something to report open a new bug. But not
if you're not providing real information like a reproducer.

Changed in glibc:
importance: Unknown → Medium