Comment 41 for bug 16317

Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Mon, 23 May 2005 15:50:40 +1200
From: "Martin Langhoff (CatalystIT)" <email address hidden>
To: <email address hidden>, Torsten Landschoff <email address hidden>
Subject: Re: Bug#304735: How stable is BDB?

Torsten wrote:
>> And, just to clarify, with the 2.1 series of slapd, DB_CONFIG tweaking
>> reduced the chance of lockups, but didn't remove them at all. In fact,
>> it was so trivial to get the whole thing locked up that at one point I
>> had a pair of shell scripts that did it quite reliably if run
>> concurrently. I'll see if I can find them.
>
> Hey, that would be really great for testing. Hope you can dig them up!

I did a bit of research, and it is so trivial that it doesn't even
require a shell script.

On any directory with many users, I was trying to delete all the
accounts with something along the lines of:

ldapsearch (bind options) '(objectClass=posixAccount)' |
sed -n 's/^dn: //p' | xargs ldapdelete (bind options)

What you get is a deadlock: ldapsearch holds locks on the search results
until it's done, so ldapdelete cannot delete anything. Funnily enough,
ldapdelete doesn't time out, so without external intervention it'll just
hang there. As soon as you kill or cancel ldapdelete, all is back to
normal. Or at least that's what happens with LDBM and OpenLDAP 2.1.x in
every environment I've tried (Debian Sarge, various SuSE boxes).

With BDB, ldapdelete succeeds, but the ldapsearch query leaves behind
locks on objects that no longer exist. Searches die as soon as they come
across the ghost records. slapcat locks up. slapd does weird stuff.

When running these tests, I usually run a couple of "while true; do
slapcat > /dev/null; done" and "while true; do ldapsearch /pattern/ >
/dev/null; done" loops to ensure we have some concurrency.
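
The background load described above could be sketched roughly as
follows. This is a sketch under assumptions, not the exact commands
used: the run_loop helper name is hypothetical, and the loop is bounded
(the loops in the message run forever) so that it terminates on its own.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly, discarding its output,
# to keep concurrent readers hitting the directory.
run_loop() {
    # $1 = number of iterations; remaining arguments = command to run
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        # Output and errors are discarded; only the load matters here.
        "$@" > /dev/null 2>&1
        i=$((i + 1))
    done
}
```

Started in the background a couple of times, for instance
"run_loop 1000 slapcat &" next to a similar ldapsearch loop and
followed by "wait", this gives the search-and-delete pipeline some
competition for locks.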

And even if slapd doesn't misbehave immediately, when you stop slapd and
run db_stat on the database you see that there are still locks. And
there should be none.
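
A minimal sketch of that check, assuming the database lives in
/var/lib/ldap and that the Berkeley DB utility is installed as db_stat
(some distributions ship it under a versioned name). The
show_lock_stats name is hypothetical, and slapd should be stopped
before running it.

```shell
#!/bin/sh
# Assumed defaults; override both to match the local installation.
DB_DIR="${DB_DIR:-/var/lib/ldap}"
DB_STAT="${DB_STAT:-db_stat}"

# Hypothetical helper: print lock subsystem statistics for a BDB
# environment. "-c" selects lock statistics, "-h" names the directory.
show_lock_stats() {
    dir=${1:-$DB_DIR}
    if [ ! -d "$dir" ]; then
        echo "database directory $dir not found" >&2
        return 1
    fi
    "$DB_STAT" -c -h "$dir"
}
```

With slapd stopped, any locks still reported in that output are
leftovers of the kind described above.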

I haven't re-tested this with the latest slapd. I'll try to test it this
week as time allows. But I assume you guys have some sample LDAP data.

BTW, I spent a bit of my boring Sunday reading the OpenLDAP mailing list
archive. There are plenty of reports of BDB corruption, and people are
recommending that you run db_recover as part of your slapd init script,
as db corruption and slapd lockups are frequent.

This is a sample msg:
http://www.openldap.org/lists/openldap-software/200505/msg00267.html

And I spotted a few complaining that the RH init scripts were faulty
because they didn't call db_recover. Hmmmm.
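
The recommendation from the list could look roughly like this inside
an init script. It is only a sketch under assumptions: the
/var/lib/ldap path and the recover_db name are hypothetical, and the
utility may be installed under a versioned name rather than plain
db_recover.

```shell
#!/bin/sh
# Assumed defaults; override both to match the local installation.
DB_DIR="${DB_DIR:-/var/lib/ldap}"
DB_RECOVER="${DB_RECOVER:-db_recover}"

# Hypothetical pre-start step: run Berkeley DB recovery on the
# environment before slapd opens it. Recovery must only run while no
# other process is using the database.
recover_db() {
    if [ ! -d "$DB_DIR" ]; then
        echo "database directory $DB_DIR not found" >&2
        return 1
    fi
    "$DB_RECOVER" -h "$DB_DIR"
}
```

The start branch of the init script would call recover_db first and
only launch slapd if it returns success.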

cheers,

martin
--
-----------------------------------------------------------------------
Martin @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
OFFICE: +64(4)916-7224 MOB: +64(21)364-017
       Make things as simple as possible, but no simpler - Einstein
-----------------------------------------------------------------------