Bug #16317 “slapd 2.2.23 database corruption” : Bugs : openldap2.2 package : Ubuntu

In Debian Bug tracker #304735, Steve Langasek (vorlon) wrote on 2005-04-15: Re: Bug#304735: slapd 2.2.23 database corruption

#1

On Fri, Apr 15, 2005 at 12:09:19PM +0900, Christian Balzer wrote:
> This is basically the same as #303826 (why this got classified as normal
> and 2.2.23 got pushed into sarge is beyond me).

> I have a LARGE (>60k users) users/mailsettings database in LDAP,
> on two identical servers running sarge.
> They have been rock stable like that for over a year.
> Changes are generated by ldifsort.pl/ldifdiff.pl and then applied with
> ldapmodify for a low impact and smooth operation, using ldbm as
> backend.

> Since the update of slapd in sarge 2 days ago I have been getting
> an increasing number of reports of user settings "vanishing" from
> the system. As with #303826 a full dump of the DB WILL show that
> these records are present, but a specific search for them will fail.
> So this hints very much at index corruption of some sort, as a
> stop/start of slapd does not change things. However a delete/add
> of that entire record tends to fix things and so far it seems only
> records that were touched with modify have been affected.
> Unfortunately this is not deterministic in the least, while one
> slapd instance on one server will happily return the correct data
> for a specific query the other one might not or vice versa.

> I urge you (in case this can't be fixed in a time frame of 1-2 days)
> to back out this "update" and revert to the previous version.

The previous version of slapd *also* had corruption issues, and this is the
driving reason for putting slapd 2.2 in sarge.

Which LDAP backend are you using for this directory?

--
Steve Langasek
postmodern programmer

In Debian Bug tracker #304735, Christian Balzer (chibi) wrote on 2005-04-15:

#2

Steve Langasek wrote:

>On Fri, Apr 15, 2005 at 12:09:19PM +0900, Christian Balzer wrote:

>> Changes are generated by ldifsort.pl/ldifdiff.pl and then applied with
>> ldapmodify for a low impact and smooth operation, using ldbm as
>> backend.
>
>The previous version of slapd *also* had corruption issues, and this is the
>driving reason for putting slapd 2.2 in sarge.
>
I read that and I'm all for using current versions of software when
getting near to a Debian release.

Alas it's hard to contrast one year of trouble free operation with
the current state of affairs. A fix that breaks all the users which
until now had a perfectly working setup is, well, not a fix.
Or to put it quite bluntly, people encountering DB corruption with
the previous version most likely did NOT run production systems with it.
Me and others on the other hand...

>Which LDAP backend are you using for this directory?
>
See above, LDBM (whatever actual DB that defaults to these days).

I loathe BDB for the times it takes for massive adds/modifies.
Even with slapadd, which takes about 2 minutes to load the entire DB
using ldbm as backend, but about 50 minutes with BDB.

Regards,

Christian Balzer
--
Christian Balzer Network/Systems Engineer NOC
<email address hidden> Global OnLine Japan/Fusion Network Services
http://www.gol.com/

In Debian Bug tracker #304735, Steve Langasek (vorlon) wrote on 2005-04-15:

#3

On Fri, Apr 15, 2005 at 01:30:33PM +0900, Christian Balzer wrote:
>
> Steve Langasek wrote:
>
> >On Fri, Apr 15, 2005 at 12:09:19PM +0900, Christian Balzer wrote:
>
> >> Changes are generated by ldifsort.pl/ldifdiff.pl and then applied with
> >> ldapmodify for a low impact and smooth operation, using ldbm as
> >> backend.

> >The previous version of slapd *also* had corruption issues, and this is the
> >driving reason for putting slapd 2.2 in sarge.

> I read that and I'm all for using current versions of software when
> getting near to a Debian release.

> Alas it's hard to contrast one year of trouble free operation with
> the current state of affairs. A fix that breaks all the users which
> until now had a perfectly working setup is, well, not a fix.
> Or to put it quite bluntly, people encountering DB corruption with
> the previous version most likely did NOT run production systems with it.
> Me and others on the other hand...

> >Which LDAP backend are you using for this directory?

> See above, LDBM (whatever actual DB that defaults to these days).

Sorry, I missed that. I would strongly encourage you to switch to BDB,
which is the recommended backend for OpenLDAP 2.2; LDBM was more stable in
2.1 because BDB itself was *un*stable, but in 2.2, BDB is reportedly quite
solid whereas LDBM is less stable than it had been in 2.1.

> I loathe BDB for the times it takes for massive adds/modifies.
> Even with slapadd, which takes about 2 minutes to load the entire DB
> using ldbm as backend, but about 50 minutes with BDB.

OpenLDAP 2.2 includes a '-q' option to slapadd that makes the load time much
quicker by disabling checks that are unnecessary while loading a fresh db.
This option will be enabled by default on database reloads in the slapd
install scripts.

--
Steve Langasek
postmodern programmer

In Debian Bug tracker #304735, Christian Balzer (chibi) wrote on 2005-04-15:

#4

Steve Langasek wrote:
>On Fri, Apr 15, 2005 at 01:30:33PM +0900, Christian Balzer wrote:
[backend used]
>> See above, LDBM (whatever actual DB that defaults to these days).
>
>Sorry, I missed that. I would strongly encourage you to switch to BDB,
>which is the recommended backend for OpenLDAP 2.2; LDBM was more stable in
>2.1 because BDB itself was *un*stable, but in 2.2, BDB is reportedly quite
>solid whereas LDBM is less stable than it had been in 2.1.
>
Seeing that it hardly can get worse (I have been running BDB on a test
machine and that worked for the limited exposure it has), I changed the
2 servers over to BDB, something that I would have not done w/o the -q
switch in slapadd (all those BDB log files otherwise, argh).

I will monitor this over the weekend and see if the problem persists,
goes away or (heavens forbid) mutates.

No matter the outcome of this though, the severity of this bug report
remains the same. Right now anybody with a working sarge or woody
LDAP installation will find themselves encountering mysterious
heisenbugs when upgrading to 2.2.23-1 (at the very least when using
LDBM). So unless the underlying problem can be fixed or the update
somehow enforces (it didn't even suggest it) BDB usage (always
assuming this actually fixes what I'm seeing here) we have a major
show stopper.

>> I loathe BDB for the times it takes for massive adds/modifies.
>> Even with slapadd, which takes about 2 minutes to load the entire DB
>> using ldbm as backend, but about 50 minutes with BDB.
>
>OpenLDAP 2.2 includes a '-q' option to slapadd that makes the load time much
>quicker by disabling checks that are unnecessary while loading a fresh db.
>This option will be enabled by default on database reloads in the slapd
>install scripts.
>
This sure helps (helped in my case) with a fresh load. I still dread to
see BDB performance in case I have something modifying or adding a large
number of entries in normal (ldapmodify) operation.
It tends to be about 2 times slower than LDBM with that.

Regards,

Christian Balzer
--
Christian Balzer Network/Systems Engineer NOC
<email address hidden> Global OnLine Japan/Fusion Network Services
http://www.gol.com/

In Debian Bug tracker #304735, Torsten Landschoff (torsten) wrote on 2005-04-15:

#5

On Thu, Apr 14, 2005 at 09:52:39PM -0700, Steve Langasek wrote:
> > I loathe BDB for the times it takes for massive adds/modifies.
> > Even with slapadd, which takes about 2 minutes to load the entire DB
> > using ldbm as backend, but about 50 minutes with BDB.
>
> OpenLDAP 2.2 includes a '-q' option to slapadd that makes the load time much
> quicker by disabling checks that are unnecessary while loading a fresh db.
> This option will be enabled by default on database reloads in the slapd
> install scripts.

This -q option does not really make a big difference. Speed normally
greatly improves when running slapadd with the option

set_flags DB_TXN_NOSYNC

in the DB_CONFIG file. That file is BTW essential for good bdb
operation as the defaults don't work for real directories.

Greetings

Torsten
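
The DB_CONFIG advice above can be sketched as a small helper that seeds the file before a bulk load. This is a minimal sketch, not part of the slapd packaging: the function name `ensure_db_config` is hypothetical, and the directory passed to it is assumed to be the one named by the `directory` directive of the BDB database in slapd.conf. Only the `set_flags DB_TXN_NOSYNC` line comes from the comment itself.

```python
from pathlib import Path

# Directive quoted in the comment above; the helper name and any path passed
# to it are illustrative assumptions, not part of the slapd packaging.
DIRECTIVES = ["set_flags DB_TXN_NOSYNC"]


def ensure_db_config(db_dir, directives=DIRECTIVES):
    """Write DB_CONFIG into db_dir unless one already exists.

    Returns True if a file was written, False if an existing file was kept.
    """
    path = Path(db_dir) / "DB_CONFIG"
    if path.exists():
        return False  # never overwrite an administrator's tuning
    path.write_text("\n".join(directives) + "\n")
    return True
```

Berkeley DB reads DB_CONFIG when the environment is opened, so the file would need to be in place before slapadd or slapd touches the database directory.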

In Debian Bug tracker #304735, Torsten Landschoff (torsten) wrote on 2005-04-15:

#6

On Fri, Apr 15, 2005 at 12:09:19PM +0900, Christian Balzer wrote:
> I urge you (in case this can't be fixed in a time frame of 1-2 days)
> to back out this "update" and revert to the previous version.

Not easy to do after this has propagated to testing.

> If this LDAP DB would be the canonical one and not fed from a SQL
> DB, I'd be out of a job by now instead of frantically fixing things
> with good data.

It's not a really big deal to install the old 2.1.x version. If you
don't have it anymore I can build a package for you or maybe I even
still have it on my system.

Greetings

Torsten

In Debian Bug tracker #304735, Torsten Landschoff (torsten) wrote on 2005-04-15:

#7

Hi Christian,

On Fri, Apr 15, 2005 at 02:57:48PM +0900, Christian Balzer wrote:

> I will monitor this over the weekend and see if the problem persists,
> goes away or (heavens forbid) mutates.
Thanks.

> No matter the outcome of this though, the severity of this bug report
> remains the same. Right now anybody with a working sarge or woody
> LDAP installation will find themselves encountering mysterious
> heisenbugs when upgrading to 2.2.23-1 (at the very least when using
> LDBM). So unless the underlying problem can be fixed or the update
> somehow enforces (it didn't even suggest it) BDB usage (always
> assuming this actually fixes what I'm seeing here) we have a major
> show stopper.

Fully agreed.

> This sure helps (helped in my case) with a fresh load. I still dread to
> see BDB performance in case I have something modifying or adding a large
> number of entries in normal (ldapmodify) operation.
> It tends to be about 2 times slower than LDBM with that.

Have you seen the comments about DB_CONFIG? For a directory as big as
yours it should really make a difference.

Greetings

Torsten

In Debian Bug tracker #304735, Christian Balzer (chibi) wrote on 2005-04-15:

#8

Hello,

just a quick reply to the 3 mails from Torsten.

a) will try to ride this out with BDB and slapd 2.2.23 for the moment
and make the call if this is working or not on Monday. So far no
corruption, but also just a few modify actions. If it fails as well,
I might indeed need an old package. ;P

b) I know of the DB_CONFIG stuff from other encounters with BDB (INN
overview) and the test runs with it for slapd. It gives me headaches,
but I'll look at it again. The slapd.conf cachesize is set to 1000000
and the servers are vastly overspec'ed in all aspects. So no problems
thus far.

c) the -q did indeed help (2 minutes instead of 43) because it suppressed
those pesky log.0000000001 files which really kill the BDB performance in
this scenario.

Regards,

Christian Balzer
--
Christian Balzer Network/Systems Engineer NOC
<email address hidden> Global OnLine Japan/Fusion Network Services
http://www.gol.com/

In Debian Bug tracker #304735, Christian Balzer (chibi) wrote on 2005-04-18:

#9

Hello,

Monday is nearly over here and neither today nor over the weekend
any corruption or inconsistencies were observed (and I checked
each record that was modified in the last 3 days).

So using BDB instead of LDBM indeed seems to have fixed things for
me.

I guess the choice as far as the Debian package is concerned is
now to either get a working LDBM backend from upstream or forcibly
migrate users away from LDBM when Sarge hits the limelight...

Even with the default 256KB cache of BDB things worked quite well
and db_stat -m showed pretty nice cache hit rates.
For the record and in case somebody wants to use this data, my
DB_CONFIG now reads like this (after many tests on my test server):
---
set_cachesize 0 134217728 1
set_flags DB_LOG_AUTOREMOVE
set_flags DB_TXN_NOSYNC
---
Yes, these servers have 2GB RAM and so I was very generous with the
cache. It helps quite a bit, that alone made full load with ldapadd
6 times faster. The DB_TXN_NOSYNC speeds that up another 8 times,
so instead of 53 minutes it takes 1 minute to load the entire LDIF.
Inserting it with slapadd -q now takes 22 seconds, I'm reminded of
the good ole ldif2ldbm days.
I know that DB_LOG_AUTOREMOVE doesn't work the way it should for the
moment, but here's hoping for the future. ;)

I'm unsure about DB_TXN_NOSYNC in production, basically only writing
out changes when the server gets shut down is somewhat hair raising.
OTOH it speeds up things and I never had either slapd or the whole
server crash. In which case I could create a good instance in the
22 seconds mentioned up there.

Regards,

Christian Balzer
--
Christian Balzer Network/Systems Engineer NOC
<email address hidden> Global OnLine Japan/Fusion Network Services
http://www.gol.com/
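
The figures in comment #9 can be checked with a few lines of arithmetic. This is a minimal sketch: `parse_cachesize` is a hypothetical helper, and it assumes the Berkeley DB argument order `set_cachesize <gbytes> <bytes> <ncache>`. The 53-minute baseline and the 6x and 8x speedups are the numbers reported in the comment.

```python
def parse_cachesize(line):
    """Parse 'set_cachesize <gbytes> <bytes> <ncache>' into (total_bytes, ncache)."""
    keyword, gbytes, nbytes, ncache = line.split()
    if keyword != "set_cachesize":
        raise ValueError("not a set_cachesize line: " + line)
    total = int(gbytes) * 1024**3 + int(nbytes)
    return total, int(ncache)


# Cache line from the DB_CONFIG quoted in the comment.
total, ncache = parse_cachesize("set_cachesize 0 134217728 1")
print(total // 2**20, "MiB in", ncache, "segment(s)")  # 128 MiB in 1 segment(s)

# Reported speedups: 6x from the larger cache, a further 8x from DB_TXN_NOSYNC.
baseline_minutes = 53
combined_speedup = 6 * 8
print(round(baseline_minutes / combined_speedup, 1))  # 1.1
```

So the cache is 128 MiB, and the two speedups together account for the drop from 53 minutes to roughly one minute.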

Debian Bug Importer (debzilla) wrote on 2005-04-21:

#10

Automatically imported from Debian bug report #304735 http://bugs.debian.org/304735

Debian Bug Importer (debzilla) wrote on 2005-04-21:

#11

Message-Id: <email address hidden>
Date: Fri, 15 Apr 2005 12:09:19 +0900
From: Christian Balzer <email address hidden>
To: <email address hidden>
Subject: slapd 2.2.23 database corruption

Package: slapd
Version: 2.2.23-1 (sarge)
Severity: critical

This is basically the same as #303826 (why this got classified as normal
and 2.2.23 got pushed into sarge is beyond me).

I have a LARGE (>60k users) users/mailsettings database in LDAP,
on two identical servers running sarge.
They have been rock stable like that for over a year.
Changes are generated by ldifsort.pl/ldifdiff.pl and then applied with
ldapmodify for a low impact and smooth operation, using ldbm as
backend.

Since the update of slapd in sarge 2 days ago I have been getting
an increasing number of reports of user settings "vanishing" from
the system. As with #303826 a full dump of the DB WILL show that
these records are present, but a specific search for them will fail.
So this hints very much at index corruption of some sort, as a
stop/start of slapd does not change things. However a delete/add
of that entire record tends to fix things and so far it seems only
records that were touched with modify have been affected.
Unfortunately this is not deterministic in the least, while one
slapd instance on one server will happily return the correct data
for a specific query the other one might not or vice versa.

I urge you (in case this can't be fixed in a time frame of 1-2 days)
to back out this "update" and revert to the previous version.

If this LDAP DB would be the canonical one and not fed from a SQL
DB, I'd be out of a job by now instead of frantically fixing things
with good data.

Caffeinated Greetings,

Christian Balzer
--
Christian Balzer Network/Systems Engineer NOC
<email address hidden> Global OnLine Japan/Fusion Network Services
http://www.gol.com/
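
The symptom in the report (entries present in a full dump but not found by a specific search) suggests a simple consistency check. This is a minimal sketch under stated assumptions: `dns_from_ldif` and `find_unsearchable` are hypothetical helpers, the dump is assumed to be plain LDIF such as slapcat produces, and `lookup` stands in for whatever performs the real search against the server. The data below is toy data, not taken from the report.

```python
def dns_from_ldif(ldif_text):
    """Collect DNs from plain 'dn: ...' lines of an LDIF dump.

    Simplification: base64 ('dn:: ') and folded DN lines are not handled.
    """
    return [line[4:].strip() for line in ldif_text.splitlines()
            if line.startswith("dn: ")]


def find_unsearchable(dumped_dns, lookup):
    """Return the dumped DNs for which lookup(dn) finds nothing."""
    return [dn for dn in dumped_dns if not lookup(dn)]


# Toy data standing in for a full dump and for a server with a stale index.
dump = """dn: uid=alice,dc=example,dc=com
uid: alice

dn: uid=bob,dc=example,dc=com
uid: bob
"""
searchable = {"uid=alice,dc=example,dc=com"}

missing = find_unsearchable(dns_from_ldif(dump), lambda dn: dn in searchable)
print(missing)  # ['uid=bob,dc=example,dc=com']
```

Any DN the check returns is a candidate for the delete/add workaround described in the report.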

Debian Bug Importer (debzilla) wrote on 2005-04-21:

#12

Message-ID: <email address hidden>
Date: Thu, 14 Apr 2005 20:48:51 -0700
From: Steve Langasek <email address hidden>
To: Christian Balzer <email address hidden>, <email address hidden>
Subject: Re: Bug#304735: slapd 2.2.23 database corruption

On Fri, Apr 15, 2005 at 12:09:19PM +0900, Christian Balzer wrote:
> This is basically the same as #303826 (why this got classified as normal
> and 2.2.23 got pushed into sarge is beyond me).

> I have a LARGE (>60k users) users/mailsettings database in LDAP,
> on two identical servers running sarge.
> They have been rock stable like that for over a year.
> Changes are generated by ldifsort.pl/ldifdiff.pl and then applied with
> ldapmodify for a low impact and smooth operation, using ldbm as
> backend.

> Since the update of slapd in sarge 2 days ago I have been getting
> an increasing number of reports of user settings "vanishing" from
> the system. As with #303826 a full dump of the DB WILL show that
> these records are present, but a specific search for them will fail.
> So this hints very much at index corruption of some sort, as a
> stop/start of slapd does not change things. However a delete/add
> of that entire record tends to fix things and so far it seems only
> records that were touched with modify have been affected.
> Unfortunately this is not deterministic in the least, while one
> slapd instance on one server will happily return the correct data
> for a specific query the other one might not or vice versa.

> I urge you (in case this can't be fixed in a time frame of 1-2 days)
> to back out this "update" and revert to the previous version.

The previous version of slapd *also* had corruption issues, and this is the
driving reason for putting slapd 2.2 in sarge.

Which LDAP backend are you using for this directory?

--
Steve Langasek
postmodern programmer

Debian Bug Importer (debzilla) wrote on 2005-04-21:

#13

Message-Id: <email address hidden>
Date: Fri, 15 Apr 2005 13:30:33 +0900
From: Christian Balzer <email address hidden>
To: Steve Langasek <email address hidden>, <email address hidden>
Subject: Re: Bug#304735: slapd 2.2.23 database corruption

Steve Langasek wrote:

>On Fri, Apr 15, 2005 at 12:09:19PM +0900, Christian Balzer wrote:

>> Changes are generated by ldifsort.pl/ldifdiff.pl and then applied with
>> ldapmodify for a low impact and smooth operation, using ldbm as
>> backend.
>
>The previous version of slapd *also* had corruption issues, and this is the
>driving reason for putting slapd 2.2 in sarge.
>
I read that and I'm all for using current versions of software when
getting near to a Debian release.

Alas it's hard to contrast one year of trouble free operation with
the current state of affairs. A fix that breaks all the users which
until now had a perfectly working setup is, well, not a fix.
Or to put it quite bluntly, people encountering DB corruption with
the previous version most likely did NOT run production systems with it.
Me and others on the other hand...

>Which LDAP backend are you using for this directory?
>
See above, LDBM (whatever actual DB that defaults to these days).

I loathe BDB for the times it takes for massive adds/modifies.
Even with slapadd, which takes about 2 minutes to load the entire DB
using ldbm as backend, but about 50 minutes with BDB.

Regards,

Christian Balzer
--
Christian Balzer Network/Systems Engineer NOC
<email address hidden> Global OnLine Japan/Fusion Network Services
http://www.gol.com/

Debian Bug Importer (debzilla) wrote on 2005-04-21:

#14

Message-ID: <email address hidden>
Date: Thu, 14 Apr 2005 21:52:39 -0700
From: Steve Langasek <email address hidden>
To: Christian Balzer <email address hidden>
Cc: <email address hidden>
Subject: Re: Bug#304735: slapd 2.2.23 database corruption

On Fri, Apr 15, 2005 at 01:30:33PM +0900, Christian Balzer wrote:
>
> Steve Langasek wrote:
>
> >On Fri, Apr 15, 2005 at 12:09:19PM +0900, Christian Balzer wrote:
>
> >> Changes are generated by ldifsort.pl/ldifdiff.pl and then applied with
> >> ldapmodify for a low impact and smooth operation, using ldbm as
> >> backend.

> >The previous version of slapd *also* had corruption issues, and this is the
> >driving reason for putting slapd 2.2 in sarge.

> I read that and I'm all for using current versions of software when
> getting near to a Debian release.

> Alas it's hard to contrast one year of trouble free operation with
> the current state of affairs. A fix that breaks all the users which
> until now had a perfectly working setup is, well, not a fix.
> Or to put it quite bluntly, people encountering DB corruption with
> the previous version most likely did NOT run production systems with it.
> Me and others on the other hand...

> >Which LDAP backend are you using for this directory?

> See above, LDBM (whatever actual DB that defaults to these days).

Sorry, I missed that. I would strongly encourage you to switch to BDB,
which is the recommended backend for OpenLDAP 2.2; LDBM was more stable in
2.1 because BDB itself was *un*stable, but in 2.2, BDB is reportedly quite
solid whereas LDBM is less stable than it had been in 2.1.

> I loathe BDB for the times it takes for massive adds/modifies.
> Even with slapadd, which takes about 2 minutes to load the entire DB
> using ldbm as backend, but about 50 minutes with BDB.

OpenLDAP 2.2 includes a '-q' option to slapadd that makes the load time much
quicker by disabling checks that are unnecessary while loading a fresh db.
This option will be enabled by default on database reloads in the slapd
install scripts.

--
Steve Langasek
postmodern programmer

Debian Bug Importer (debzilla) wrote on 2005-04-21:

#15

Message-Id: <email address hidden>
Date: Fri, 15 Apr 2005 14:57:48 +0900
From: Christian Balzer <email address hidden>
To: Steve Langasek <email address hidden>, <email address hidden>
Subject: Re: Bug#304735: slapd 2.2.23 database corruption

Steve Langasek wrote:
>On Fri, Apr 15, 2005 at 01:30:33PM +0900, Christian Balzer wrote:
[backend used]
>> See above, LDBM (whatever actual DB that defaults to these days).
>
>Sorry, I missed that. I would strongly encourage you to switch to BDB,
>which is the recommended backend for OpenLDAP 2.2; LDBM was more stable in
>2.1 because BDB itself was *un*stable, but in 2.2, BDB is reportedly quite
>solid whereas LDBM is less stable than it had been in 2.1.
>
Seeing that it hardly can get worse (I have been running BDB on a test
machine and that worked for the limited exposure it has), I changed the
2 servers over to BDB, something that I would have not done w/o the -q
switch in slapadd (all those BDB log files otherwise, argh).

I will monitor this over the weekend and see if the problem persists,
goes away or (heavens forbid) mutates.

No matter the outcome of this though, the severity of this bug report
remains the same. Right now anybody with a working sarge or woody
LDAP installation will find themselves encountering mysterious
heisenbugs when upgrading to 2.2.23-1 (at the very least when using
LDBM). So unless the underlying problem can be fixed or the update
somehow enforces (it didn't even suggest it) BDB usage (always
assuming this actually fixes what I'm seeing here) we have a major
show stopper.

>> I loathe BDB for the times it takes for massive adds/modifies.
>> Even with slapadd, which takes about 2 minutes to load the entire DB
>> using ldbm as backend, but about 50 minutes with BDB.
>
>OpenLDAP 2.2 includes a '-q' option to slapadd that makes the load time much
>quicker by disabling checks that are unnecessary while loading a fresh db.
>This option will be enabled by default on database reloads in the slapd
>install scripts.
>
This sure helps (helped in my case) with a fresh load. I still dread to
see BDB performance in case I have something modifying or adding a large
number of entries in normal (ldapmodify) operation.
It tends to be about 2 times slower than LDBM with that.

Regards,

Christian Balzer
--
Christian Balzer Network/Systems Engineer NOC
<email address hidden> Global OnLine Japan/Fusion Network Services
http://www.gol.com/

Debian Bug Importer (debzilla) wrote on 2005-04-21:

#16

Message-ID: <email address hidden>
Date: Fri, 15 Apr 2005 09:36:36 +0200
From: Torsten Landschoff <email address hidden>
To: Steve Langasek <email address hidden>, <email address hidden>
Cc: Christian Balzer <email address hidden>
Subject: Re: Bug#304735: slapd 2.2.23 database corruption

On Thu, Apr 14, 2005 at 09:52:39PM -0700, Steve Langasek wrote:
> > I loathe BDB for the times it takes for massive adds/modifies.
> > Even with slapadd, which takes about 2 minutes to load the entire DB
> > using ldbm as backend, but about 50 minutes with BDB.
>
> OpenLDAP 2.2 includes a '-q' option to slapadd that makes the load time much
> quicker by disabling checks that are unnecessary while loading a fresh db.
> This option will be enabled by default on database reloads in the slapd
> install scripts.

This -q option does not really make a big difference. Speed normally
greatly improves when running slapadd with the option

set_flags DB_TXN_NOSYNC

in the DB_CONFIG file. That file is BTW essential for good bdb
operation as the defaults don't work for real directories.

Greetings

Torsten

Debian Bug Importer (debzilla) wrote on 2005-04-21:

#17

Message-ID: <email address hidden>
Date: Fri, 15 Apr 2005 09:40:03 +0200
From: Torsten Landschoff <email address hidden>
To: Christian Balzer <email address hidden>, <email address hidden>
Subject: Re: Bug#304735: slapd 2.2.23 database corruption

On Fri, Apr 15, 2005 at 12:09:19PM +0900, Christian Balzer wrote:
> I urge you (in case this can't be fixed in a time frame of 1-2 days)
> to back out this "update" and revert to the previous version.

Not easy to do after this has propagated to testing.

> If this LDAP DB would be the canonical one and not fed from a SQL
> DB, I'd be out of a job by now instead of frantically fixing things
> with good data.

It's not a really big deal to install the old 2.1.x version. If you
don't have it anymore I can build a package for you or maybe I even
still have it on my system.

Greetings

Torsten

Debian Bug Importer (debzilla) wrote on 2005-04-21:

#18

Message-ID: <email address hidden>
Date: Fri, 15 Apr 2005 09:42:11 +0200
From: Torsten Landschoff <email address hidden>
To: Christian Balzer <email address hidden>, <email address hidden>
Subject: Re: Bug#304735: slapd 2.2.23 database corruption

Hi Christian,

On Fri, Apr 15, 2005 at 02:57:48PM +0900, Christian Balzer wrote:

> I will monitor this over the weekend and see if the problem persists,
> goes away or (heavens forbid) mutates.
Thanks.

> No matter the outcome of this though, the severity of this bug report
> remains the same. Right now anybody with a working sarge or woody
> LDAP installation will find themselves encountering mysterious
> heisenbugs when upgrading to 2.2.23-1 (at the very least when using
> LDBM). So unless the underlying problem can be fixed or the update
> somehow enforces (it didn't even suggest it) BDB usage (always
> assuming this actually fixes what I'm seeing here) we have a major
> show stopper.

Fully agreed.

> This sure helps (helped in my case) with a fresh load. I still dread to
> see BDB performance in case I have something modifying or adding a large
> number of entries in normal (ldapmodify) operation.
> It tends to be about 2 times slower than LDBM with that.

Have you seen the comments about DB_CONFIG? For a directory as big as
yours it should really make a difference.

Greetings

Torsten

Debian Bug Importer (debzilla) wrote on 2005-04-21:

#19

Message-Id: <email address hidden>
Date: Fri, 15 Apr 2005 18:12:03 +0900
From: Christian Balzer <email address hidden>
To: Torsten Landschoff <email address hidden>
cc: Steve Langasek <email address hidden>, <email address hidden>
Subject: Re: Bug#304735: slapd 2.2.23 database corruption

Hello,

just a quick reply to the 3 mails from Torsten.

a) will try to ride this out with BDB and slapd 2.2.23 for the moment
and make the call if this is working or not on Monday. So far no
corruption, but also just a few modify actions. If it fails as well,
I might indeed need an old package. ;P

b) I know of the DB_CONFIG stuff from other encounters with BDB (INN
overview) and the test runs with it for slapd. It gives me headaches,
but I'll look at it again. The slapd.conf cachesize is set to 1000000
and the servers are vastly overspec'ed in all aspects. So no problems
thus far.

c) the -q did indeed help (2 minutes instead of 43) because it suppressed
those pesky log.0000000001 files which really kill the BDB performance in
this scenario.

Regards,

Christian Balzer
--
Christian Balzer Network/Systems Engineer NOC
<email address hidden> Global OnLine Japan/Fusion Network Services
http://www.gol.com/

Debian Bug Importer (debzilla) wrote on 2005-04-21:

#20

Message-Id: <email address hidden>
Date: Mon, 18 Apr 2005 17:57:58 +0900
From: Christian Balzer <email address hidden>
To: Torsten Landschoff <email address hidden>
cc: Steve Langasek <email address hidden>, <email address hidden>
Subject: Re: Bug#304735: slapd 2.2.23 database corruption

Hello,

Monday is nearly over here and neither today nor over the weekend
any corruption or inconsistencies were observed (and I checked
each record that was modified in the last 3 days).

So using BDB instead of LDBM indeed seems to have fixed things for
me.

I guess the choice as far as the Debian package is concerned is
now to either get a working LDBM backend from upstream or forcibly
migrate users away from LDBM when Sarge hits the limelight...

Even with the default 256KB cache of BDB things worked quite well
and db_stat -m showed pretty nice cache hit rates.
For the record and in case somebody wants to use this data, my
DB_CONFIG now reads like this (after many tests on my test server):
---
set_cachesize 0 134217728 1
set_flags DB_LOG_AUTOREMOVE
set_flags DB_TXN_NOSYNC
---
Yes, these servers have 2GB RAM and so I was very generous with the
cache. It helps quite a bit, that alone made full load with ldapadd
6 times faster. The DB_TXN_NOSYNC speeds that up another 8 times,
so instead of 53 minutes it takes 1 minute to load the entire LDIF.
Inserting it with slapadd -q now takes 22 seconds, I'm reminded of
the good ole ldif2ldbm days.
I know that DB_LOG_AUTOREMOVE doesn't work the way it should for the
moment, but here's hoping for the future. ;)

I'm unsure about DB_TXN_NOSYNC in production, basically only writing
out changes when the server gets shut down is somewhat hair raising.
OTOH it speeds up things and I never had either slapd or the whole
server crash. In which case I could create a good instance in the
22 seconds mentioned up there.
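
As a quick sanity check on the numbers above, here is a minimal Python sketch. It is not part of the original mail, and the function and variable names are made up for illustration. It converts the set_cachesize value to MiB and multiplies out the two reported speedups, treating "6 times" and "8 times" as exact factors, which they almost certainly are not.

```python
# Values taken from the mail above; all names here are illustrative only.
CACHE_GBYTES = 0           # first set_cachesize argument (whole gigabytes)
CACHE_BYTES = 134217728    # second set_cachesize argument (additional bytes)


def cache_size_mib(gbytes: int, extra_bytes: int) -> float:
    """Total cache size in MiB for 'set_cachesize <gbytes> <bytes> <ncache>'."""
    return (gbytes * 1024**3 + extra_bytes) / 1024**2


def estimated_load_minutes(baseline_minutes: float, *speedups: float) -> float:
    """Divide a baseline load time by each reported speedup factor in turn."""
    minutes = baseline_minutes
    for factor in speedups:
        minutes /= factor
    return minutes


if __name__ == "__main__":
    print(cache_size_mib(CACHE_GBYTES, CACHE_BYTES))            # cache in MiB
    print(round(estimated_load_minutes(53, 6, 8), 2))           # minutes
```

Under those assumptions the cache comes out at 128 MiB, and 53 minutes divided by 6 and then by 8 lands at roughly one minute, which matches the load time reported in the mail.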

Regards,

Christian Balzer
--
Christian Balzer Network/Systems Engineer NOC
<email address hidden> Global OnLine Japan/Fusion Network Services
http://www.gol.com/

In Debian Bug tracker #304735, juan (bugs-niluje) wrote on 2005-05-15: slapd: data corruption with LDBM backend

#21

Package: slapd
Version: 2.2.23-1
Followup-For: Bug #304735

We have been running slapd 2.1 for over 2 years without any problems, when upgrading to 2.2 we had serious / random data loss : entries were missing but visible with a slapcat.
After migrating to BDB (it's slower), no data loss was noticed for over 2 weeks.

I suggest you mark the ldbm backend as broken.

-- System Information:
Debian Release: 3.1
APT prefers testing
APT policy: (500, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.6.11
Locale: LANG=C, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)

Versions of packages slapd depends on:
ii  coreutils [fileutils]          5.2.1-2        The GNU core utilities
ii  debconf                        1.4.30.13      Debian configuration management sy
ii  fileutils                      5.2.1-2        The GNU file management utilities
ii  libc6                          2.3.2.ds1-21   GNU C Library: Shared libraries an
ii  libdb4.2                       4.2.52-18      Berkeley v4.2 Database Libraries [
ii  libiodbc2                      3.52.2-3       iODBC Driver Manager
ii  libldap-2.2-7                  2.2.23-1       OpenLDAP libraries
ii  libltdl3                       1.5.6-6        A system independent dlopen wrappe
ii  libperl5.8                     5.8.4-8        Shared Perl library
ii  libsasl2                       2.1.19-1.5     Authentication abstraction library
ii  libslp1                        1.0.11a-2      OpenSLP libraries
ii  libssl0.9.7                    0.9.7e-3       SSL shared libraries
ii  libwrap0                       7.6.dbs-8      Wietse Venema's TCP wrappers libra
ii  perl [libmime-base64-perl]     5.8.4-8        Larry Wall's Practical Extraction
ii  psmisc                         21.5-1         Utilities that use the proc filesy

In Debian Bug tracker #304735, martinlanghoff (martin-catalyst) wrote on 2005-05-21: How stable is BDB?

#22

As recently as November 2004, I was seeing serious lockups and dataloss
with BDB backends, due to upstream bugs in the BDB integration, and all
our LDAP setups ended up using LDBM due to reliability concerns.

These BDB reliability concerns are tracked in Bug #190165
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=190165

And look at the pile of bugs indicating slapd lockups when using BDB:
http://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=slapd

Yes, these are old bugs, but they are still open. Is there any
indication from upstream that the problem is fixed?

Now, with LDBM broken as well, I am not sure what to do, really.
Switching to BDB is really risky -- I haven't seen it work reliably
at all. It has been severely broken in every version I tried in the
2.0.x and 2.2.x series of OpenLDAP, both from OpenLDAP and from the
corresponding Debian packages.

regards,

martin
--
-----------------------------------------------------------------------
Martin @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
OFFICE: +64(4)916-7224 MOB: +64(21)364-017
Make things as simple as possible, but no simpler - Einstein
-----------------------------------------------------------------------

In Debian Bug tracker #304735, Torsten Landschoff (torsten) wrote on 2005-05-21: Re: Bug#304735: How stable is BDB?

#23

On Sat, May 21, 2005 at 04:58:13PM +1200, Martin Langhoff (CatalystIT) wrote:
> As recently as November 2004, I was seeing serious lockups and dataloss
> with BDB backends, due to upstream bugs in the BDB integration, and all
> our LDAP setups ended up using LDBM due to reliability concerns.

Understandably.

> These BDB reliability concerns are tracked in Bug #190165
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=190165
>
> And look at the pile of bugs indicating slapd lockups when using BDB:
> http://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=slapd

*sigh* Yes.

> Yes, these are old bugs, but they are still open. Is there any
> indication from upstream that the problem is fixed?

As I am not running real production systems I have never seen these
lockups. (I used to but that system is still running woody with an aged
version of OpenLDAP and I am no longer responsible). Therefore I can't
really acknowledge that. Given the number of yields in the code which
seem to work around locking problems I don't have a good feeling...

> Now, with LDBM broken as well, I am not sure what to do, really.
> Switching to BDB is really risky -- I haven't seen it work reliably
> at all. It has been severely broken in every version I tried in the
> 2.0.x and 2.2.x series of OpenLDAP, both from OpenLDAP and from the
> corresponding Debian packages.

As Stanford is using it I'd expect it to be reliable enough for
production use. So going to BDB is probably the best bet.

Greetings

Torsten

In Debian Bug tracker #304735, Steve Langasek (vorlon) wrote on 2005-05-21:

#24

On Sat, May 21, 2005 at 04:58:13PM +1200, Martin Langhoff (CatalystIT) wrote:
> As recently as November 2004, I was seeing serious lockups and dataloss
> with BDB backends, due to upstream bugs in the BDB integration, and all
> our LDAP setups ended up using LDBM due to reliability concerns.

> These BDB reliability concerns are tracked in Bug #190165
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=190165

> And look at the pile of bugs indicating slapd lockups when using BDB:
> http://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=slapd

> Yes, these are old bugs, but they are still open. Is there any
> indication from upstream that the problem is fixed?

> Now, with LDBM broken as well, I am not sure what to do, really.
> Switching to BDB is really risky -- I haven't seen it work reliably
> at all. It has been severely broken in every version I tried in the
> 2.0.x and 2.2.x series of OpenLDAP, both from OpenLDAP and from the
> corresponding Debian packages.

In 2.2, LDBM is a royal mess; even in 2.1, AIUI, there were too many
problems to consider it releasable.

And whereas BDB was unusable in the 2.1 packages, it's reported to be vastly
improved in 2.2 when using the correct version of libdb. The remaining
corruption bug that is listed in the BTS as applying to 2.2 has at its root
a misconfigured server; the bug is still RC because a lack of performance
tuning shouldn't result in database corruption, but its impact appears to be
minimal unless your server is in an unusable state anyway.

--
Steve Langasek
postmodern programmer

In Debian Bug tracker #304735, Micah Anderson (micah-debian) wrote on 2005-05-22: DB_TXN_NOSYNC

#25

Setting DB_TXN_NOSYNC in DB_CONFIG when loading your database is a good
way to speed up performance, because writes will not be flushed or logs
written on transaction commit. If you *leave this
setting set* and your application fails, or you lose power on your
system, it is possible some number of the most recently committed
transactions may be undone during recovery. The number of transactions
at risk is governed by how many log updates can fit into the log
buffer, how often the operating system flushes dirty buffers to disk,
and how often the log is checkpointed. Even worse, if there is
database or transaction log file loss or corruption (for example, if a
disk drive fails), then catastrophic recovery (read: from backups) is
necessary, and Berkeley DB recovery will only be able to restore the
system to the state of the last archived log file. In this case,
information may also have been lost.

After your data has been loaded, DB_TXN_NOSYNC should *be removed* from
DB_CONFIG and slapd restarted. Although many parameters in DB_CONFIG
(such as cachesize) can only be set once as they apply to the creation
of the database environment and to change them you need to destroy and
recreate it, the DB_TXN_NOSYNC flag can be changed between runs.

Setting DB_TXN_NOSYNC should *only* be done to increase performance at
the cost of sacrificing transactional durability, except in replicated
environments (see http://www.sleepycat.com/docs/ref/rep/trans.html).

What many people do (if you search the openldap mailing list you will
confirm this), is to set the following in the DB_CONFIG:

# Just use these settings when doing slapadd, comment them out
# and reload slapd afterwards, or risk data loss
#set_flags DB_TXN_NOSYNC
#set_flags DB_TXN_NOT_DURABLE

this way they are commented out, and there is a warning. When you go
and do your slapadd you uncomment them, fire things up and then
slapadd. When you are done you comment them out and reload slapd. You
*can* leave these enabled, but you are risking data loss.
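
The comment/uncomment routine described above could be scripted. What follows is only a rough Python sketch under stated assumptions: the helper name set_bulk_load_flags is hypothetical, it works on the text of DB_CONFIG rather than on a file path, and it touches only the two flags named in the snippet above. Restarting slapd afterwards is still a manual step.

```python
import re

# Flags the mail above suggests enabling only for the duration of a bulk load.
BULK_LOAD_FLAGS = ("DB_TXN_NOSYNC", "DB_TXN_NOT_DURABLE")


def set_bulk_load_flags(db_config_text: str, enabled: bool) -> str:
    """Return DB_CONFIG text with the bulk-load set_flags lines (un)commented.

    Every other line, including ordinary comments and other set_flags
    entries such as DB_LOG_AUTOREMOVE, is passed through unchanged.
    """
    out = []
    for line in db_config_text.splitlines():
        # Matches 'set_flags X' as well as '#set_flags X' and '# set_flags X'.
        match = re.match(r"^\s*#?\s*set_flags\s+(\S+)\s*$", line)
        if match and match.group(1) in BULK_LOAD_FLAGS:
            prefix = "" if enabled else "#"
            line = prefix + "set_flags " + match.group(1)
        out.append(line)
    return "\n".join(out) + "\n"
```

A wrapper would call this with enabled=True before slapadd and with enabled=False afterwards, writing the result back to DB_CONFIG each time. Keeping the function text-in, text-out makes it easy to check against a copy of the file before pointing it at a live directory.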

See for reference:
http://www.openldap.org/faq/index.cgi?_highlightWords=db_txn_nosync&file=1072
http://www.openldap.org/faq/index.cgi?_highlightWords=db_txn_nosync&file=893

It is also highly recommended that you tune your BDB environment
according to your needs or you will experience slowdown, see the
following:
http://www.stanford.edu/services/directory/openldap/configuration/bdb-config-42.html
http://www.openldap.org/faq/data/cache/1075.html

Micah

In Debian Bug tracker #304735, martinlanghoff (martin-catalyst) wrote on 2005-05-22: Re: Bug#304735: How stable is BDB?

#26

Steve Langasek wrote:
> In 2.2, LDBM is a royal mess; even in 2.1, AIUI, there were too many
> problems to consider it releasable.

I've gone back to my servers, to check exact versions of what I am running.

On my Opteron running a Debian-amd64 Sarge, it seems I stuck with slapd
2.1.30-3, which was awfully broken with BDB, but hasn't seen a single
problem with LDBM. The installed BDB packages are for 4.2.52-17 and
never worked correctly. This is rock solid with 90K user accounts and
35K groups on LDBM, it has been running for 6 months with very
aggressive scripts performing daily updates.

My x86 LDAP server is running slapd 2.1.30-3. BDB is 4.2.52-17 as well.
Identical problems with BDB. Rock solid with LDBM.

> And whereas BDB was unusable in the 2.1 packages, it's reported to be vastly
> improved in 2.2 when using the correct version of libdb.

Steve, do you maintain any slapd's under heavy usage? Is anyone running
2.2 with BDB in production with large trees? How stable is it in real life?

It sounds like a really risky move to drop LDBM support which has been
solid (and faster) for years for a BDB support that has been touted as
stable for one or two years, while having trivial crashes.

Not your fault at all -- but I am starting to lose faith in upstream's
definition of "stable".

> The remaining
> corruption bug that is listed in the BTS as applying to 2.2 has at its root
> a misconfigured server; the bug is still RC because a lack of performance
> tuning shouldn't result in database corruption, but its impact appears to be
> minimal unless your server is in an unusable state anyway.

DB corruption upon missing DB_CONFIG was a BDB bug, *and* starting with
4.3 the BDB people had stated that they had fixed things so that a
missing DB_CONFIG file did not lead to corruption. As you would expect,
the defaults were to be set to safe (if slow/untuned) values.

It should not be happening any more, and if it is, then using BDB sounds
like a mistake.

And, just to clarify, with the 2.1 series of slapd, DB_CONFIG tweaking
reduced the chance of lockups, but didn't remove them at all. In fact,
it was so trivial to get the whole thing locked up that at one point I
had a pair of shell scripts that did it quite reliably if run
concurrently. I'll see if I can find them.

regards,

martin
--
-----------------------------------------------------------------------
Martin @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
OFFICE: +64(4)916-7224 MOB: +64(21)364-017
Make things as simple as possible, but no simpler - Einstein
-----------------------------------------------------------------------

In Debian Bug tracker #304735, Torsten Landschoff (torsten) wrote on 2005-05-22:

#27

On Sun, May 22, 2005 at 10:15:19PM +1200, Martin Langhoff (CatalystIT) wrote:

> And, just to clarify, with the 2.1 series of slapd, DB_CONFIG tweaking
> reduced the chance of lockups, but didn't remove them at all. In fact,
> it was so trivial to get the whole thing locked up that at one point I
> had a pair of shell scripts that did it quite reliably if run
> concurrently. I'll see if I can find them.

Hey, that would be really great for testing. Hope you can dig them up!

Greetings

Torsten

In Debian Bug tracker #304735, martinlanghoff (martin-catalyst) wrote on 2005-05-23:

#28

Torsten wrote:
>> And, just to clarify, with the 2.1 series of slapd, DB_CONFIG tweaking
>> reduced the chance of lockups, but didn't remove them at all. In fact,
>> it was so trivial to get the whole thing locked up that at one point I
>> had a pair of shell scripts that did it quite reliably if run
>> concurrently. I'll see if I can find them.
>
> Hey, that would be really great for testing. Hope you can dig them up!

I did a bit of research, and it is so trivial that it doesn't require even
a shellscript.

For any given directory with many users, I was trying to delete all the
accounts doing something along the lines of:

ldapsearch (bind options) (objectClass=posixAccount) | sed -n 's/^dn: //p' |
xargs ldapdelete (bind options)

What you get is a deadlock: ldapsearch locks the search results until
it's done, so ldapdelete cannot delete anything. Funny enough,
ldapdelete doesn't time out, so without external intervention it'll just
hang there. As soon as you kill or cancel ldapdelete, all is back to
normal. Or at least that's what happens with LDBM and OpenLDAP 2.1.x on
as many environments as I've seen (Debian Sarge, various SuSE boxes).

With BDB ldapdelete succeeds, and the ldapsearch query leaves locks on
nonexistent objects behind. Searches die as soon as they come across the
ghost records. slapcat locks up. slapd does weird stuff.

When running these tests, I usually run a couple of "while(1) do slapcat
> /dev/null done" and "while (1) do ldapsearch /pattern/ > /dev/null
done" to ensure we have some concurrency.
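
The quoted `while(1)` forms are shorthand rather than valid sh, so here is a bounded variant as a sketch. `repeat` is a hypothetical helper, and the `slapcat` and `ldapsearch` invocations in the usage comment assume both tools are installed and pointed at a test directory.

```shell
# Run a command a fixed number of times, discarding its output.
# Failures are ignored so the background load keeps going.
repeat() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" > /dev/null 2>&1 || true
        i=$((i + 1))
    done
}

# Usage (not run here): two background loops, then the test itself.
#   repeat 1000 slapcat &
#   repeat 1000 ldapsearch -x '(uid=a*)' &
#   ... run the ldapsearch | ldapdelete pipeline ...
#   wait
```

A fixed iteration count is used instead of an endless loop so the background jobs end on their own once the test is over.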

And even if slapd doesn't misbehave immediately, when you stop slapd and
run db_stat on the database you see that there are locks, when there
should be none.

I haven't re-tested this with the latest slapd. I'll try to test it this
week as time allows. But I assume you guys have some sample LDAP data.

BTW, I spent a bit of my boring Sunday reading the OpenLDAP mailing list
archive. There are plenty of reports of BDB corruption, and people are
recommending that you run db_recover as part of your slapd init script,
as db corruption and slapd lockups are frequent.

This is a sample msg:
http://www.openldap.org/lists/openldap-software/200505/msg00267.html

And I spotted a few complaining that the RH init scripts were faulty
because they didn't call db_recover. Hmmmm.
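
What that advice amounts to, in a minimal sketch: run recovery against the database directory before slapd is started. `recover_before_start` is a hypothetical helper. The `DB_RECOVER` override is there because the utility name varies by distribution (the assumption here is that Debian ships it under a versioned name such as `db4.2_recover`), and `/var/lib/ldap` in the usage comment is only an assumed path.

```shell
# Name of the Berkeley DB recovery utility; override if your system
# installs it under a versioned name.
DB_RECOVER=${DB_RECOVER:-db_recover}

# Run recovery on a BDB environment directory. slapd must not be running.
# Returns non-zero if the directory is missing or recovery fails.
recover_before_start() {
    dbdir=$1
    if [ ! -d "$dbdir" ]; then
        echo "no such database directory: $dbdir" >&2
        return 1
    fi
    "$DB_RECOVER" -h "$dbdir"
}

# Usage in an init script (path is an assumption), before starting slapd:
#   recover_before_start /var/lib/ldap || exit 1
```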

cheers,

martin
--
-----------------------------------------------------------------------
Martin @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
OFFICE: +64(4)916-7224 MOB: +64(21)364-017
Make things as simple as possible, but no simpler - Einstein
-----------------------------------------------------------------------

In Debian Bug tracker #304735, Toni Mueller (support-oeko-net) wrote on 2005-05-29: LDBM and 2.2.x

#29

Hello,

fwiw, I have 2.2.23 with ldbm backend running w/o a hitch on OpenBSD,
but breaking down completely on Debian "almost" sarge.

Esp. I can browse the directory and find whatever I want, but don't see
these same records when I use "search" (with gq or ldapsearch, that
is).
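
One way to pin down which records are affected, as a sketch: compare the DNs you expect a search to return (taken from a full dump) against the DNs the search actually returns. `missing_dns` is a hypothetical helper working on two plain text files with one DN per line; how those files are produced is an assumption about your setup. The comparison is a plain string match, so DNs that differ only in case or spacing would show up as missing.

```shell
# Print DNs that appear in the first list but not in the second.
# Each file holds one DN per line; order and duplicates do not matter.
missing_dns() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    # comm -23 keeps lines unique to the first (sorted) input.
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage (not run here):
#   expected.txt: DNs from slapcat output that should match the filter
#   found.txt:    DNs returned by ldapsearch for that same filter
#   missing_dns expected.txt found.txt
```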

Also, looking at http://www.openldap.org/software/release/changes.html
suggests that a number of important bugs have been killed between
2.2.23 and now, although that might be too late for sarge.

Btw, aren't there any regression test suites to run before promoting
such packages, and/or can we probably move to having the debian/* stuff
in a publicly accessible CVS or something (I prefer ARCH), to be able
to recreate older packages at any date (analogous to the BSD ports
system)?

Thanks for listening!

Best,
--Toni++

In Debian Bug tracker #304735, Torsten Landschoff (torsten) wrote on 2005-05-29: Re: Bug#304735: LDBM and 2.2.x

#30

Hi Toni,

On Sun, May 29, 2005 at 04:11:48PM +0200, Toni Mueller wrote:
> fwiw, I have 2.2.23 with ldbm backend running w/o a hitch on OpenBSD,
> but breaking down completely on Debian "almost" sarge.

Interesting. Which OpenLDAP version are you using?

> Esp. I can browse the directory and find whatever I want, but don't see
> these same records when I use "search" (with gq or ldapsearch, that
> is).

Sucky.

> Also, looking at http://www.openldap.org/software/release/changes.html
> suggests that a number of important bugs have been killed between
> 2.2.23 and now, although that might be too late for sarge.

Hmm, good question. It shouldn't be too hard to just build the new
upstream version but I wonder if our release manager would really like
this.

> Btw, aren't there any regression test suites to run before promoting
> such packages, and/or can we probably move to having the debian/* stuff
> in a publicly accessible CVS or something (I prefer ARCH), to be able
> to recreate older packages at any date (analogous to the BSD ports
> system)?

Look here:

http://svn.debian.org/wsvn/pkg-openldap/openldap/debian/

Greetings

Torsten

In Debian Bug tracker #304735, Toni Mueller (support-oeko-net) wrote on 2005-05-29:

#31

Hi Torsten,

On Sun, 29.05.2005 at 17:06:32 +0200, Torsten Landschoff <email address hidden> wrote:
> On Sun, May 29, 2005 at 04:11:48PM +0200, Toni Mueller wrote:
> > fwiw, I have 2.2.23 with ldbm backend running w/o a hitch on OpenBSD,
> > but breaking down completely on Debian "almost" sarge.
> Interesting. Which OpenLDAP version are you using?

that's openldap-server-2.2.23 (but I had severe problems getting it to
run properly using bdb).

> Sucky.

You name it :-/

Time to get another LDAP server software package...

> Hmm, good question. It shouldn't be too hard to just build the new
> upstream version but I wonder if our release manager would really like
> this.

That's exactly what I thought. Also, none of the bugs listed really
spring to my mind as being the solution of any of the problems
mentioned in this bug.

> Look here:
> http://svn.debian.org/wsvn/pkg-openldap/openldap/debian/

Ok, thank you!

(I thought of having this for *all* Debian packages because the problem
arises frequently, imho).

Best,
--Toni++

In Debian Bug tracker #304735, Steve Langasek (vorlon) wrote on 2005-05-31:

#32

severity 304735 normal
thanks

Hi folks,

Since slapd 2.2.23-8 has been accepted into the archive with the code to
convert ldbm directories to bdb on upgrade, I believe this bug is no longer
RC. The remaining issues here, which are variously requests for slapd to
support a working ldbm backend or for bdb's performance to be as good as
ldbm's was (which for all I know it could be already) are somewhere between
wishlist and normal; setting to normal for now.

--
Steve Langasek
postmodern programmer

Adam Conrad (adconrad) wrote on 2005-05-31:

#33

These should all be dealt with now that we've synced with the latest version
from sid (2.2.23-8).

Debian Bug Importer (debzilla) wrote on 2005-09-22:

#34

Message-Id: <email address hidden>
Date: Sun, 15 May 2005 18:44:38 +0200
From: juan <email address hidden>
To: Debian Bug Tracking System <email address hidden>
Subject: slapd: data corruption with LDBM backend

Package: slapd
Version: 2.2.23-1
Followup-For: Bug #304735

We had been running slapd 2.1 for over 2 years without any problems. After upgrading to 2.2 we had serious, random data loss: entries were missing but still visible with slapcat.
After migrating to BDB (it's slower), no data loss has been noticed for over 2 weeks.

I suggest you mark the ldbm backend as broken.

-- System Information:
Debian Release: 3.1
APT prefers testing
APT policy: (500, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.6.11
Locale: LANG=C, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)

Versions of packages slapd depends on:
ii  coreutils [fileutils]       5.2.1-2       The GNU core utilities
ii  debconf                     1.4.30.13     Debian configuration management sy
ii  fileutils                   5.2.1-2       The GNU file management utilities
ii  libc6                       2.3.2.ds1-21  GNU C Library: Shared libraries an
ii  libdb4.2                    4.2.52-18     Berkeley v4.2 Database Libraries [
ii  libiodbc2                   3.52.2-3      iODBC Driver Manager
ii  libldap-2.2-7               2.2.23-1      OpenLDAP libraries
ii  libltdl3                    1.5.6-6       A system independent dlopen wrappe
ii  libperl5.8                  5.8.4-8       Shared Perl library
ii  libsasl2                    2.1.19-1.5    Authentication abstraction library
ii  libslp1                     1.0.11a-2     OpenSLP libraries
ii  libssl0.9.7                 0.9.7e-3      SSL shared libraries
ii  libwrap0                    7.6.dbs-8     Wietse Venema's TCP wrappers libra
ii  perl [libmime-base64-perl]  5.8.4-8       Larry Wall's Practical Extraction
ii  psmisc                      21.5-1        Utilities that use the proc filesy

Debian Bug Importer (debzilla) wrote on 2005-09-22:

#35

Message-ID: <email address hidden>
Date: Sat, 21 May 2005 16:58:13 +1200
From: "Martin Langhoff (CatalystIT)" <email address hidden>
To: <email address hidden>
Subject: How stable is BDB?

As recently as November 2004, I was seeing serious lockups and dataloss
with BDB backends, due to upstream bugs in the BDB integration, and all
our LDAP setups ended up using LDBM due to reliability concerns.

These BDB reliability concerns are tracked in Bug #190165
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=190165

And look at the pile of bugs indicating slapd lockups when using BDB:
http://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=slapd

Yes, these are old bugs, but they are still open. Is there any
indication from upstream that the problem is fixed?

Now, with LDBM broken as well, I am not sure what to do, really.
Switching to BDB is really risky -- I haven't seen it work reliably
at all. It has been severely broken in every version I tried in the
2.0.x and 2.2.x series of OpenLDAP, both from OpenLDAP and from the
corresponding Debian packages.

regards,

martin
--
-----------------------------------------------------------------------
Martin @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
OFFICE: +64(4)916-7224 MOB: +64(21)364-017
Make things as simple as possible, but no simpler - Einstein
-----------------------------------------------------------------------

Debian Bug Importer (debzilla) wrote on 2005-09-22:

#36

Message-ID: <email address hidden>
Date: Sat, 21 May 2005 11:39:54 +0200
From: Torsten Landschoff <email address hidden>
To: "Martin Langhoff (CatalystIT)" <email address hidden>,
<email address hidden>
Subject: Re: Bug#304735: How stable is BDB?

On Sat, May 21, 2005 at 04:58:13PM +1200, Martin Langhoff (CatalystIT) wrote:
> As recently as November 2004, I was seeing serious lockups and dataloss
> with BDB backends, due to upstream bugs in the BDB integration, and all
> our LDAP setups ended up using LDBM due to reliability concerns.

Understandably.

> These BDB reliability concerns are tracked in Bug #190165
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=190165
>
> And look at the pile of bugs indicating slapd lockups when using BDB:
> http://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=slapd

*sigh* Yes.

> Yes, these are old bugs, but they are still open. Is there any
> indication from upstream that the problem is fixed?

As I am not running real production systems I have never seen these
lockups. (I used to but that system is still running woody with an aged
version of OpenLDAP and I am no longer responsible). Therefore I can't
really acknowledge that. Given the number of yields in the code which
seem to work around locking problems I don't have a good feeling...

> Now, with LDBM broken as well, I am not sure what to do, really.
> Switching to BDB is really risky -- I haven't seen it work reliably
> at all. It has been severely broken in every version I tried in the
> 2.0.x and 2.2.x series of OpenLDAP, both from OpenLDAP and from the
> corresponding Debian packages.

As Stanford is using it I'd expect it to be reliable enough for
production use. So going to BDB is probably the best bet.

Greetings

Torsten

Debian Bug Importer (debzilla) wrote on 2005-09-22:

#37

Message-ID: <email address hidden>
Date: Sat, 21 May 2005 04:30:36 -0700
From: Steve Langasek <email address hidden>
To: "Martin Langhoff (CatalystIT)" <email address hidden>,
<email address hidden>
Subject: Re: Bug#304735: How stable is BDB?

On Sat, May 21, 2005 at 04:58:13PM +1200, Martin Langhoff (CatalystIT) wrote:
> As recently as November 2004, I was seeing serious lockups and dataloss
> with BDB backends, due to upstream bugs in the BDB integration, and all
> our LDAP setups ended up using LDBM due to reliability concerns.

> These BDB reliability concerns are tracked in Bug #190165
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=190165

> And look at the pile of bugs indicating slapd lockups when using BDB:
> http://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=slapd

> Yes, these are old bugs, but they are still open. Is there any
> indication from upstream that the problem is fixed?

> Now, with LDBM broken as well, I am not sure what to do, really.
> Switching to BDB is really risky -- I haven't seen it work reliably
> at all. It has been severely broken in every version I tried in the
> 2.0.x and 2.2.x series of OpenLDAP, both from OpenLDAP and from the
> corresponding Debian packages.

In 2.2, LDBM is a royal mess; even in 2.1, AIUI, there were too many
problems to consider it releasable.

And whereas BDB was unusable in the 2.1 packages, it's reported to be vastly
improved in 2.2 when using the correct version of libdb. The remaining
corruption bug that is listed in the BTS as applying to 2.2 has at its root
a misconfigured server; the bug is still RC because a lack of performance
tuning shouldn't result in database corruption, but its impact appears to be
minimal unless your server is in an unusable state anyway.

--
Steve Langasek
postmodern programmer

Debian Bug Importer (debzilla) wrote on 2005-09-22:

#38

Message-ID: <email address hidden>
Date: Sat, 21 May 2005 21:22:30 -0500
From: Micah Anderson <email address hidden>
To: <email address hidden>
Cc: <email address hidden>
Subject: DB_TXN_NOSYN

Setting DB_TXN_NOSYNC in DB_CONFIG when loading your database is a good
way to speed up performance, because Berkeley DB will not write or
synchronously flush the log on transaction commit. If you *leave this
setting set* and your application fails, or you lose power on your
system, it is possible some number of the most recently committed
transactions may be undone during recovery. The number of transactions
at risk is governed by how many log updates can fit into the log
buffer, how often the operating system flushes dirty buffers to disk,
and how often the log is checkpointed. Even worse, if there is
database or transaction log file loss or corruption (for example, if a
disk drive fails), then catastrophic recovery (read: from backups) is
necessary, and Berkeley DB recovery will only be able to restore the
system to the state of the last archived log file. In this case,
information may also have been lost.

After your data has been loaded, DB_TXN_NOSYNC should *be removed* from
DB_CONFIG and slapd restarted. Although many parameters in DB_CONFIG
(such as cachesize) can only be set once as they apply to the creation
of the database environment and to change them you need to destroy and
recreate it, the DB_TXN_NOSYNC flag can be changed between runs.

Setting DB_TXN_NOSYNC should *only* be done to increase performance at
the cost of sacrificing transactional durability, except in replicated
environments (see http://www.sleepycat.com/docs/ref/rep/trans.html).

What many people do (if you search the openldap mailing list you will
confirm this), is to set the following in the DB_CONFIG:

# Just use these settings when doing slapadd, comment them out
# and reload slapd afterwards, or risk data loss
#set_flags DB_TXN_NOSYNC
#set_flags DB_TXN_NOT_DURABLE

this way they are commented out, and there is a warning. When you go
and do your slapadd you uncomment them, fire things up and then
slapadd. When you are done you comment them out and reload slapd. You
*can* leave these enabled, but you are risking data loss.
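
The comment/uncomment routine above could be scripted roughly like this. `bulk_load_flags` is a hypothetical helper, and it assumes the two `set_flags` lines appear in DB_CONFIG exactly as written above (no leading whitespace, and `#` directly before `set_flags` when disabled).

```shell
# Enable ("on") or disable ("off") the bulk-load flags in a DB_CONFIG file.
# usage: bulk_load_flags on|off /path/to/DB_CONFIG
bulk_load_flags() {
    mode=$1
    file=$2
    [ -f "$file" ] || return 1
    tmp=$(mktemp) || return 1
    case $mode in
        on)
            # Strip the leading "#" from the two flag lines.
            sed -e 's/^#\(set_flags DB_TXN_NOSYNC\)/\1/' \
                -e 's/^#\(set_flags DB_TXN_NOT_DURABLE\)/\1/' \
                "$file" > "$tmp" ;;
        off)
            # Put the leading "#" back on the two flag lines.
            sed -e 's/^\(set_flags DB_TXN_NOSYNC\)/#\1/' \
                -e 's/^\(set_flags DB_TXN_NOT_DURABLE\)/#\1/' \
                "$file" > "$tmp" ;;
        *)
            rm -f "$tmp"
            return 2 ;;
    esac
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

A bulk load would then be: `bulk_load_flags on DB_CONFIG`, run slapadd, `bulk_load_flags off DB_CONFIG`, and restart slapd so the durable settings are back in effect.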

See for reference:
http://www.openldap.org/faq/index.cgi?_highlightWords=db_txn_nosync&file=1072
http://www.openldap.org/faq/index.cgi?_highlightWords=db_txn_nosync&file=893

It is also highly recommended that you tune your BDB environment
according to your needs or you will experience slowdown, see the
following:
http://www.stanford.edu/services/directory/openldap/configuration/bdb-config-42.html
http://www.openldap.org/faq/data/cache/1075.html

Micah

Debian Bug Importer (debzilla) wrote on 2005-09-22:

#39

Message-ID: <email address hidden>
Date: Sun, 22 May 2005 22:15:19 +1200
From: "Martin Langhoff (CatalystIT)" <email address hidden>
To: Steve Langasek <email address hidden>
CC: <email address hidden>
Subject: Re: Bug#304735: How stable is BDB?