openldap segfault on startup with delta-syncrepl MMR

Bug #1287730 reported by PierreF
Affects: openldap (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

We have set up LDAP master-master replication using delta-syncrepl. On the 2-node cluster, one server fails to restart: slapd segfaults immediately after startup.

Every time we restart the server, it fails with:

/# slapd -u openldap -g openldap -d stats -f /etc/ldap/slapd.conf
5315e2a1 @(#) $OpenLDAP: slapd (Sep 19 2013 22:39:38) $
        buildd@panlong:/build/buildd/openldap-2.4.28/debian/build/servers/slapd
5315e2a1 hdb_db_open: database "cn=accesslog": unclean shutdown detected; attempting recovery.
5315e2a3 hdb_db_open: database "dc=qa,dc=example,dc=net": unclean shutdown detected; attempting recovery.
5315e2a6 <= bdb_inequality_candidates: (entryCSN) not indexed
5315e2db slapd starting
Segmentation fault (core dumped)

In syslog:

slapd[4506]: segfault at 5e ip 00007f13dd954f29 sp 00007f13c0a4c800 error 4 in slapd[7f13dd8bb000+128000]

Using a core dump and slapd-dbg, the stack trace of the segfault is:

#0 syncrepl_op_modify (op=0x7f13c0a4d280, rs=<optimized out>) at ../../../../servers/slapd/syncrepl.c:2132
#1 0x00007f13dd9640fa in overlay_op_walk (op=0x7f13c0a4d280, rs=0x7f13c0a4cd50, which=op_modify, oi=0x7f13de46e2f0, on=<optimized out>) at ../../../../servers/slapd/backover.c:661
#2 0x00007f13dd9642bb in over_op_func (op=0x7f13c0a4d280, rs=<optimized out>, which=<optimized out>) at ../../../../servers/slapd/backover.c:723
#3 0x00007f13dd956b6f in syncrepl_message_to_op (si=0x7f13de46f390, op=0x7f13c0a4d280, msg=0x7f13a8109660) at ../../../../servers/slapd/syncrepl.c:2316
#4 0x00007f13dd95b6ad in do_syncrep2 (si=0x7f13de46f390, op=0x7f13c0a4d280) at ../../../../servers/slapd/syncrepl.c:986
#5 do_syncrepl (ctx=<optimized out>, arg=0x7f13de46f180) at ../../../../servers/slapd/syncrepl.c:1522
#6 0x00007f13dd4559aa in ?? () from /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
#7 0x00007f13dc38de9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#8 0x00007f13dc0ba3fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#9 0x0000000000000000 in ?? ()

This bug looks very much like upstream bug ITS#7354 [1]:
* failure in the same function, at the same line (same "if ( ml->sml_flags == SLAP_MOD_INTERNAL )")
* same invalid value in the "ml" variable (0x40)
* same setup as in the ITS ticket title (delta-syncrepl with master-master)
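
As a cross-check, the "ml" value (0x40) and the faulting address in syslog (5e) appear consistent with each other: reading ml->sml_flags through a pointer of 0x40 would touch 0x40 plus the offset of that field. Below is a minimal sketch of that arithmetic. The mock_* types are hypothetical stand-ins whose field order is my assumption about how slapd lays out its modification record, not a copy of the 2.4.28 headers, and the numbers only hold on an LP64 platform such as x86_64 Linux.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the slapd types. Only the field order and
 * sizes matter here; they are assumed, not taken from the 2.4.28 source. */
struct mock_berval {
    unsigned long bv_len;
    char *bv_val;
};

struct mock_modification {
    void *sm_desc;                   /* attribute description pointer */
    struct mock_berval *sm_values;
    struct mock_berval *sm_nvalues;
    unsigned sm_numvals;
    short sm_op;
    short sm_flags;                  /* field compared at syncrepl.c:2132 */
    struct mock_berval sm_type;
};

struct mock_modifications {
    struct mock_modification sml_mod; /* sml_flags is sml_mod.sm_flags */
    struct mock_modifications *sml_next;
};

/* Address that a read of ml->sml_flags would touch for a given (bogus) ml. */
uintptr_t predicted_fault_address(uintptr_t ml)
{
    return ml
         + offsetof(struct mock_modifications, sml_mod)
         + offsetof(struct mock_modification, sm_flags);
}
```

Under these assumptions the field sits at offset 0x1e (three 8-byte pointers, a 4-byte unsigned, then a 2-byte short), so predicted_fault_address(0x40) gives 0x5e, which matches the "segfault at 5e" line above. If the real layout differs, the match would be a coincidence and this check proves nothing.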

I don't know how to reproduce the situation, so it is hard to test. The fix for this issue is very short: a one-line change [2].

Note: if the one-line fix is not backported, another solution would be to update openldap to 2.4.33, which includes this fix. Sadly, trusty only has 2.4.31.

[1]: http://www.openldap.org/its/index.cgi/Software%20Bugs?id=7354;selectid=7354;usearchives=1
[2]: http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=commitdiff;h=3f71f756013a61b6a3cf7c529e1ec42675f5e040

Tags: patch
PierreF (pierre-fersing) wrote :

I've attached the debdiff patch for trusty.

I'm building a backport for precise to test whether slapd can start with this patch applied (the server on which the issue occurs is running precise).

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "lp1287730.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
PierreF (pierre-fersing) wrote :

I did the following to make sure the patch fixes the issue:

* backported the trusty version (2.4.31-1+nmu2ubuntu5) to precise: running this version, the problem still occurs
* backported trusty + patch (i.e. 2.4.31-1+nmu2ubuntu5 + the attached debdiff) to precise: running this version, slapd starts successfully

So I confirm that the patch fixes this issue.

Martin Pitt (pitti) wrote :

Thanks! Uploaded, with some DEP-3 patch headers added.

Changed in openldap (Ubuntu):
status: New → Fix Committed
Martin Pitt (pitti) wrote :

Note that this is currently held in -proposed as the package now fails to build on ppc64el due to a test failure.

Martin Pitt (pitti) wrote :

Unsubscribing sponsors as there's nothing else to sponsor. Please re-subscribe if you happen to have an idea/patch about the failure (I can do test builds on ppc64el, too).

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openldap - 2.4.31-1+nmu2ubuntu8

---------------
openldap (2.4.31-1+nmu2ubuntu8) trusty; urgency=medium

  * Bump database_format_changed value to 2.4.31-1+nmu2ubuntu5 for db5.3.
 -- Adam Conrad <email address hidden> Mon, 17 Mar 2014 12:50:18 -0600

Changed in openldap (Ubuntu):
status: Fix Committed → Fix Released