openldap segfault on startup with delta-syncrepl MMR
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
openldap (Ubuntu) | Fix Released | Undecided | Unassigned | | |
Bug Description
We have set up LDAP master-master replication using delta-syncrepl. On the 2-node cluster, one server fails to restart: it segfaults immediately after startup.
Every time we restart the server, it fails with:
/# slapd -u openldap -g openldap -d stats -f /etc/ldap/
5315e2a1 @(#) $OpenLDAP: slapd (Sep 19 2013 22:39:38) $
5315e2a1 hdb_db_open: database "cn=accesslog": unclean shutdown detected; attempting recovery.
5315e2a3 hdb_db_open: database "dc=qa,
5315e2a6 <= bdb_inequality_
5315e2db slapd starting
Segmentation fault (core dumped)
In syslog:
slapd[4506]: segfault at 5e ip 00007f13dd954f29 sp 00007f13c0a4c800 error 4 in slapd[7f13dd8bb
Using a core dump and slapd-dbg, the stack trace of the segfault is:
#0 syncrepl_op_modify (op=0x7f13c0a4d280, rs=<optimized out>) at ../../.
#1 0x00007f13dd9640fa in overlay_op_walk (op=0x7f13c0a4d280, rs=0x7f13c0a4cd50, which=op_modify, oi=0x7f13de46e2f0, on=<optimized out>) at ../../.
#2 0x00007f13dd9642bb in over_op_func (op=0x7f13c0a4d280, rs=<optimized out>, which=<optimized out>) at ../../.
#3 0x00007f13dd956b6f in syncrepl_
#4 0x00007f13dd95b6ad in do_syncrep2 (si=0x7f13de46f390, op=0x7f13c0a4d280) at ../../.
#5 do_syncrepl (ctx=<optimized out>, arg=0x7f13de46f180) at ../../.
#6 0x00007f13dd4559aa in ?? () from /usr/lib/
#7 0x00007f13dc38de9a in start_thread () from /lib/x86_
#8 0x00007f13dc0ba3fd in clone () from /lib/x86_
#9 0x0000000000000000 in ?? ()
This bug looks very much like upstream bug ITS#7354 [1]:
* failure in the same function, at the same line (the same "if ( ml->sml_flags == SLAP_MOD_INTERNAL )")
* same wrong value in the "ml" variable (0x40)
* same setup as in the ITS ticket title (delta-syncrepl with master-master)
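The bad "ml" value (0x40) and the fault address in syslog (5e) appear to be consistent with each other: reading ml->sml_flags through a pointer of 0x40 would touch 0x40 plus the offset of the flags field. Below is a minimal sketch of that arithmetic. It assumes a 64-bit (LP64) build and a field layout modeled on the leading members of OpenLDAP 2.4's Modification struct; the layout is written from memory of slap.h and not checked against this exact package, and the helper name fault_address is hypothetical.

```python
import ctypes


class Modification(ctypes.Structure):
    # ASSUMPTION: mirrors the first fields of OpenLDAP's "struct Modification"
    # on a 64-bit build. Pointers are modeled as 8-byte integers so the
    # result does not depend on the machine running this script.
    _fields_ = [
        ("sm_desc", ctypes.c_uint64),     # AttributeDescription *
        ("sm_values", ctypes.c_uint64),   # BerVarray
        ("sm_nvalues", ctypes.c_uint64),  # BerVarray
        ("sm_numvals", ctypes.c_uint32),  # unsigned
        ("sm_op", ctypes.c_int16),        # short
        ("sm_flags", ctypes.c_int16),     # short, accessed as sml_flags
    ]


# "struct Modifications" starts with an embedded Modification (sml_mod),
# so sml_flags sits at the same offset as sm_flags.
SML_FLAGS_OFFSET = Modification.sm_flags.offset


def fault_address(ml: int) -> int:
    """Address read when evaluating ml->sml_flags for a given ml pointer."""
    return ml + SML_FLAGS_OFFSET


if __name__ == "__main__":
    print(hex(SML_FLAGS_OFFSET))
    print(hex(fault_address(0x40)))
```

Under these assumptions the offset is 0x1e, so ml = 0x40 gives a read at 0x5e, which matches the "segfault at 5e" line in syslog.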
I don't know how to reproduce the situation, so it is hard to test. The fix for this issue is very short: it is a one-line change [2].
Note: If the one-line fix is not backported, another solution would be to update OpenLDAP to version 2.4.33, which includes this fix. Unfortunately, trusty only has 2.4.31.
[1]: http://
[2]: http://
I've attached the debdiff patch for trusty.
I'm building a backport for precise to test whether slapd can start with this patch applied (the server on which the issue occurs is running precise).