slapd becomes non-responsive after several weeks runtime

Bug #585208 reported by Craig Ringer
This bug affects 1 person
Affects: openldap (Ubuntu)
Status: Expired
Importance: Low
Assigned to: Unassigned

Bug Description

I use an Ubuntu 10.04 (lucid) server for, among other things, LDAP authentication.

slapd:
  Installed: 2.4.21-0ubuntu5
  Candidate: 2.4.21-0ubuntu5
  Version table:
 *** 2.4.21-0ubuntu5 0
        500 http://ftp.iinet.net.au/pub/ubuntu/ lucid/main Packages
        500 http://au.archive.ubuntu.com/ubuntu/ lucid/main Packages
        100 /var/lib/dpkg/status

Distributor ID: Ubuntu
Description: Ubuntu 10.04 LTS
Release: 10.04
Codename: lucid

With Ubuntu 9.10, slapd was stable and well-behaved. Since upgrading to 10.04, however, I've been seeing issues where slapd stops responding to LDAP queries after several weeks of uptime. slapd has to be killed and re-launched to get it working again.

I haven't found any way to reproduce this except waiting. The delay before it happens is variable, from a few days to a few weeks.

All other daemons on the server are stable and well behaved. The issue appears to be restricted to slapd.

After the second time this happened, I tried to dump a core from slapd so I could debug it after I'd restarted it to restore services. However, gcore reported the following error:

/root/slapd.core.gMMj3g:1: Error in sourced command file:
Cannot access memory at address 0x1fe8513
gcore: failed to create slapd.core.25525

and when I attached gdb directly to attempt to just get a backtrace, it failed in a similar manner:

Attaching to process 25525
Reading symbols from /usr/sbin/slapd...Reading symbols from /usr/lib/debug/usr/sbin/slapd...done.
done.
Cannot access memory at address 0x1fe8513

... making it rather hard to collect debug information. A backtrace requested after that error is essentially useless, as it contains no symbols, only '???' and addresses.

This time when it happened I collected some information from slapd's /proc entry before re-starting the process. It's all I could come up with, and it didn't tell me much. I'll attach the files from /proc here.

Ideas? Can this be caused by a slapd bug? Or should I be looking for a kernel bug?

Craig Ringer (ringerc) wrote :

The attachment contains /proc/$pid/{environ,maps,pagemap,smaps,status} from a failed slapd instance.

I couldn't read /proc/$pid/mem; it reported "no such process".
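
The collection step described above could be scripted along these lines. This is only a sketch: collect_proc, the output directory, and the optional third argument (an alternate proc root, defaulting to /proc) are illustrative names that do not appear in the report. pagemap and mem are left out here, since pagemap is binary and can be very large, and mem could not be read in this case.

```shell
#!/bin/sh
# Sketch: copy selected /proc entries of one process into a directory.
# collect_proc is a hypothetical helper, not something from the report.
collect_proc() {
    pid="$1"              # PID of the hung process
    out="$2"              # directory to write the copies into
    root="${3:-/proc}"    # proc root; overridable for testing (assumption)
    mkdir -p "$out" || return 1
    for f in environ maps smaps status; do
        # Use cat rather than cp: /proc files report a size of 0.
        if ! cat "$root/$pid/$f" > "$out/$f" 2>/dev/null; then
            echo "could not read $root/$pid/$f" >&2
        fi
    done
}
```

It might be called as `collect_proc 25525 /root/slapd-proc` before killing the hung process, using the PID from the gdb output above.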

Adam Sommer (asommer) wrote :

Thanks for reporting this bug, and helping make Ubuntu better. Just to double check... there is nothing in /var/log/syslog when slapd stops responding?

Another idea might be to run slapd in a terminal with the -d -1 option in order to capture its debug output.

Thanks,
Adam
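
A minimal sketch of the debug run suggested above, assuming a stock Ubuntu slapd package. The listener URLs, the openldap user and group, and the /etc/ldap/slapd.d configuration directory are assumptions about the local setup and should be matched to the values the init script uses. run_slapd_debug and SLAPD_BIN are illustrative names. Since the hang takes days or weeks to appear, the output is appended to a log file rather than left in a terminal.

```shell
#!/bin/sh
# Sketch: run slapd in the foreground with all debug levels enabled
# and append its debug output (written to stderr) to a log file.
# run_slapd_debug and SLAPD_BIN are hypothetical names.
run_slapd_debug() {
    log="$1"
    slapd_bin="${SLAPD_BIN:-/usr/sbin/slapd}"
    # -d -1 : enable every debug level; with -d slapd does not detach
    # -h    : listener URLs (assumed; copy from the local configuration)
    # -u/-g : user and group to run as (assumed packaging defaults)
    # -F    : configuration directory (assumed packaging default)
    "$slapd_bin" -d -1 \
        -h "ldap:/// ldapi:///" \
        -u openldap -g openldap \
        -F /etc/ldap/slapd.d >> "$log" 2>&1
}
```

For example, `run_slapd_debug /var/tmp/slapd-debug.log`, started after stopping the normally running slapd instance. At debug level -1 the log grows quickly, so the target filesystem needs plenty of free space.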

Chuck Short (zulcss)
Changed in openldap (Ubuntu):
importance: Undecided → Low
status: New → Incomplete
Craig Ringer (ringerc) wrote :

Did you consider giving me more than a few hours to respond? Or collect more debug info?

Did either of you even bother looking at the attached details? Did you notice the "several weeks" part, where it's not particularly easy to reproduce this issue on demand?

This bug should not be closed.

Craig Ringer (ringerc) wrote :

Sorry, that was unnecessarily grumpy. It's been a long day of fighting bugs in innumerable different things, etc.

Adam Sommer (asommer) wrote :

I did look through the attached files, but as you said, they don't seem to tell us much. I suggested running slapd in debug mode knowing that most of the output won't be needed, but when slapd stops responding there should be clear errors in the output, or at least something to point in the right direction. I realize that running slapd like that isn't the most "clean" thing to do, but I don't have a better idea.

Setting the status to Incomplete doesn't close the bug; it simply means that more information from the reporter is required to move forward. We truly wish to help you solve this bug, so if it takes a few weeks or months, that isn't a problem.

Thanks,
Adam

Launchpad Janitor (janitor) wrote :

[Expired for openldap (Ubuntu) because there has been no activity for 60 days.]

Changed in openldap (Ubuntu):
status: Incomplete → Expired