slapd package configuration aborts due to "ordered_value_sort failed on attr olcAccess" error during Hardy -> Lucid upgrade

Bug #538516 reported by Nathan Stratton Treadway
This bug affects 4 people
Affects              Status         Importance   Assigned to      Milestone
openldap (Ubuntu)    Fix Released   High         Thierry Carrez
Lucid                Fix Released   High         Thierry Carrez

Bug Description

I recently upgraded my server from Hardy to Lucid, using "do-release-upgrade -d" from the command line.

When the upgrade process attempted to install the new version of the slapd package, the package installation/configuration failed due to problems with the BDB database files (as I reported in bug #536958).

Once I resolved that problem, I re-ran "dpkg --pending --configure", and the configuration script was able to successfully convert my slapd.conf file to the slapd.d configuration directory. However, a second later, I received the following error message:
  Starting OpenLDAP: slapd - failed.
  The operation failed but no output was produced. For hints on what went
  wrong please refer to the system's logfiles (e.g. /var/log/syslog) or
  [...]
  invoke-rc.d: initscript slapd, action "start" failed.
  dpkg: error processing slapd (--configure):

Sure enough, the syslog file contained the following:
  Mar 11 20:43:23 suza slapd[7087]: @(#) $OpenLDAP: slapd 2.4.21 (Feb 18 2010 06:12:56) $#012#011buildd@yellow:/build/buildd/openldap-2.4.21/debian/build/servers/slapd
  Mar 11 20:43:23 suza slapd[7087]: config error processing olcDatabase={0}config,cn=config: ordered_value_sort failed on attr olcAccess#012
  Mar 11 20:43:23 suza slapd[7087]: slapd stopped.

Since the slapd.postinst returns a non-zero exit status in this situation, the slapd package is left in the half-configured state.

Tags: hardy2lucid


Revision history for this message
Nathan Stratton Treadway (nathanst) wrote :

I found that running "slaptest -F /etc/ldap/slapd.d" generated that same error message.

To investigate further, I used the command line
    slaptest -F /etc/ldap/slapd.d -d 1 2>&1 | grep "\.ldif"
to track down the full path of the file that contained the offending line, which turned out to be
   /etc/ldap/slapd.d/cn=config/olcDatabase={0}config.ldif

I am attaching a copy of that file, as it was created by the slapd.postinst script.

Eventually I was able to track the error down to the following line from that file:
  olcAccess: to * by dn.exact=cn=localroot,cn=config manage by * break

When I edited that line to read:
  olcAccess: {1}to * by dn.exact=cn=localroot,cn=config manage by * break
and then re-ran the "slaptest" command, the error went away.
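
The manual edit described above can be sketched as a small sed filter. This is only a minimal sketch: the helper name add_olcaccess_index is hypothetical, it assumes the un-indexed line appears exactly as quoted in this comment, and it reads from standard input so that it can be tried on a copy of the file rather than on the live /etc/ldap/slapd.d tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of the slapd package): add a "{1}" index
# to the un-indexed cn=localroot olcAccess line. Reads LDIF on stdin and
# writes the edited LDIF to stdout; all other lines pass through unchanged.
add_olcaccess_index() {
    sed 's/^olcAccess: to \* by dn\.exact=cn=localroot,cn=config /olcAccess: {1}to * by dn.exact=cn=localroot,cn=config /'
}

# Demonstration on the two lines quoted in this report.
printf '%s\n' \
    'olcAccess: {0}to * by * none' \
    'olcAccess: to * by dn.exact=cn=localroot,cn=config manage by * break' \
    | add_olcaccess_index
```

On a real system the filter would be applied to a copy of olcDatabase={0}config.ldif and the result checked with "slaptest -F" before being put in place.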

I then tried running "dpkg --pending --configure" again... but the postinst script errored out because /var/backups/*-2.4.9-0ubuntu0.8.04.2.ldapdb already existed.

I moved the old backup file out of the way and tried again... only to get the "Starting OpenLDAP: slapd - failed." message again. It turned out that the postinst script had re-converted the slapd.conf file and then re-added the un-indexed olcAccess line to the config file, and so slapd was still erroring out.

So I went ahead and edited the grep and sed lines in /var/lib/dpkg/info/slapd.postinst (inside the "if previous_version_older 2.4.11-0ubuntu1" block) so that the text of the line added there included the "{1}".

Then I moved the backup file out of the way and reran "dpkg --pending --configure"... and this time slapd started up successfully, and the slapd package was left in the "installed" state.

Nathan Stratton Treadway (nathanst) wrote :

I did some additional testing and believe that all Hardy -> Lucid upgrades will hit this bug.

Specifically, I installed the "slapd" package on a Hardy box, one that had never had any openldap packages installed. I let the package installation script create the default slapd.conf file there, and then copied the resulting file over to the machine that is now running Lucid. I then created an empty slapd.d directory, ran "slaptest -f slapd.conf -F slapd.d", and compared the new slapd.d directory tree with the /etc/ldap/slapd.d tree that was generated from my system's local slapd.conf file.

Sure enough, the *{0}config.ldif file generated from the stock slapd.conf file contained the same
  olcAccess: {0}to * by * none
line that was causing the conflict with the "olcAccess: to * by ..." line being added by the slapd.postinst script. (So in other words, even a stock, uncustomized slapd.conf file would trigger this error upon upgrade to Lucid's slapd.)
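
A quick way to check a generated file for this condition can be sketched as follows. This is a minimal sketch under a stated assumption: that the failure is triggered by indexed ("{N}") and un-indexed olcAccess values coexisting in the same entry, which is what the observations in this report suggest but which has not been confirmed against the slapd source. The function name has_mixed_olcaccess is hypothetical, and the check ignores folded (continuation) lines and base64-encoded values.

```shell
#!/bin/sh
# Hypothetical check: succeed (exit 0) if the LDIF on stdin contains both
# indexed and un-indexed olcAccess values, fail (exit 1) otherwise.
has_mixed_olcaccess() {
    awk '
        /^olcAccess: [{][0-9]+[}]/ { indexed = 1; next }  # e.g. "{0}to * ..."
        /^olcAccess: /             { plain = 1 }          # no "{N}" prefix
        END { exit !(indexed && plain) }
    '
}

# Demonstration with the two lines quoted in this report.
if printf '%s\n' \
    'olcAccess: {0}to * by * none' \
    'olcAccess: to * by dn.exact=cn=localroot,cn=config manage by * break' \
    | has_mixed_olcaccess
then
    echo "mixed indexed and un-indexed olcAccess values found"
fi
```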

I see from the changelog.Debian.gz file for slapd that the postinst script started editing this config file in the Karmic timeframe:

  openldap (2.4.17-1ubuntu3) karmic; urgency=low
     [...]
     * Add cn=localroot,cn=config authz mapping on upgrades.

   -- Mathias Gug <email address hidden> Tue, 11 Aug 2009 14:48:56 -0400

Out of curiosity, I ran "slaptest -f slapd.conf -F ..." on my Hardy box, and then compared the *{0}config.ldif file generated there with the one generated on Lucid, and saw that the "olcAccess: {0}to * by * none" line was NOT generated there.

So, I think that the issue here is that between 2.4.17 and 2.4.21, the *{0}config.ldif file generated by "slaptest -f ... -F ..." changed in such a way that it's no longer compatible with the "cn=localroot" lines that the postinst script is adding.

There was no problem for machines that were upgraded first to Intrepid (when the configuration data migration took place) and then to Karmic (when the "cn=localroot" lines were added to the previously-generated *{0}config.ldif file)... but anyone migrating directly from Hardy will run into problems since by openldap 2.4.21 the two steps are incompatible....

Nathan Stratton Treadway (nathanst) wrote :

A few other notes:

Bug #526230 "On upgrade modifies multiple olcAccess definition are not handled correclty" is definitely related to this one. However, #526230 deals with a Jaunty->Karmic upgrade, and specifically mentions that the pre-upgrade configuration had multiple olcAccess lines (so presumably it had been customized locally). I created a separate bug here in case there is a simple tweak to the slapd.postinst script that would allow the Hardy->Lucid upgrade to work, but which wouldn't fix #526230. On the other hand, a more comprehensive solution of some sort could certainly resolve both bugs at the same time.

Also, I should mention that my goal when I added the "{1}" to the text of the new dn.exact=cn=localroot line was simply to make the smallest possible change needed to get "dpkg" to think that the package installation had succeeded (so that it would stop trying to reconfigure the package every time I installed some other package, etc.).

I haven't actually tried doing anything with my LDAP database yet, but as far as I understand the workings of the olcAccess lines, the dn.exact=cn=localroot line as it now exists is actually completely ignored, since the "{0}to * by * none" line will prevent any lines with higher sequence numbers from being processed.... So presumably the actual fix will have to take some other approach to getting past this error....

Chuck Short (zulcss) wrote :

Thanks for the bug report, we'll try to get this fixed for lucid.

Regards
chuck

Changed in openldap (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Steve Langasek (vorlon)
Changed in openldap (Ubuntu Lucid):
milestone: none → ubuntu-10.04-beta-2
Thierry Carrez (ttx)
Changed in openldap (Ubuntu Lucid):
assignee: nobody → Mathias Gug (mathiaz)
Jay (jay-wharfs) wrote :

I think this is a repetition of

https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/450645

The other bug has been assigned low importance, but this is a major problem and has been around since Karmic.

It would be good to see some resolution of the various LDAP issues in Ubuntu at the moment.

Thierry Carrez (ttx)
Changed in openldap (Ubuntu Lucid):
assignee: Mathias Gug (mathiaz) → Thierry Carrez (ttx)
Thierry Carrez (ttx)
Changed in openldap (Ubuntu Lucid):
status: Confirmed → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openldap - 2.4.21-0ubuntu3

---------------
openldap (2.4.21-0ubuntu3) lucid; urgency=low

  * debian/slapd.postinst, debian/slapd.scripts-common: Upgrade databases
    before trying to convert to slapd.d, to avoid upgrade failure from hardy
    (LP: #536958)
  * debian/slapd.postinst: Add a {1} numeric index to olcAccess entry in
    olcDatabase={0}config.ldif to avoid upgrade failures (LP: #538516, #526230)
 -- Thierry Carrez <email address hidden> Mon, 29 Mar 2010 13:31:47 +0200

Changed in openldap (Ubuntu Lucid):
status: In Progress → Fix Released
Jay (jay-wharfs) wrote :

As I commented earlier, I believe this is the same bug as in Karmic, https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/450645.

Will this be fixed so you can dist upgrade an ldap from jaunty -> karmic -> lucid ... or will this remain broken for karmic ?

Thanks,

Jay

Nathan Stratton Treadway (nathanst) wrote :

I will try to actually run a test of this scenario sometime in the next few days, but at first glance it appears to me that simply adding "{1}" to both the "grep" and the "sed" lines of the postinst script will fix Hardy -> Lucid upgrades, but will cause new problems for other upgrade paths.

In particular, if the slapd package was upgraded in the 2.4.17/2.4.18 timeframe, an olcAccess line without any index would have already been added to the .ldif file, and then upon upgrade to Lucid, this updated postinst script would add the new "{1}" version of the line as well....

Thierry Carrez (ttx) wrote :

@Jay: once this is fixed, it can be backported for Karmic.

@Nathan: My understanding is that the olcAccess line added before would make the package fail to start until it is manually fixed to include a {1}. The idea here is to keep the package working on a hardy->lucid upgrade, not to automagically fix a broken karmic setup in karmic->lucid upgrades...

Nathan Stratton Treadway (nathanst) wrote :

Ah, never mind.

I was thinking that if the user upgraded from jaunty up to karmic and then again to lucid, both copies of the olcAccess line would be added to the file (i.e. one with no index, by the karmic upgrade, and one with "{1}", by the lucid upgrade) -- but I see now that the postinst script checks which version of the package we're upgrading from before adding the lines, which would prevent the lucid upgrade from trying to edit the file a second time.

Nathan Stratton Treadway (nathanst) wrote :

Using this new version of the slapd.postinst script, the "cn=config" database ends up with these two olcAccess attributes:

$ sudo slapcat -b"cn=config" -s"olcDatabase={0}config,cn=config" | grep olcAccess
olcAccess: {0}to * by * none
olcAccess: {1}to * by dn.exact=cn=localroot,cn=config manage by * break

As far as I understand the OpenLDAP Access Control documentation, in this scenario the {0} line will always take precedence over the {1} line (so that the latter will just be ignored). It seems like the two separate directives should instead be combined into one, something like:

olcAccess: {0}to * by dn.exact=cn=localroot,cn=config manage by * none

I haven't yet managed to find any discussion of the exact goals behind adding the various "localroot" access directives into the slapd configuration, so I'm not sure what sort of testing I can do to confirm that my understanding is correct.

But I figured I would go ahead and submit this comment now, in hopes that someone who knows more about why this logic was added to the script in version 2.4.17-1ubuntu3 can check to see if this new version of the script is still having the desired effect....

Thierry Carrez (ttx) wrote :

@Nathan: yes, rereading the slapd.access manpage I think you're right, the first match will define the level of access:

<<Access control checking stops at the first match of the <what> and <who> clause, unless otherwise dictated by the <control> clause.>>

Also, given that:
<<Each <who> clause list is implicitly terminated by a "by * none stop" clause that results in stopping the access control with no access privileges granted>>
I think the right way is to completely replace the existing olcAccess: {0} line by
olcAccess: {0}to * by dn.exact=cn=localroot,cn=config manage break
and remove the new olcAccess: {1} line.
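
The replacement proposed above can be sketched as a sed filter. This is only a minimal sketch: the function name merge_localroot_acl is hypothetical, the replacement text is copied verbatim from this comment and has not been validated against slapd, and the filter assumes the two olcAccess lines appear exactly as shown in the slapcat output earlier in this report.

```shell
#!/bin/sh
# Hypothetical filter: rewrite the "{0}" olcAccess line to the form proposed
# in this comment and drop the separate "{1}" cn=localroot line.
# Reads LDIF on stdin, writes the result to stdout.
# Braces are written as [{] and [}] so they are always matched literally.
merge_localroot_acl() {
    sed \
        -e 's/^olcAccess: [{]0[}]to \* by \* none$/olcAccess: {0}to * by dn.exact=cn=localroot,cn=config manage break/' \
        -e '/^olcAccess: [{]1[}]to \* by dn\.exact=cn=localroot,cn=config manage by \* break$/d'
}

# Demonstration on the two attributes reported by slapcat.
printf '%s\n' \
    'olcAccess: {0}to * by * none' \
    'olcAccess: {1}to * by dn.exact=cn=localroot,cn=config manage by * break' \
    | merge_localroot_acl
```

As with any edit to the slapd.d tree, the result would need to be checked with "slaptest -F" on a copy before being used.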

I'll file a new bug about this.

Thierry Carrez (ttx) wrote :

See bug 559070 (targeted to Lucid) for followup
