Local login fails without LDAP server

Bug #253937 reported by Steve
This bug affects 1 person
Affects               Status     Importance  Assigned to  Milestone
libnss-ldap (Ubuntu)  Confirmed  Low         Unassigned

Bug Description

lsb_release -rd:
Description: Ubuntu 8.04.1
Release: 8.04

--

Basically it's the same thing going on as in #20994, but I can login through the console.

I've got a network configuration where the users are authenticated and supported by a combination of LDAP, Samba and NFS (= PDC = primary domain controller).

As long as the server runs (and with it LDAP, Samba and NFS), everything is OK and works perfectly.

On all computers, no matter if server or client, there is an admin account set up (uid: ubuntu). This account is available only locally (= /etc/passwd, /etc/shadow) and it's not in the LDAP directory, since it would cause trouble with the [u|g]ids and the home directories. -- "ubuntu" is a local admin who survives no matter if the net or the server is down.

Now: if the server is down, this local admin account won't work anymore. It takes an age until anything happens. When something happens, it's an error message saying: "failed to initialize HAL"

Steve (tooroot) wrote :
Jean-Baptiste Lallement (jibel) wrote :

Thanks for your report.

Assigning to pam (pam-ldap would probably be more appropriate).

Steve Langasek (vorlon) wrote :

Thank you for taking the time to report this bug and help to improve Ubuntu.

Please send the contents of the following files:

/etc/pam.d/common-auth
/etc/nsswitch.conf
/etc/ldap.conf

Please take care to remove any passwords from /etc/ldap.conf before sending.

Steve (tooroot) wrote :
Steve (tooroot) wrote :
Steve (tooroot) wrote :

/etc/ldap.conf and /etc/ldap/ldap.conf have equal contents.

Steve (tooroot) wrote :

I've used auth-client-config to do all the PAM config.

Steve Langasek (vorlon) wrote :

Nothing looks amiss in the PAM or NSS configs. From the description, this is not a PAM problem at all, but an nss_ldap one: it's not the authentication which fails, but the resolution of users and groups afterwards.

I believe the relevant section of /etc/ldap/ldap.conf is this:

# Search timelimit
#timelimit 30

# Bind/connect timelimit
#bind_timelimit 30

# Reconnect policy: hard (default) will retry connecting to
# the software with exponential backoff, soft will fail
# immediately.
bind_policy soft

# Idle timelimit; client will close connections
# (nss_ldap only) if the server has not been contacted
# for the number of seconds specified below.
#idle_timelimit 3600

Note that, per nss_ldap(5), the default time limit on connections to the LDAP server (the bind_timelimit) is 30 seconds. That's a 30 second timeout for *each* process that needs to look up a username or group name. If you are concerned about usability when the LDAP server is unavailable, you probably want to lower this timeout or run a cache such as nscd. (The libnss-ldap package Recommends: nscd).
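
To see how long one such lookup takes on a given machine, a minimal Python sketch (not part of this report) can time the two calls a login typically triggers. The helper names `timed` and `time_user_lookup` are hypothetical. `pwd.getpwnam` and `os.getgrouplist` are standard-library wrappers around the C library calls, and the sketch assumes those calls are resolved through NSS as configured in /etc/nsswitch.conf.

```python
import os
import pwd
import time


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.monotonic()
    result = fn(*args)
    return result, time.monotonic() - start


def time_user_lookup(username):
    """Time the passwd lookup and the group-list lookup for one user.

    Assumption: both calls are resolved through NSS, so with ldap listed in
    /etc/nsswitch.conf and the server down, either may wait for a timeout.
    """
    entry, passwd_secs = timed(pwd.getpwnam, username)
    _, groups_secs = timed(os.getgrouplist, username, entry.pw_gid)
    return passwd_secs, groups_secs


if __name__ == "__main__":
    # "root" is used because it exists on practically every system;
    # substitute the local account you care about.
    passwd_secs, groups_secs = time_user_lookup("root")
    print(f"getpwnam: {passwd_secs:.3f}s  getgrouplist: {groups_secs:.3f}s")
```

Running this once with the LDAP server up and once with it down, before and after changing bind_timelimit, would show whether the setting is being picked up at all.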

Steve (tooroot) wrote :

Hi,

the thing is, that I've already tried setting those timeouts. And there was no result at all. The "lookup" time didn't change.

Wouldn't it be a lot more intelligent in libnss-ldap to do a *one-time* check if the LDAP server is reachable, and if not there's just no output. Just something simple like a ping. Only local files will be used.

This is the behaviour I expected to happen -- no server, no data. Simple. Short.

Cheers

Steve Langasek (vorlon) wrote : Re: [Bug 253937] Re: Local login fails without LDAP server

On Wed, Aug 06, 2008 at 02:34:20PM -0000, Steve wrote:
> Wouldn't it be a lot more intelligent in libnss-ldap to do a *one-time*
> check if the LDAP server is reachable, and if not there's just no
> output. Just something simple like a ping. Only local files will be
> used.

If you are only using libnss-ldap without nscd, there is nowhere in the
model for this reachability information to be stored. If you use nscd,
results will be cached in the event the server is down.

But adjusting the timeout limits should also have an effect - were you
changing the 'timelimit' or the 'bind_timelimit' setting? In normal
circumstances, I would expect the 'bind_timelimit' to be the one that
applies for such failures; 'timelimit' only matters if your server *is*
alive but is taking a pathologically long time to reply to queries.
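
For illustration only, the "something simple like a ping" check proposed above could look roughly like the following Python sketch. It is not how libnss-ldap works; the function name `ldap_reachable` and the host name in the usage line are hypothetical, and port 389 is simply the standard LDAP port. A successful TCP connect also says nothing about whether a bind would succeed. As noted above, without a daemon each process would have to repeat the check, because there is nowhere to store the result.

```python
import socket


def ldap_reachable(host, port=389, timeout=2.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This only tests that something accepts connections on the port; it does
    not perform an LDAP bind or search.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures and timeouts.
        return False


if __name__ == "__main__":
    # "ldap.example.com" is a placeholder host name.
    print(ldap_reachable("ldap.example.com", timeout=1.0))
```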

Chuck Short (zulcss)
Changed in libnss-ldap (Ubuntu):
status: New → Incomplete
importance: Undecided → Low
Etienne Goyer (etienne-goyer-outlands) wrote :

On 2008-08-06, Steve Langasek had this tidbit of wisdom:
> If you are only using libnss-ldap without nscd, there is nowhere in the
> model for this reachability information to be stored. If you use nscd,
> results will be cached in the event the server is down.

Well, yes and no. Enumeration of an NSS database, such as happens when you invoke initgroups(), would still block. As such, GDM would still take forever to start a desktop session, even if you are running nscd. In fact, nscd is of practically no help if the network directory server goes down.

> But adjusting the timeout limits should also have an effect - were you
> changing the 'timelimit' or the 'bind_timelimit' setting? In normal
> circumstances, I would expect the 'bind_timelimit' to be the one that
> applies for such failures; 'timelimit' only matters if your server *is*
> alive but is taking a pathologically long time to reply to queries.

Even setting bind_timelimit (with or without "bind_policy soft") will not help much, as every NSS query will still need to wait for the timeout, and all these timeouts do add up pretty quickly (we measured 45 minutes to open a GNOME session with "bind_timelimit 5" on hardy).

It is a pretty complex problem. I have pushed a blueprint to resolve that, reliable-nss-caching, and mathiaz packaged the sssd client from the FreeIPA project in karmic to address that issue. We need to test it and make sure it actually resolves the issue in a resilient and scalable fashion.
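
As a rough back-of-the-envelope check on the 45-minute figure above, the arithmetic can be sketched in Python. This assumes, purely for illustration, that the lookups run one after another and that each one waits for the full bind_timelimit; the function names are hypothetical and the result is an estimate, not a measurement from this report.

```python
def implied_lookups(total_seconds, timeout_seconds):
    """Number of full timeouts that fit in the observed delay."""
    return total_seconds // timeout_seconds


def total_delay(lookups, timeout_seconds):
    """Total wait in seconds if every lookup waits for the full timeout."""
    return lookups * timeout_seconds


if __name__ == "__main__":
    observed = 45 * 60                         # 45 minutes, in seconds
    lookups = implied_lookups(observed, 5)     # bind_timelimit 5
    print(lookups)                             # 540
    print(total_delay(lookups, 30) / 3600)     # 4.5 (hours at the 30 s default)
```

Under those assumptions, 45 minutes at 5 seconds per timeout corresponds to about 540 timed-out lookups, and the same number of lookups at the default of 30 seconds would take about 4.5 hours.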

Chuck Short (zulcss)
Changed in libnss-ldap (Ubuntu):
status: Incomplete → Confirmed
Derek Simkowiak (ubuntu-cool-st) wrote :

Same thing here on Ubuntu 9.10.

Here is a (possibly) relevant nss_ldap thread:

http://old.nabble.com/No-timeout-for-nss_ldap--td14576190.html

Unfortunately, that thread ends with "I am looking at fixing this now and providing some time outs on the soft path as well. Will keep you informed." That was back in 2008.

In my opinion this is an important bug. One of the major reasons for using LDAP+nss is for high availability in corporate networks... and this bug breaks that completely.

Derek Simkowiak (ubuntu-cool-st) wrote :
Etienne Goyer (etienne-goyer-outlands) wrote :

Derek Simkowiak wrote:
> In my opinion this is an important bug. One of the major reasons for
> using LDAP+nss is for high availability in corporate networks... and
> this bug breaks that completely.

You are starting with a wrong assumption: using nss_ldap will not
provide you with any type of high-availability. In fact, it may have
the opposite effect, as authentication becomes dependent on the
availability of network and LDAP directory service.

That being said, the bug is not really one; it is more of an
architectural shortcoming. And it is not specific to Ubuntu: any Unix
(including pretty much every other Linux distribution) that implements
NSS as a stateless library is bound to have the same problem. NSS was
written as an abstraction layer that assumed the databases, traditionally
files such as /etc/passwd, are always available and cheap to query.
These assumptions break down when the database has to be queried over
the network.

There is no proper fix, short of ripping out NSS entirely for
something new (which is not practical, as you can guess). All you can
do is mitigate the problem. Tweaking the various limits in
/etc/ldap.conf is useless; even very short timeouts do add up. nscd,
which is buggy as hell anyway, will block the second it has to query the
network database because it does not keep state in the first place.

The only solution that can provide some relief is to have a daemon sit
between the library and the network database to cache network queries
and to keep track of whether the network database is available, so that
it can continue to return results without blocking when it is not,
unlike nscd. Solaris has had such a thing for quite some time, with good
results. In Ubuntu, the libnss-ldapd and sssd packages, and the nssov
slapd overlay, provide just that. I have no experience with any of
them, so I cannot make an enlightened recommendation, but they all try
to address exactly the problem being discussed here. I suggest you
investigate them, and report any bugs you find along the way.

--
Etienne Goyer
Technical Account Manager - Canonical Ltd
Ubuntu Certified Instructor - LPIC-3

 ~= Ubuntu: Linux for Human Beings =~
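
The idea of a caching layer that keeps state about the directory, as described in the comment above, can be sketched in a few lines of Python. This is only an illustration of the concept, not how libnss-ldapd, sssd or nssov are implemented. The names `StatefulCache`, `BackendDown` and `retry_after` are hypothetical, and `lookup` stands for whatever function actually queries the network database.

```python
import time


class BackendDown(Exception):
    """Raised by the lookup function when the directory cannot be reached."""


class StatefulCache:
    """Cache lookups and remember when the backend was last seen down."""

    def __init__(self, lookup, retry_after=30.0, clock=time.monotonic):
        self._lookup = lookup            # function that queries the directory
        self._retry_after = retry_after  # seconds to wait before retrying
        self._clock = clock
        self._cache = {}
        self._down_since = None          # None means "believed to be up"

    def get(self, key):
        now = self._clock()
        marked_down = (
            self._down_since is not None
            and now - self._down_since < self._retry_after
        )
        if not marked_down:
            try:
                value = self._lookup(key)
            except BackendDown:
                # Remember the failure so later calls skip the backend.
                self._down_since = now
            else:
                self._down_since = None
                self._cache[key] = value
                return value
        # Backend is down: answer from the cache (None if never seen).
        return self._cache.get(key)
```

In this sketch only the first lookup after an outage pays for a timeout. Until `retry_after` seconds have passed, every other lookup is answered from the cache immediately, which is the stored state the stateless library model lacks.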
