autofs: Assertion 'set_remove(iterator->links, link) == link' failed at src/shared/userdb.c:314, function userdb_on_query_reply(). Aborting.

Bug #1880193 reported by Michael Andreev
This bug affects 1 person
Affects            Status   Importance  Assigned to  Milestone
autofs (Ubuntu)    Invalid  Undecided   Unassigned
  Focal            Invalid  Undecided   Unassigned
systemd (Ubuntu)   Invalid  Undecided   Unassigned
  Focal            Invalid  Undecided   Unassigned

Bug Description

autofs intermittently aborts when mounting shares on Ubuntu 20.04 (it happens about 1 time out of 5):
"Assertion 'set_remove(iterator->links, link) == link' failed at src/shared/userdb.c:314, function userdb_on_query_reply(). Aborting.
Aborted (core dumped)"

Restarting `autofs.service` (or the `automount` program) fixes the issue. However, if some home directories (such as `Desktop` or `Documents`) are mounted by `autofs`, the user can't log in to the Ubuntu desktop environment (the PC freezes on login with a black screen). Since this error can prevent users from logging in, it might be considered a critical bug.

It happens both in the `autofs` systemd service and when running `automount` directly (`automount -f -d`).

Maybe it's an underlying error in a `systemd` library (I found the line mentioned in the error in its source code).

This issue occurs on Ubuntu 20.04 (it works correctly on Ubuntu 18.04):

> lsb_release -rd
Description: Ubuntu 20.04 LTS
Release: 20.04

Package versions:

> apt-cache policy autofs systemd
autofs:
  Installed: 5.1.6-2
  Candidate: 5.1.6-2
  Version table:
 *** 5.1.6-2 500
        500 http://ru.archive.ubuntu.com/ubuntu focal/main amd64 Packages
        100 /var/lib/dpkg/status
systemd:
  Installed: 245.4-4ubuntu3
  Candidate: 245.4-4ubuntu3
  Version table:
 *** 245.4-4ubuntu3 500
        500 http://ru.archive.ubuntu.com/ubuntu focal/main amd64 Packages
        100 /var/lib/dpkg/status

Steps to reproduce:
1. Ubuntu 20.04 clean install
2. `apt install realmd sssd sssd-tools libnss-sss libpam-sss adcli samba-common-bin`
3. `realm join DOMAIN.NAME`
4. Enable makehomedir with the command: `pam-auth-update`
5. `apt install cifs-utils`
6. `apt install autofs`
7. Add the following line to the [domain/DOMAIN.EXT] section of /etc/sssd/sssd.conf: `krb5_ccname_template = FILE:%d/krb5cc_%U`
8. Reboot
9. Log in as a domain user and try to open a directory mounted by `autofs` (in my configuration, shares are provided by AD).
10. `autofs.service` stops with the error above about 1 time out of 5 (not always).

Workaround found:
Add `Restart=always` to the `[Service]` section of `/lib/systemd/system/autofs.service` (in other words, configure the autofs service to restart automatically on failure).

Attachments:
1. Full log of `automount -f -d` command.

Michael Andreev (michael-andreev) wrote :
Robie Basak (racb)
tags: added: regression-release
Changed in autofs (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
tags: added: rls-ff-incoming
Dimitri John Ledkov (xnox) wrote :

Instead of using autofs, have you tried simply using systemd .automount units, which should be able to achieve similar behaviour?

Furthermore, .automount units might be better, as they can be used not only as _system_ automount units but also as _user_ systemd .automount units.

See http://manpages.ubuntu.com/manpages/focal/en/man5/systemd.automount.5.html

Michael Andreev (michael-andreev) wrote :

No, I haven't tried systemd .automount units. Unfortunately, they don't suit my setup, because my AutoFS maps are served by Active Directory (Samba AD on Debian 10). Linux clients receive these maps from AD and mount them dynamically (I don't configure mounts on each individual PC). As I understand it, systemd .automount units can't obtain the list of mounts from Active Directory.

Here is a guide that describes configuring AutoFS maps in AD (very similar to my configuration):
https://care.qumulo.com/hc/en-us/articles/115014470007-Active-Directory-AutoFS-maps-to-AD-bound-Linux-clients-with-SSSD

Michael Andreev (michael-andreev) wrote :

Just FYI, it works correctly on Ubuntu 18.04 and Fedora 32.

Changed in autofs (Ubuntu):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
tags: added: id-5ef4c2690a8fc93823bfb457
Changed in autofs (Ubuntu Focal):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
tags: removed: rls-ff-incoming
Steve Langasek (vorlon)
Changed in autofs (Ubuntu Focal):
importance: Undecided → Medium
Rafael David Tinoco (rafaeldtinoco) wrote :

From systemd 245 release notes (https://lwn.net/Articles/814068/):

----
  * A new component "userdb" has been added, along with a small daemon
          "systemd-userdb.service" and a client tool "userdbctl". The framework
          allows defining rich user and group records in a JSON format,
          extending on the classic "struct passwd" and "struct group"
          structures. Various components in systemd have been updated to
          process records in this format, including systemd-logind and
          pam-systemd. The user records are intended to be extensible, and
          allow setting various resource management, security and runtime
          parameters that shall be applied to processes and sessions of the
          user as they log in. This facility is intended to allow associating
          such metadata directly with user/group records so that they can be
          produced, extended and consumed in unified form. We hope that
          eventually frameworks such as sssd will generate records this way, so
          that for the first time resource management and various other
          per-user settings can be configured in LDAP directories and then
          provided to systemd (specifically to systemd-logind and pam-system)
          to apply on login. For further details see:

          https://systemd.io/USER_RECORD
          https://systemd.io/GROUP_RECORD
          https://systemd.io/USER_GROUP_API
----

and yet we don't have the userdbctl tool or the daemon:

https://www.freedesktop.org/software/systemd/man/userdbctl.html

This looks like an ongoing effort to unify the user/group information passed from
pam-systemd to logind within systemd.

I believe that making all the information passed from pam-systemd to logind available
through this varlink interface is what is causing the issue, and that this is where
the problem lies.

----

Nevertheless...

The error comes from the userdb code, from the assertion:

        assert_se(set_remove(iterator->links, link) == link);

when userdb code is being called by the varlink protocol.
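
To make the shape of that assertion concrete, here is a minimal C sketch. It assumes that a set's remove operation returns the removed pointer, or NULL when the pointer is not a member (my understanding of how systemd's set_remove() behaves). `ToySet`, `toy_set_put`, `toy_set_remove` and `toy_remove_check` are hypothetical names, not systemd API, and the sketch does not claim to show the root cause of this bug. It only shows one way a check of the form `set_remove(set, link) == link` can fail: the link is no longer in the set when the callback runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy set of pointers; NOT systemd's Set implementation. */
#define TOY_SET_MAX 8

typedef struct ToySet {
    void *items[TOY_SET_MAX];
    size_t n;
} ToySet;

/* Add a pointer; returns 0 on success, -1 if the set is full. */
int toy_set_put(ToySet *s, void *p) {
    if (s->n >= TOY_SET_MAX)
        return -1;
    s->items[s->n++] = p;
    return 0;
}

/* Remove a pointer; returns the removed pointer, or NULL if it was not a member.
 * (Assumed to mirror the return convention of systemd's set_remove().) */
void *toy_set_remove(ToySet *s, void *p) {
    for (size_t i = 0; i < s->n; i++) {
        if (s->items[i] == p) {
            /* Fill the hole with the last element; order is not preserved. */
            s->items[i] = s->items[--s->n];
            return p;
        }
    }
    return NULL;
}

/* Returns 1 if a check of the form `remove(set, link) == link` holds,
 * 0 if it would have tripped an assertion. */
int toy_remove_check(ToySet *s, void *link) {
    return toy_set_remove(s, link) == link;
}
```

Putting one pointer into a `ToySet` and calling `toy_remove_check()` on it twice yields 1 and then 0. With `assert_se()` in place of the returned 0, the second call would abort the process, which is the symptom in this report.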

Many subsystems within systemd now have an embedded varlink server to provide
IPC through a simple JSON protocol. One example is the journal daemon, which creates
its own varlink server through systemd-journald -> server_init ->
server_open_varlink() -> varlink_server_listen_fd().

The execution path for this error comes from either:

(1)

process_connection() -> varlink_process() -> varlink_dispatch_reply() -> reply_callback()

and the reply_callback is a pointer to userdb_on_query_reply(), since this callback is set with varlink_bind_reply().

if (IN_SET(v->state, VARLINK_AWAITING_REPLY, VARLINK_AWAITING_REPLY_MORE)) {
    varlink_set_state(v, VARLINK_PROCESSING_REPLY);

    if (v->reply_callback)
        r = v->reply_callback(v, parameters, error, flags, v->userdata);
    /* ... */
}

OR

(2) from an error coming from:

varlink_dispatch_disconnect()
varlink_dispatch_method()
varlink_dispatch_reply()
varlink_dispatch_timeout()

all of them calling varlink_dispatch_local_error().

These errors come from varlink_process() main logic, processing the varlink protocol.

- A timeout in connection would trigger varlink_dispat...
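
The two paths above can be sketched in C as follows. This is an illustration under stated assumptions, not systemd's code: `ToyConn`, `toy_bind_reply`, `toy_dispatch_reply`, `toy_dispatch_local_error`, `ToyStats` and `toy_on_reply` are hypothetical names, and the error string passed in is arbitrary. The point is only that a single bound reply callback can be reached both from normal reply dispatch and from the local-error path, so the callback has to cope with being invoked from either.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature connection object; NOT systemd's Varlink struct. */
typedef struct ToyConn ToyConn;
typedef int (*toy_reply_t)(ToyConn *c, const char *error, void *userdata);

struct ToyConn {
    toy_reply_t reply_callback;
    void *userdata;
};

/* Register one callback that receives both replies and local errors. */
void toy_bind_reply(ToyConn *c, toy_reply_t cb, void *userdata) {
    c->reply_callback = cb;
    c->userdata = userdata;
}

/* Path (1): a normal reply arrived, so the error argument is NULL. */
int toy_dispatch_reply(ToyConn *c) {
    if (!c->reply_callback)
        return 0;
    return c->reply_callback(c, NULL, c->userdata);
}

/* Path (2): a locally generated error (for example a timeout or disconnect)
 * reaches the same callback, this time with an error name. */
int toy_dispatch_local_error(ToyConn *c, const char *error) {
    if (!c->reply_callback)
        return 0;
    return c->reply_callback(c, error, c->userdata);
}

/* Example callback: counts how often it ran and how many calls carried an error. */
typedef struct ToyStats {
    int calls;
    int errors;
} ToyStats;

int toy_on_reply(ToyConn *c, const char *error, void *userdata) {
    ToyStats *st = userdata;
    (void) c;
    st->calls++;
    if (error)
        st->errors++;
    return 0;
}
```

After binding `toy_on_reply` with a `ToyStats` as userdata, one call to `toy_dispatch_reply()` followed by one call to `toy_dispatch_local_error()` leaves `calls` at 2 and `errors` at 1. Any per-request cleanup inside such a callback therefore runs once per invocation, whichever path triggered it.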


Rafael David Tinoco (rafaeldtinoco) wrote :

With that said, if you can reproduce this easily, could you reproduce it with:

systemd.log_level=debug systemd.log_target=syslog systemd.dump_core=true

added to your kernel cmdline, and provide me with the related debug output?

Please also update your systemd to the latest version in Focal:

245.4-4ubuntu3.2

and provide me with the core dump from the user session abort() generated by:

"Assertion 'set_remove(iterator->links, link) == link' failed at src/shared/userdb.c:314, function userdb_on_query_reply(). Aborting.

Aborted (core dumped)"

That would help me identify the issue.

Changed in autofs (Ubuntu Focal):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in autofs (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in autofs (Ubuntu Focal):
status: New → Triaged
Changed in systemd (Ubuntu):
status: New → Triaged
Changed in systemd (Ubuntu Focal):
status: New → Triaged
Changed in autofs (Ubuntu Focal):
importance: Medium → Undecided
Changed in autofs (Ubuntu):
importance: Medium → Undecided
Rafael David Tinoco (rafaeldtinoco) wrote :

I'll continue to reply here as you continue to provide me feedback about this issue...

Thank you.

-rafaeldtinoco

Michael Andreev (michael-andreev) wrote :

Thank you very much for this investigation. I'll try to reproduce with these parameters and with the new version this week.

tags: added: fr-331
Michael Andreev (michael-andreev) wrote :

This error has gone away with the latest updates but, unfortunately, I still have the same transient issues with autofs. I see some other strange things in the logs and will continue investigating on my side. I think this ticket can be closed (I will open a new ticket if needed). Thank you very much.

Paride Legovini (paride) wrote :

Hello Michael, thanks for the followup and for filing the bug in the first place.

For the moment I'm changing the status of this report to Incomplete across the packages/series it targets. If this specific issue can't be reproduced anymore, please set the statuses to Invalid (I'd prefer that to Fix Released, as we didn't identify what actually fixed it), and go ahead and file a new bug for the remaining issues you're facing. On the other hand, if you still think this should be investigated, please comment back with your findings, change the bug status back to New, and we'll look at it again. Thanks!

Changed in autofs (Ubuntu):
status: Triaged → Incomplete
Changed in autofs (Ubuntu Focal):
status: Triaged → Incomplete
Changed in systemd (Ubuntu):
status: Triaged → Incomplete
Changed in systemd (Ubuntu Focal):
status: Triaged → Incomplete
Changed in autofs (Ubuntu):
status: Incomplete → Invalid
Changed in autofs (Ubuntu Focal):
status: Incomplete → Invalid
Changed in systemd (Ubuntu):
status: Incomplete → Invalid
Changed in systemd (Ubuntu Focal):
status: Incomplete → Invalid