Scheduled events (e.g. LDAP connection retries) are affected by clock adjustments

Bug #1641875 reported by Victor Tapia
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
sssd (Ubuntu)
Fix Released
Medium
Victor Tapia
Trusty
Fix Released
Medium
Victor Tapia
Xenial
Fix Released
Medium
Victor Tapia
Yakkety
Fix Released
Medium
Victor Tapia
Zesty
Fix Released
Medium
Victor Tapia

Bug Description

[Impact]

 * SSSD is affected by clock shifts because it's built on libtevent, that doesn't use monotonic clocks.
 * After an event has been scheduled, if the time drifts to the past SSSD won't recover and the event will have to wait the shifted time to be executed.

[Test Case]

 * Modify the /etc/hosts file to force a failed resolution and restart sssd.
 * Within the first ~1 minute retry window, change the date (date --set) to a previous time (e.g. 2h).
 * "Tail" the log file. The event will be executed at the original schedule, past those 2h.

[Regression Potential]

 * None

[Other Info]

 * Version 1.15 implements a watchdog that detects time shifts and resets itselft but doesn't reschedule the scheduled events.
 * There's a network monitor based on netlink that detects interface changes and restarts the providers, rescheduling the scheduled events.
 * Such restart can be triggered with SIGUSR2. Sending that signal after the watchdog restarts fixes this issue.
 * Upstream bug: https://fedorahosted.org/sssd/ticket/3285

[Original Description]

When SSSD fails to connect to a provider (LDAP, for instance) it creates a timed event with tevent_add_timer() in order to retry in ~1 min. Tevent relies on CLOCK_REALTIME, using absolute epoch time, so when the time changes (e.g. NTP sync) the scheduled event is affected.

Reproducer:

1. Modify the /etc/hosts file to force a failed resolution and restart sssd

(Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'ldap' in DNS
(Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [resolv_gethostbyname_done] (0x0040): querying hosts database failed [5]: Input/output error
(Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not contact DNS servers
...
(Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [check_online_callback] (0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline]

2. Within the ~1 minute window, change the date (date --set) to a previous time (2 hour in my example). Note: if /etc/resolv.conf or a network interface is modified, SSSD providers will restart, but the scheduled retry will be kept.

(Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [recreate_ares_channel] (0x0100): Initializing new c-ares channel
(Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [recreate_ares_channel] (0x0100): Destroying the old c-ares channel
(Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [set_server_common_status] (0x0100): Marking server 'ldap' as 'name not resolved'
(Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'ldap' as 'neutral'
(Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
...
(Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not contact DNS servers
(Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [set_server_common_status] (0x0100): Marking server 'ldap' as 'not working'
...
(Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [check_online_callback] (0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline]

3. Once it gets to the programmed date (2 hours later), it retries:

(Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [set_server_common_status] (0x0100): Marking server 'ldap' as 'name not resolved'
(Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'ldap' as 'neutral'
(Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
...
(Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not contact DNS servers
(Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [set_server_common_status] (0x0100): Marking server 'ldap' as 'not working'
...
(Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [check_online_callback] (0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline]

Victor Tapia (vtapia)
Changed in sssd (Ubuntu Trusty):
assignee: nobody → Victor Tapia (vtapia)
Changed in sssd (Ubuntu Xenial):
assignee: nobody → Victor Tapia (vtapia)
Changed in sssd (Ubuntu Yakkety):
assignee: nobody → Victor Tapia (vtapia)
Changed in sssd (Ubuntu Zesty):
assignee: nobody → Victor Tapia (vtapia)
Victor Tapia (vtapia)
description: updated
Victor Tapia (vtapia)
Changed in sssd (Ubuntu Trusty):
importance: Undecided → Medium
Changed in sssd (Ubuntu Yakkety):
importance: Undecided → Medium
Changed in sssd (Ubuntu Zesty):
importance: Undecided → Medium
Changed in sssd (Ubuntu Xenial):
importance: Undecided → Medium
tags: added: sts-sponsor sts-sru
Revision history for this message
Victor Tapia (vtapia) wrote :
Louis Bouchard (louis)
Changed in sssd (Ubuntu Zesty):
status: New → In Progress
Changed in sssd (Ubuntu Yakkety):
status: New → Triaged
Changed in sssd (Ubuntu Xenial):
status: New → Triaged
Changed in sssd (Ubuntu Trusty):
status: New → Triaged
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sssd - 1.15.0-3ubuntu3

---------------
sssd (1.15.0-3ubuntu3) zesty; urgency=medium

  * d/p/restart_providers_on_timeshift.patch: Use SIGUSR2 after watchdog
    detects time shift to execute pending scheduled tasks that could
    be stuck (LP: #1641875)

 -- Victor Tapia <email address hidden> Wed, 15 Feb 2017 17:05:05 +0100

Changed in sssd (Ubuntu Zesty):
status: In Progress → Fix Released
Revision history for this message
Victor Tapia (vtapia) wrote :
Revision history for this message
Victor Tapia (vtapia) wrote :
Revision history for this message
Louis Bouchard (louis) wrote :

Fixed debdiff used for sponsorship (fixed versionning to stay coherent with other releases)

tags: removed: sts-sponsor
Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Victor, or anyone else affected,

Accepted sssd into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/sssd/1.13.4-3ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in sssd (Ubuntu Yakkety):
status: Triaged → Fix Committed
tags: added: verification-needed
Changed in sssd (Ubuntu Xenial):
status: Triaged → Fix Committed
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Victor, or anyone else affected,

Accepted sssd into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/sssd/1.13.4-1ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in sssd (Ubuntu Trusty):
status: Triaged → Fix Committed
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Victor, or anyone else affected,

Accepted sssd into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/sssd/1.11.8-0ubuntu0.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
Victor Tapia (vtapia) wrote :
Revision history for this message
Victor Tapia (vtapia) wrote :

# VERIFICATION FOR X&Y

After running the following script (repro.sh), the log shows the LDAP resolution is rescheduled:

#!/bin/bash

change_time() {
        sudo systemctl stop systemd-timesyncd
        HOUR=$(date +%H)
        MIN=$(date +%M)
        NEW=$((10#$HOUR - 1))

        sudo date --set "${NEW}:${MIN}"
        echo "Setting time to ${NEW}:${MIN}"
}

sudo systemctl restart sssd
sleep 20
change_time

Excerpts from /var/log/sssd/sssd_openstacklocal.log:

(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [get_server_status] (0x1000): Status of server 'ldap' is 'not working'
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [get_server_status] (0x1000): Status of server 'ldap' is 'not working'
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0020): No available servers for service 'LDAP'
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [be_resolve_server_done] (0x1000): Server resolution failed: 5
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [be_mark_offline] (0x2000): Going offline!
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [be_mark_offline] (0x2000): Initialize check_if_online_ptask.
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [be_ptask_create] (0x0400): Periodic task [Check if online (periodic)] was created
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 63 seconds from now [1488462456]
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [be_run_offline_cb] (0x0080): Going offline. Running callbacks.
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [sdap_id_op_connect_done] (0x4000): notify offline to op #1
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [sdap_dom_enum_ex_connected] (0x0400): Backend is marked offline, retry later!
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [be_ptask_done] (0x0400): Task [enumeration]: finished successfully
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [be_ptask_schedule] (0x0400): Task [enumeration]: scheduling task 300 seconds from last execution time [1488462691]

(Thu Mar 2 12:46:00 2017) [sssd[be[openstacklocal]]] [watchdog_fd_read_handler] (0x0040): Time shift detected, restarting watchdog!
* Some initialization log entries
(Thu Mar 2 12:46:40 2017) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
(Thu Mar 2 12:46:40 2017) [sssd[be[openstacklocal]]] [get_server_status] (0x1000): Status of server 'ldap' is 'name not resolved'
(Thu Mar 2 12:46:40 2017) [sssd[be[openstacklocal]]] [get_port_status] (0x1000): Port status of port 389 for server 'ldap' is 'neutral'
(Thu Mar 2 12:46:40 2017) [sssd[be[openstacklocal]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds

tags: added: verification-done-xenial verification-done-yakkety
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Victor, or anyone else affected,

Accepted sssd into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/sssd/1.11.8-0ubuntu0.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
Victor Tapia (vtapia) wrote :

# VERIFICATION FOR T

Same script as for X&Y, works as expected:

(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [get_server_status] (0x1000): Status of server 'ldap' is 'not working'
(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [get_server_status] (0x1000): Status of server 'ldap' is 'not working'
(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0020): No available servers for service 'LDAP'
(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [be_resolve_server_done] (0x1000): Server resolution failed: 5
(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])
(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [be_mark_offline] (0x2000): Going offline!
(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [be_ptask_create] (0x0400): Periodic task [Check if online (periodic)] was created
(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 69 seconds from now [1488774754]
...
(Mon Mar 6 03:31:00 2017) [sssd[be[openstacklocal]]] [watchdog_fd_read_handler] (0x0040): Time shift detected, restarting watchdog!
...
(Mon Mar 6 03:31:41 2017) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
(Mon Mar 6 03:31:41 2017) [sssd[be[openstacklocal]]] [get_server_status] (0x1000): Status of server 'ldap' is 'name not resolved'
(Mon Mar 6 03:31:41 2017) [sssd[be[openstacklocal]]] [get_port_status] (0x1000): Port status of port 389 for server 'ldap' is 'neutral'
(Mon Mar 6 03:31:41 2017) [sssd[be[openstacklocal]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds
(Mon Mar 6 03:31:41 2017) [sssd[be[openstacklocal]]] [get_server_status] (0x1000): Status of server 'ldap' is 'name not resolved'
(Mon Mar 6 03:31:41 2017) [sssd[be[openstacklocal]]] [resolv_is_address] (0x4000): [ldap] does not look like an IP address

tags: added: verification-done
removed: verification-done-xenial verification-done-yakkety verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sssd - 1.13.4-3ubuntu0.1

---------------
sssd (1.13.4-3ubuntu0.1) yakkety; urgency=medium

  * d/p/restart_providers_on_timeshift.patch: Implement watchdog and
    use SIGUSR2 after watchdog detects time shift to execute pending
    scheduled tasks that could be stuck (LP: #1641875)

 -- Victor Tapia <email address hidden> Thu, 23 Feb 2017 11:20:43 +0100

Changed in sssd (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for sssd has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sssd - 1.13.4-1ubuntu1.2

---------------
sssd (1.13.4-1ubuntu1.2) xenial; urgency=medium

  * d/p/restart_providers_on_timeshift.patch: Implement watchdog and
    use SIGUSR2 after watchdog detects time shift to execute pending
    scheduled tasks that could be stuck (LP: #1641875)

 -- Victor Tapia <email address hidden> Thu, 23 Feb 2017 11:46:31 +0100

Changed in sssd (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sssd - 1.11.8-0ubuntu0.5

---------------
sssd (1.11.8-0ubuntu0.5) trusty; urgency=medium

  * d/p/BUILD-Fix-linking-with-librt.patch: Upstream fix for FTBS on
    ppc64el/arm64 after the implementation of timer functions in watchdog
    (LP: #1641875).

 -- Victor Tapia <email address hidden> Tue, 28 Feb 2017 10:20:19 +0100

Changed in sssd (Ubuntu Trusty):
status: Fix Committed → Fix Released
Louis Bouchard (louis)
tags: added: sts-sru-done
removed: sts-sru
Revision history for this message
Timo Aaltonen (tjaalton) wrote :

the patch has not been sent upstream yet?

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.