centos-8 multinode and undercloud jobs are hanging on the undercloud install

Bug #1865574 reported by Ronelle Landy
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Fix Released
Critical
Unassigned

Bug Description

The hang happens sporadically and is therefore hard to track down but it has been reproduced on multinode and undercloud jobs in both RDO Cloud (periodic) and upstream Zuul (check).

The tripleo undercloud install task fails with:

 2020-03-02 18:25:08.374990 | primary | TASK [tripleo_undercloud_install : undercloud install] *************************
2020-03-02 18:25:08.389970 | primary | Monday 02 March 2020 18:25:08 +0000 (0:00:00.086) 0:06:16.122 **********
2020-03-02 20:30:03.809836 | primary | fatal: [undercloud]: FAILED! => {
2020-03-02 20:30:03.809906 | primary | "changed": true
2020-03-02 20:30:03.809918 | primary | }
2020-03-02 20:30:03.809925 | primary |
2020-03-02 20:30:03.809933 | primary | MSG:
2020-03-02 20:30:03.809940 | primary |
2020-03-02 20:30:03.809947 | primary | async task did not complete within the requested time - 7200s

Example logs:

https://logserver.rdoproject.org/11/25411/9/check/periodic-tripleo-ci-centos-8-singlenode-featureset027-master/b1b44fb/job-output.txt

https://logserver.rdoproject.org/11/25411/9/check/periodic-tripleo-ci-centos-8-singlenode-featureset027-master/b1b44fb/logs/undercloud/home/zuul/undercloud_install.log.txt.gz

Note that it not always the same tasks that times out:
https://logserver.rdoproject.org/11/25411/9/check/periodic-tripleo-ci-centos-8-multinode-1ctlr-featureset010-master/2f4c09a/logs/undercloud/home/zuul/undercloud_install.log.txt.gz

Revision history for this message
Ronelle Landy (rlandy) wrote :
Changed in tripleo:
status: New → Triaged
milestone: none → ussuri-3
importance: Undecided → Critical
tags: added: ci
tags: added: promotion-blocker
Revision history for this message
Alex Schultz (alex-schultz) wrote :
Download full text (4.5 KiB)

I think there might be a few different issues at play. I've seen it fail at the start of the undercloud install where we exec the tripleo deploy command. I've also see the heat-config-notify fail as part of the later software deployments at the end of the undercloud install. I think these might be related to the a keystoneclient -> python3-keyring -> python3-secretstorage -> python3-jeepney issue. What I've seen from the traceback is that it's hanging when trying to communicate with the DBUS.

^CTraceback (most recent call last):
  File "/bin/openstack", line 6, in <module>
    from openstackclient.shell import main
  File "/usr/lib/python3.6/site-packages/openstackclient/shell.py", line 28, in <module>
    from openstackclient.common import clientmanager
  File "/usr/lib/python3.6/site-packages/openstackclient/common/clientmanager.py", line 184, in <module>
    'openstack.cli.base',
  File "/usr/lib/python3.6/site-packages/openstackclient/common/clientmanager.py", line 150, in get_plugin_modules
    __import__(ep.module_name)
  File "/usr/lib/python3.6/site-packages/openstackclient/identity/client.py", line 18, in <module>
    from keystoneclient.v2_0 import client as identity_client_v2
  File "/usr/lib/python3.6/site-packages/keystoneclient/v2_0/__init__.py", line 1, in <module>
    from keystoneclient.v2_0.client import Client # noqa
  File "/usr/lib/python3.6/site-packages/keystoneclient/v2_0/client.py", line 21, in <module>
    from keystoneclient import httpclient
  File "<frozen importlib._bootstrap>", line 1020, in _handle_fromlist
  File "/usr/lib/python3.6/site-packages/keystoneclient/__init__.py", line 72, in __getattr__
    return importlib.import_module('keystoneclient.%s' % name)
  File "/usr/lib64/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/lib/python3.6/site-packages/keystoneclient/httpclient.py", line 43, in <module>
    import keyring
  File "/usr/lib/python3.6/site-packages/keyring/__init__.py", line 1, in <module>
    from .core import (
  File "/usr/lib/python3.6/site-packages/keyring/core.py", line 192, in <module>
    init_backend()
  File "/usr/lib/python3.6/site-packages/keyring/core.py", line 96, in init_backend
    filter(limit, backend.get_all_keyring()),
  File "/usr/lib/python3.6/site-packages/keyring/util/__init__.py", line 22, in wrapper
    func.always_returns = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/keyring/backend.py", line 219, in get_all_keyring
    return list(rings)
  File "/usr/lib/python3.6/site-packages/keyring/util/__init__.py", line 33, in suppress_exceptions
    for callable in callables:
  File "/usr/lib/python3.6/site-packages/keyring/util/properties.py", line 26, in __get__
    return self.fget.__get__(None, owner)()
  File "/usr/lib/python3.6/site-packages/keyring/backend.py", line 68, in viable
    cls.priority
  File "/usr/lib/python3.6/site-packages/keyring/util/properties.py", line 26, in __get__
    return self.fget.__get__(None, owner)()
  File "/usr/lib/python3.6/site-packages/keyring/backends/SecretService.py", line 40, in priority
    bus = secretstorage.dbus_init()
...

Read more...

Revision history for this message
Alex Schultz (alex-schultz) wrote :
Download full text (4.8 KiB)

Here's a reproducer:

[centos@undercloud test]$ cat a.py
#!/usr/bin/env python3
print('a import keystoneclient')
import keystoneclient.v3 as ksclient
import os
import subprocess

def launch():
    print('a launch()')
    p = os.getcwd()
    cmd = ['sudo', '--preserve-env', 'python3', '{}/b.py'.format(p)]
    try:
        print('a subprocess')
        subprocess.check_call(cmd)
    except Exception as e:
        print('a exception')
        raise

if __name__ == '__main__':
    print('a main()')
    launch()
[centos@undercloud test]$ cat b.py
#!/usr/bin/env python3
print('b importing keystone client')
import keystoneclient.v3 as ksclient

def launch():
    print("b launch()")

if __name__ == '__main__':
    print('b main()')
    launch()

This can result in something like this:

[centos@undercloud test]$ ./a.py
a import keystoneclient
a main()
a launch()
a subprocess
b importing keystone client
^CTraceback (most recent call last):
  File "/home/centos/test/b.py", line 3, in <module>
    import keystoneclient.v3 as ksclient
  File "/usr/lib/python3.6/site-packages/keystoneclient/v3/__init__.py", line 2, in <module>
    from keystoneclient.v3.client import Client # noqa
  File "/usr/lib/python3.6/site-packages/keystoneclient/v3/client.py", line 23, in <module>
    from keystoneclient import httpclient
  File "<frozen importlib._bootstrap>", line 1020, in _handle_fromlist
  File "/usr/lib/python3.6/site-packages/keystoneclient/__init__.py", line 72, in __getattr__
    return importlib.import_module('keystoneclient.%s' % name)
  File "/usr/lib64/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/lib/python3.6/site-packages/keystoneclient/httpclient.py", line 43, in <module>
    import keyring
  File "/usr/lib/python3.6/site-packages/keyring/__init__.py", line 1, in <module>
    from .core import (
  File "/usr/lib/python3.6/site-packages/keyring/core.py", line 192, in <module>
    init_backend()
  File "/usr/lib/python3.6/site-packages/keyring/core.py", line 96, in init_backend
    filter(limit, backend.get_all_keyring()),
  File "/usr/lib/python3.6/site-packages/keyring/util/__init__.py", line 22, in wrapper
    func.always_returns = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/keyring/backend.py", line 219, in get_all_keyring
    return list(rings)
  File "/usr/lib/python3.6/site-packages/keyring/util/__init__.py", line 33, in suppress_exceptions
    for callable in callables:
  File "/usr/lib/python3.6/site-packages/keyring/util/properties.py", line 26, in __get__
    return self.fget.__get__(None, owner)()
  File "/usr/lib/python3.6/site-packages/keyring/backend.py", line 68, in viable
    cls.priority
  File "/usr/lib/python3.6/site-packages/keyring/util/properties.py", line 26, in __get__
    return self.fget.__get__(None, owner)()
  File "/usr/lib/python3.6/site-packages/keyring/backends/SecretService.py", line 40, in priority
    bus = secretstorage.dbus_init()
  File "/usr/lib/python3.6/site-packages/secretstorage/__init__.py", line 37, in dbus_init
    connection = connect_and_authenticate()
  File "/usr/lib/python3.6/site-packages...

Read more...

Revision history for this message
Alex Schultz (alex-schultz) wrote :

also substituting 'import keyring' for 'import keystoneclient' also triggers it.

Revision history for this message
Alex Schultz (alex-schultz) wrote :
Download full text (4.2 KiB)

[centos@undercloud test]$ cat a.py
#!/usr/bin/env python3
print('a import keyring')
import keyring
import os
import subprocess

def launch():
    print('a launch()')
    p = os.getcwd()
    cmd = ['sudo', '--preserve-env', 'python3', '{}/b.py'.format(p)]
    try:
        print('a subprocess')
        subprocess.check_call(cmd)
    except Exception as e:
        print('a exception')
        raise

if __name__ == '__main__':
    print('a main()')
    launch()
[centos@undercloud test]$ cat b.py
#!/usr/bin/env python3
print('b importing keyring')
import keyring

def launch():
    print("b launch()")

if __name__ == '__main__':
    print('b main()')
    launch()

[centos@undercloud test]$ ./a.py
a import keyring
a main()
a launch()
a subprocess
b importing keyring
b main()
b launch()
[centos@undercloud test]$ ./a.py
a import keyring
a main()
a launch()
a subprocess
b importing keyring
b main()
b launch()
[centos@undercloud test]$ ./a.py
a import keyring
a main()
a launch()
a subprocess
b importing keyring
b main()
b launch()
[centos@undercloud test]$ ./a.py
a import keyring
a main()
a launch()
a subprocess
b importing keyring
^CTraceback (most recent call last):
  File "/home/centos/test/b.py", line 3, in <module>
    import keyring
  File "/usr/lib/python3.6/site-packages/keyring/__init__.py", line 1, in <module>
    from .core import (
  File "/usr/lib/python3.6/site-packages/keyring/core.py", line 192, in <module>
    init_backend()
  File "/usr/lib/python3.6/site-packages/keyring/core.py", line 96, in init_backend
    filter(limit, backend.get_all_keyring()),
  File "/usr/lib/python3.6/site-packages/keyring/util/__init__.py", line 22, in wrapper
    func.always_returns = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/keyring/backend.py", line 219, in get_all_keyring
    return list(rings)
  File "/usr/lib/python3.6/site-packages/keyring/util/__init__.py", line 33, in suppress_exceptions
    for callable in callables:
  File "/usr/lib/python3.6/site-packages/keyring/util/properties.py", line 26, in __get__
    return self.fget.__get__(None, owner)()
  File "/usr/lib/python3.6/site-packages/keyring/backend.py", line 68, in viable
    cls.priority
  File "/usr/lib/python3.6/site-packages/keyring/util/properties.py", line 26, in __get__
    return self.fget.__get__(None, owner)()
  File "/usr/lib/python3.6/site-packages/keyring/backends/SecretService.py", line 40, in priority
    bus = secretstorage.dbus_init()
  File "/usr/lib/python3.6/site-packages/secretstorage/__init__.py", line 37, in dbus_init
    connection = connect_and_authenticate()
  File "/usr/lib/python3.6/site-packages/jeepney/integrate/blocking.py", line 102, in connect_and_authenticate
    conn = DBusConnection(sock)
  File "/usr/lib/python3.6/site-packages/jeepney/integrate/blocking.py", line 40, in __init__
    hello_reply = self.bus_proxy.Hello()
  File "/usr/lib/python3.6/site-packages/jeepney/integrate/blocking.py", line 84, in inner
    return self._connection.send_and_get_reply(msg)
  File "/usr/lib/python3.6/site-packages/jeepney/integrate/blocking.py", line 67, in send_and_get_reply
    self.recv_messages()
  File "/usr/lib/python3.6/site-packages/jeepn...

Read more...

Revision history for this message
Alex Schultz (alex-schultz) wrote :

It seems to be related to the sudo --preserve-env. Dropping that allows it to continue. Also running without sudo seems to fine as well. Opened https://github.com/mitya57/secretstorage/issues/22

Revision history for this message
Alex Schultz (alex-schultz) wrote :
Revision history for this message
wes hayutin (weshayutin) wrote :

<jpena> EmilienM, mwhahaha: so the idea would be to change https://github.com/redhat-openstack/rdoinfo/blob/master/buildsys-tags/cloud8-openstack-ussuri-testing.yml#L188-L190 to use version 2.3.1-7, right?
[11:31:43] <mwhahaha> yea
[11:33:46] <jpena> that's ok for me. I can propose the change, although it will be effective tomorrow (need to pass CI, merge, and then tomorrow morning we'll be updating the repos to include it)
[11:38:52] <rdogerrit> Javier Peña created rdoinfo master: Downgrade python-SecretStorage to 2.3.1-7 https://review.rdoproject.org/r/25647

Revision history for this message
Alan Pevec (apevec) wrote :

python-SecretStorage was downgraded to 2.3.1 in centos8 deps repo, can you confirm and close this issues?

Revision history for this message
Arx Cruz (arxcruz) wrote :
Revision history for this message
Arx Cruz (arxcruz) wrote :

This is the result from elastic-recheck query:

total hits: 7
build_branch
  100% master
build_change
  57% 710179
  14% 710856
  14% 710876
  14% 710961
build_hostids
  14% 3cc53f22ef02ed70c1aac36a118dd7de4761e33f9cbc4c43ca939e06 3784143789c2135c90ac691ca5c91af63e12b8446ede82b95c1a6720
  14% 4e2637211431bc60d4bd5dd39f3b9f2c8dc5e0ceb76440a99339ebc7 1d1ee31b37d05d80ba6e4ac21167cb12908d069825973735268f7d7e
  14% 6501a2934d131c272b705e9ff4ca639f296813cdb277a4e39c01cdae 1a53be2b24e865c47fda3bec00e16f56c25b531dbe4694a4228b05f5
  14% 91c5ef05df17bc837d1a41b804c90487c2cb99686e3829f6fd378ba3
  14% ac33562fdf27151d63b0c44782a112b6fe849380d9d2b504fa966187 ac33562fdf27151d63b0c44782a112b6fe849380d9d2b504fa966187
build_name
  57% tripleo-ci-centos-8-containers-multinode
  14% tripleo-ci-centos-8-containers-undercloud-minion
  14% tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates
  14% tripleo-ci-centos-8-undercloud-containers
build_node
  100% centos-8
build_queue
  100% check
build_status
  100% FAILURE
build_zuul_url
  100% N/A
filename
  100% job-output.txt
log_url
  14% https://1faa1506db9db98d8787-99d0a6bba3a1a8c92ad3774145ae0bff.ssl.cf5.rackcdn.com/710179/6/check/tripleo-ci-centos-8-containers-multinode/a337701/job-output.txt
  14% https://2e3da89ec438fbfff201-485520e0725c5c3e0cb7e09d4d5f1a24.ssl.cf1.rackcdn.com/710179/7/check/tripleo-ci-centos-8-containers-multinode/082dad3/job-output.txt
  14% https://6adeb6214fae12df9c2b-c6bea2bbab8521883465a10c6ac17cf9.ssl.cf2.rackcdn.com/710856/4/check/tripleo-ci-centos-8-undercloud-containers/56e1b45/job-output.txt
  14% https://8c4653dff114d5897250-284808df0480792a2bc66ed6147fab6e.ssl.cf2.rackcdn.com/710179/3/check/tripleo-ci-centos-8-containers-multinode/c67b2bc/job-output.txt
  14% https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_3d7/710961/1/check/tripleo-ci-centos-8-containers-undercloud-minion/3d7f90f/job-output.txt
node_provider
  14% inap-mtl01
  14% limestone-regionone
  14% ovh-bhs1
  14% rax-dfw
  14% rax-iad
port
  14% 36166
  14% 40010
  14% 47078
  14% 50746
  14% 50764
project
  100% openstack/tripleo-ci
tags
  100% job-output.txt console console.html
voting
  71% 1
  28% 0
zuul_attempts
  100% 1
zuul_executor
  28% ze10.openstack.org
  14% ze01.openstack.org
  14% ze03.openstack.org
  14% ze04.openstack.org
  14% ze09.openstack.org

Revision history for this message
wes hayutin (weshayutin) wrote :
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.