get_data in DataSourceOpenStack.py can time out if metadata service is slow

Bug #1657130 reported by Lars Kellogg-Stedman
This bug affects 1 person
Affects                Status         Importance   Assigned to   Milestone
cloud-init             Fix Released   Medium       Unassigned
cloud-init (Ubuntu)    Fix Released   Medium       Unassigned
  Xenial               Fix Released   Medium       Unassigned
  Yakkety              Fix Released   Medium       Unassigned

Bug Description

=== Begin SRU Template ===
[Impact]
On heavily loaded OpenStack metadata services, cloud-init may hit a timeout
and not properly retry when waiting longer or retrying would allow it to
succeed.

cloud-init contained a setting to configure this, but it was not used in all
cases. The change here enables use of the configured timeout and retries in
the OpenStack datasource's get_data.

[Test Case]
1. Launch an instance on openstack.
2. Verify inconsistent use of 'timeout' in /var/log/cloud-init.log
  $ grep http://169.254.169.254/openstack /var/log/cloud-init.log | grep 0/ | head -n 2
  2017-03-03 16:51:23,824 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'url': 'http://169.254.169.254/openstack', 'allow_redirects': True, 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'method': 'GET', 'timeout': 10.0} configuration
  2017-03-03 16:51:24,384 - url_helper.py[DEBUG]: [0/6] open 'http://169.254.169.254/openstack' with {'url': 'http://169.254.169.254/openstack', 'allow_redirects': True, 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'method': 'GET', 'timeout': 5.0} configuration

3. enable proposed, update, upgrade
4. clean
   rm -Rf /var/lib/cloud /var/log/cloud-init*
5. reboot
6. re-check step 2; expect the 'timeout' value to be consistent.

[Regression Potential]
Low chance for regression. Boot times may be slower, but more reliable, on a
non-performant metadata service.

=== End SRU Template ===
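
The check in step 2 of the test case can also be scripted. The following is a minimal sketch, assuming the log lines have the url_helper.py format shown in the test case; the function names and the regular expression are illustrative and not part of cloud-init.

```python
import ast
import re

# Matches the url_helper debug lines shown in step 2 and captures the attempt
# counter (for example "0/6"), the URL, and the request configuration dict.
LINE_RE = re.compile(
    r"url_helper\.py\[DEBUG\]: \[(\d+)/(\d+)\] open '([^']+)' "
    r"with (\{.*\}) configuration"
)


def first_attempt_timeouts(log_lines,
                           url_prefix="http://169.254.169.254/openstack"):
    """Return the 'timeout' of each first attempt ([0/N]) against url_prefix."""
    timeouts = []
    for line in log_lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        attempt, _total, url, config = match.groups()
        if attempt != "0" or not url.startswith(url_prefix):
            continue
        # The logged configuration is a Python dict literal, so literal_eval
        # can parse it without executing anything.
        timeouts.append(ast.literal_eval(config).get("timeout"))
    return timeouts


def timeouts_consistent(log_lines):
    """True when every matching request used the same timeout value."""
    return len(set(first_attempt_timeouts(log_lines))) <= 1
```

To use it, pass the lines of /var/log/cloud-init.log (for example, an open file object) to timeouts_consistent; before the fix it is expected to return False on an affected instance.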

cloud-init sometimes times out and fails to fetch metadata in the OpenStack environment when the Controller node is under high workload.

The default timeout value is 5 seconds and it may be too small in some cases where the Controller node is too busy to respond to the metadata request from the instance in time.

There is a 'timeout' configuration setting, as in...

  datasource:
    OpenStack:
      timeout: 30

...but this value is not used by the get_data method in cloudinit/sources/DataSourceOpenStack.py, because get_data is called from cloudinit/sources/__init__.py with no keyword arguments:

                LOG.debug("Seeing if we can get any data from %s", cls)
                s = cls(sys_cfg, distro, paths)
                if s.get_data():
                    myrep.message = "found %s data from %s" % (mode, name)
                    return (s, type_utils.obj_name(cls))
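
One way to address this, sketched below under stated assumptions, is for get_data to read the timeout and retry settings from the datasource's own configuration rather than relying on keyword arguments from the caller. This is not the actual cloud-init patch: resolve_url_settings, SketchOpenStackSource, the fetch callable, and the default retry count are all hypothetical names and values chosen for illustration.

```python
# Defaults are assumptions for this sketch; the report states only that the
# default timeout is 5 seconds.
DEFAULT_TIMEOUT = 5.0
DEFAULT_RETRIES = 5


def resolve_url_settings(ds_cfg):
    """Return timeout/retries from the datasource config, else defaults."""
    def _cast(key, default, cast):
        try:
            return cast(ds_cfg.get(key, default))
        except (TypeError, ValueError):
            # None or unparsable values fall back to the default.
            return default

    return {
        "timeout": _cast("timeout", DEFAULT_TIMEOUT, float),
        "retries": _cast("retries", DEFAULT_RETRIES, int),
    }


class SketchOpenStackSource:
    """Hypothetical stand-in for the datasource class; not cloud-init code."""

    metadata_address = "http://169.254.169.254"

    def __init__(self, sys_cfg):
        # Mirrors the 'datasource: OpenStack:' layout of the config above.
        self.ds_cfg = sys_cfg.get("datasource", {}).get("OpenStack", {})

    def get_data(self, fetch):
        # No timeout or retry keyword arguments are needed from the caller;
        # the settings come from the instance's own configuration.
        settings = resolve_url_settings(self.ds_cfg)
        return fetch(self.metadata_address, **settings)
```

With a system config of {"datasource": {"OpenStack": {"timeout": 30}}}, the fetch callable in this sketch receives timeout=30.0 and the default retries.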

Lars Kellogg-Stedman (larsks) wrote :
Scott Moser (smoser)
Changed in cloud-init:
importance: Undecided → Medium
status: New → Fix Released
Changed in cloud-init (Ubuntu):
status: New → Confirmed
status: Confirmed → Fix Released
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Xenial):
status: New → Confirmed
Changed in cloud-init (Ubuntu Yakkety):
status: New → Confirmed
Changed in cloud-init (Ubuntu Xenial):
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Yakkety):
importance: Undecided → Medium
Changed in cloud-init:
status: Fix Released → Fix Committed
Scott Moser (smoser)
description: updated
Chris Halse Rogers (raof) wrote : Please test proposed package

Hello Lars, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-48-g1c795b9-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Xenial):
status: Confirmed → Fix Committed
tags: added: verification-needed
Chris Halse Rogers (raof) wrote :

Hello Lars, or anyone else affected,

Accepted cloud-init into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-48-g1c795b9-0ubuntu1~16.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Yakkety):
status: Confirmed → Fix Committed
Scott Moser (smoser) wrote :

$ dpkg-query --show cloud-init
cloud-init 0.7.9-0ubuntu1~16.04.2
$ lsb_release -sc
xenial
$ grep http://169.254.169.254/openstack /var/log/cloud-init.log | grep 0/ | head -n 2
2017-03-08 19:49:21,111 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'timeout': 10.0, 'url': 'http://169.254.169.254/openstack', 'method': 'GET', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'allow_redirects': True} configuration
2017-03-08 19:49:21,580 - url_helper.py[DEBUG]: [0/6] open 'http://169.254.169.254/openstack' with {'timeout': 5.0, 'url': 'http://169.254.169.254/openstack', 'method': 'GET', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'allow_redirects': True} configuration
$ rel=$(lsb_release -sc)
$ line=$(awk '$1 == "deb" && $2 ~ /ubuntu.com/ {
> printf("%s %s %s-proposed main universe\n", $1, $2, rel); exit(0) };
> ' "rel=$rel" /etc/apt/sources.list)
$ echo "$line" | sudo tee /etc/apt/sources.list.d/proposed.list
sudo: unable to resolve host xenial-20170308-194839
deb http://nova.clouds.archive.ubuntu.com/ubuntu/ xenial-proposed main universe
$ sudo apt-get update -q && sudo apt-get install cloud-init -q
...
Setting up cloud-init (0.7.9-48-g1c795b9-0ubuntu1~16.04.1) ...

$ dpkg-query --show cloud-init
cloud-init 0.7.9-48-g1c795b9-0ubuntu1~16.04.1

$ sudo rm -Rf /var/log/cloud-init* /var/lib/cloud && sudo reboot
$ sudo reboot

## Go back in. Notice the difference: both requests now have a 'timeout'
## of 10.0.
$ grep http://169.254.169.254/openstack /var/log/cloud-init.log | grep 0/ | head -n 2
2017-03-08 19:54:55,726 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'timeout': 10.0, 'method': 'GET', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'url': 'http://169.254.169.254/openstack', 'allow_redirects': True} configuration
2017-03-08 19:54:56,243 - url_helper.py[DEBUG]: [0/6] open 'http://169.254.169.254/openstack' with {'timeout': 10.0, 'method': 'GET', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'url': 'http://169.254.169.254/openstack', 'allow_redirects': True} configuration
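
For readers less familiar with awk, the step above that builds the -proposed apt line can be sketched in Python as follows. The function name is hypothetical, and the substring test is a slightly stricter stand-in for awk's /ubuntu.com/ pattern, in which the dot matches any character.

```python
def proposed_line(sources_list_text, release):
    """Build a '<release>-proposed' apt line from the first ubuntu.com entry."""
    for raw in sources_list_text.splitlines():
        fields = raw.split()
        # Same test as the awk program: first field is "deb" and the
        # second field (the archive URL) mentions ubuntu.com.
        if len(fields) >= 2 and fields[0] == "deb" and "ubuntu.com" in fields[1]:
            return "%s %s %s-proposed main universe" % (
                fields[0], fields[1], release)
    # No matching entry was found.
    return None
```

The result would then be written to /etc/apt/sources.list.d/proposed.list, as the transcript does with tee.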

Scott Moser (smoser) wrote :

## Show the original system and the issue.
## The second request below shows a 'timeout' of 5.0;
## both of the requests listed should have used 10.0.

$ dpkg-query --show cloud-init
cloud-init 0.7.9-0ubuntu1~16.10.1
$ lsb_release -sc
yakkety
$ grep http://169.254.169.254/openstack /var/log/cloud-init.log | grep 0/ | head -n 2
2017-03-08 19:49:18,879 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'allow_redirects': True, 'url': 'http://169.254.169.254/openstack', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'method': 'GET', 'timeout': 10.0} configuration
2017-03-08 19:49:19,350 - url_helper.py[DEBUG]: [0/6] open 'http://169.254.169.254/openstack' with {'allow_redirects': True, 'url': 'http://169.254.169.254/openstack', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'method': 'GET', 'timeout': 5.0} configuration
$ rel=$(lsb_release -sc)
$ line=$(awk '$1 == "deb" && $2 ~ /ubuntu.com/ { printf("%s %s %s-proposed main universe\n", $1, $2, rel); exit(0) };' rel=$rel /etc/apt/sources.list)
$ echo "$line" | sudo tee /etc/apt/sources.list.d/proposed.list
deb http://nova.clouds.archive.ubuntu.com/ubuntu/ yakkety-proposed main universe
$ sudo apt-get update -q && sudo apt-get install cloud-init -q
...
Setting up cloud-init (0.7.9-48-g1c795b9-0ubuntu1~16.10.1) ...

$ sudo rm -Rf /var/lib/cloud/ /var/log/cloud-init* && sudo reboot

### Go back in and see that both requests had a timeout of 10.0.
$ grep http://169.254.169.254/openstack /var/log/cloud-init.log | grep 0/ | head -n 2
2017-03-08 19:59:43,702 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'method': 'GET', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'url': 'http://169.254.169.254/openstack', 'timeout': 10.0, 'allow_redirects': True} configuration
2017-03-08 19:59:44,160 - url_helper.py[DEBUG]: [0/6] open 'http://169.254.169.254/openstack' with {'method': 'GET', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'url': 'http://169.254.169.254/openstack', 'timeout': 10.0, 'allow_redirects': True} configuration

tags: added: verification-done-xenial verification-done-yakkety
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-48-g1c795b9-0ubuntu1~16.04.1

---------------
cloud-init (0.7.9-48-g1c795b9-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * debian/rules: install Z99-cloudinit-warnings.sh to /etc/profile.d
  * debian/patches/ds-identify-behavior-xenial.patch: adjust default
    behavior of ds-identify for SRU (LP: #1669675, #1660385).
  * New upstream snapshot.
    - Support warning if the used datasource is not in ds-identify's list
      (LP: #1669675).
    - DatasourceEc2: add warning message when not on AWS. (LP: #1660385)
    - Z99-cloudinit-warnings: Add profile.d script for showing warnings on
    - Z99-cloud-locale-test.sh: convert tabs to spaces, remove unneccesary
      execute bit in permissions.
    - (RedHat) net: correct errors in cloudinit/net/sysconfig.py
      [Lars Kellogg-Stedman]
    - ec2_utils: fix MetadataLeafDecoder that returned bytes on empty
    - Fix eni rendering of multiple IPs per interface [Ryan Harper]
      (LP: #1657940)
    - Add 3 ecdsa-sha2-nistp* ssh key types now that they are standardized
      [Lars Kellogg-Stedman]
    - EC2: Do not cache security credentials on disk [Andrew Jorgensen]
      (LP: #1638312)
    - OpenStack: Use timeout and retries from config in get_data.
      [Lars Kellogg-Stedman] (LP: #1657130)
    - Fixed Misc issues related to VMware customization. [Sankar Tanguturi]
    - (RedHat) Use dnf instead of yum when available [Lars Kellogg-Stedman]
    - Get early logging logged, including failures of cmdline url.
    - test / doc / build environment changes
      - Remove style checking during build and add latest style checks to
        tox [Joshua Powers]
      - code-style: make master pass pycodestyle (2.3.1) cleanly, currently
        [Joshua Powers]
      - Fix small typo and change iso-filename for consistency
      - tools/mock-meta: support python2 or python3 and ipv6 in both.
      - tests: remove executable bit on test_net, so it runs, and fix it.
      - tests: No longer monkey patch httpretty for python 3.4.2
      - reset httppretty for each test [Lars Kellogg-Stedman]
      - build: fix running Make on a branch with tags other than master
      - doc: Fix typos and clarify some aspects of the part-handler
        [Erik M. Bray]
      - doc: add some documentation on OpenStack datasource.
      - Fix minor docs typo: perserve > preserve [Jeremy Bicha]
      - validate-yaml: use python rather than explicitly python3

 -- Scott Moser <email address hidden> Mon, 06 Mar 2017 16:34:10 -0500

Changed in cloud-init (Ubuntu Xenial):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-48-g1c795b9-0ubuntu1~16.10.1

---------------
cloud-init (0.7.9-48-g1c795b9-0ubuntu1~16.10.1) yakkety; urgency=medium

  * debian/rules: install Z99-cloudinit-warnings.sh to /etc/profile.d
  * debian/patches/ds-identify-behavior-yakkety.patch: adjust default
    behavior of ds-identify for SRU (LP: #1669675, #1660385).
  * New upstream snapshot.
    - Support warning if the used datasource is not in ds-identify's list
      (LP: #1669675).
    - DatasourceEc2: add warning message when not on AWS. (LP: #1660385)
    - Z99-cloudinit-warnings: Add profile.d script for showing warnings on
    - Z99-cloud-locale-test.sh: convert tabs to spaces, remove unneccesary
      execute bit in permissions.
    - (RedHat) net: correct errors in cloudinit/net/sysconfig.py
      [Lars Kellogg-Stedman]
    - ec2_utils: fix MetadataLeafDecoder that returned bytes on empty
    - Fix eni rendering of multiple IPs per interface [Ryan Harper]
      (LP: #1657940)
    - Add 3 ecdsa-sha2-nistp* ssh key types now that they are standardized
      [Lars Kellogg-Stedman]
    - EC2: Do not cache security credentials on disk [Andrew Jorgensen]
      (LP: #1638312)
    - OpenStack: Use timeout and retries from config in get_data.
      [Lars Kellogg-Stedman] (LP: #1657130)
    - Fixed Misc issues related to VMware customization. [Sankar Tanguturi]
    - (RedHat) Use dnf instead of yum when available [Lars Kellogg-Stedman]
    - Get early logging logged, including failures of cmdline url.
    - test / doc / build environment changes
      - Remove style checking during build and add latest style checks to
        tox [Joshua Powers]
      - code-style: make master pass pycodestyle (2.3.1) cleanly, currently
        [Joshua Powers]
      - Fix small typo and change iso-filename for consistency
      - tools/mock-meta: support python2 or python3 and ipv6 in both.
      - tests: remove executable bit on test_net, so it runs, and fix it.
      - tests: No longer monkey patch httpretty for python 3.4.2
      - reset httppretty for each test [Lars Kellogg-Stedman]
      - build: fix running Make on a branch with tags other than master
      - doc: Fix typos and clarify some aspects of the part-handler
        [Erik M. Bray]
      - doc: add some documentation on OpenStack datasource.
      - Fix minor docs typo: perserve > preserve [Jeremy Bicha]
      - validate-yaml: use python rather than explicitly python3

 -- Scott Moser <email address hidden> Mon, 06 Mar 2017 16:37:28 -0500

Changed in cloud-init (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Scott Moser (smoser) wrote : Fixed in Cloud-init 17.1

This bug is believed to be fixed in cloud-init 17.1. If this is still a problem for you, please add a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released
James Falcon (falcojr) wrote :