'timed out' to console message and no instance data found in start

Bug #869492 reported by Scott Moser
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
cloud-init (Ubuntu)
Fix Released
High
Scott Moser

Bug Description

I launched an instance on a nova cloud, and it was unreachable.
Upon investigation of the console log, I found:
| cloud-init start-local running: Thu, 06 Oct 2011 19:57:29 +0000. up 4.02 seconds
| no instance data found in start-local
| ci-info: lo : 1 127.0.0.1 255.0.0.0
| ci-info: eth0 : 1 10.55.60.69 255.255.255.0 02:16:3e:5b:c5:b8
| ci-info: route-0: 0.0.0.0 10.55.60.1 0.0.0.0 eth0 UG
| ci-info: route-1: 10.55.60.0 0.0.0.0 255.255.255.0 eth0 U
| cloud-init start running: Thu, 06 Oct 2011 19:57:29 +0000. up 4.24 seconds
| timed out
| no instance data found in start
| mountall: Event failed
| * Stopping Handle applying cloud-config^[[74G[ OK ]

After a reboot, investigation of cloud-init log showed:
 | 2011-10-06 19:57:29,793 - DataSourceEc2.py[DEBUG]: Searching the following metadata urls: ['http://169.254.169.254']
 | 2011-10-06 19:57:31,797 - __init__.py[DEBUG]: Did not find data source. searched classes: ['DataSourceNoCloudNet', 'DataSourceOVFNet', 'DataSourceEc2']

The 'Did not find' came from cloudinit/__init__.py, meaning DataSourceEc2.py returned.

The strange thing is that it seems DataSourceEc2 did not retry at all. I'm not sure why it would not. I believe that we're simply hitting a timeout on the Metadata service on the attempt to get instance-id. Previously (as seen in bug 745930) this would be handled and retried. The intent was that we would see a long series of retries. We've since added a 'timeout=timeout' where the value of timeout defaults to '2' to the urlopen call. It seems that for whatever reason, we're not catching that timeout and trying again.

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: cloud-init 0.6.1-0ubuntu20
ProcVersionSignature: User Name 3.0.0-12.19-virtual 3.0.4
Uname: Linux 3.0.0-12-virtual x86_64
ApportVersion: 1.23-0ubuntu1
Architecture: amd64
Date: Thu Oct 6 20:18:55 2011
Ec2AMI: ami-000000ac
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: nova
Ec2InstanceType: m1.large
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
PackageArchitecture: all
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: cloud-init
UpgradeStatus: No upgrade log present (probably fresh install)

related bugs:
  * bug 851159: ec2 metadata service is very slow
  * bug 745930: timeout on metadata service in natty after retry

Related branches

Revision history for this message
Scott Moser (smoser) wrote :
Revision history for this message
Scott Moser (smoser) wrote :

I can reproduce this from inside this instance with the following program:
#!/usr/bin/python
import urllib2, sys, socket
def doget(iurl, timeout):
    reason = None
    try:
        req = urllib2.Request(iurl)
        resp = urllib2.urlopen(req, timeout=timeout)
    except urllib2.HTTPError as e:
        reason = "http error [%s]" % e.code
    except urllib2.URLError as e:
        reason = "url error [%s]" % e.reason
    except socket.timeout as e:
        reason = "socket timeout [%s]" % e
    if reason:
        print "Fail: %s" % reason

doget(sys.argv[1],float(sys.argv[2]))

I run it and see the following output:
$ ./test-timeout.py http://169.254.169.254/latest/meta-data/instance-id 1
Fail: socket timeout [timed out]

It seams that urllib2 does not catch socket timeouts, so we'll have to add that catch, and I think we should probably up the default timeout.

description: updated
Dave Walker (davewalker)
Changed in cloud-init (Ubuntu):
milestone: none → ubuntu-11.10
tags: added: server-o-rs
Changed in cloud-init (Ubuntu):
status: New → Fix Committed
importance: Undecided → High
assignee: nobody → Scott Moser (smoser)
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.6.1-0ubuntu22

---------------
cloud-init (0.6.1-0ubuntu22) oneiric; urgency=low

  * DataSourceEc2: catch a socket timeout when with a slow metadata
    service (LP: #869492).
 -- Scott Moser <email address hidden> Thu, 06 Oct 2011 17:11:29 -0400

Changed in cloud-init (Ubuntu):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.