hostname not resolvable on azure

Bug #1202758 reported by Scott Moser
This bug affects 1 person
Affects              Status        Importance  Assigned to  Milestone
cloud-init           Fix Released  Medium      Scott Moser
cloud-init (Ubuntu)  Fix Released  Medium      Scott Moser

Bug Description

$ sudo hostname
sudo: unable to resolve host smfoo3
smfoo3

I think that this is because we've disabled MonitorHostName in walinuxagent.

That code would actually:
 * get the hostname of the system
 * monitor every X seconds for changes in the output of 'hostname'
 * if a change was found, bounce the eth0 interface (ifdown eth0; ifup eth0) in order to have dhclient convey the hostname to the server.
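
The behaviour in the list above can be sketched roughly as follows. This is only an illustration of the described loop, not walinuxagent's actual code: `monitor_hostname`, `bounce_interface`, and their parameters are hypothetical names chosen for this example.

```python
import socket
import subprocess
import time
from typing import Callable, Optional


def bounce_interface(interface: str = "eth0") -> None:
    """Take the interface down and up again so dhclient re-sends the hostname."""
    subprocess.run(["ifdown", interface], check=True)
    subprocess.run(["ifup", interface], check=True)


def monitor_hostname(
    get_hostname: Callable[[], str] = socket.gethostname,
    bounce: Callable[[], None] = bounce_interface,
    interval: float = 5.0,
    max_checks: Optional[int] = None,
) -> int:
    """Poll the hostname and bounce the interface whenever it changes.

    Returns the number of bounces performed. With max_checks=None the loop
    runs forever, which is what a monitoring daemon would do.
    """
    last = get_hostname()
    bounces = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        time.sleep(interval)
        current = get_hostname()
        if current != last:
            bounce()
            bounces += 1
            last = current
        checks += 1
    return bounces
```

The hostname getter and the bounce action are passed in as callables so the loop can be exercised without touching a real network interface.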

The odd part of that is that the platform is what *gave* us the hostname (inside the ovf-env.xml).

It would seem like cloud-init should do something like that also.

ProblemType: Bug
DistroRelease: Ubuntu 13.10
Package: cloud-init 0.7.3~bzr829-0ubuntu1
ProcVersionSignature: User Name 3.10.0-3.12-generic 3.10.1
Uname: Linux 3.10.0-3-generic x86_64
ApportVersion: 2.11-0ubuntu1
Architecture: amd64
Date: Thu Jul 18 17:13:29 2013
MarkForUpload: True
PackageArchitecture: all
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: cloud-init
UpgradeStatus: No upgrade log present (probably fresh install)

Scott Moser (smoser)
Changed in cloud-init (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Scott Moser (smoser) wrote :

I really have no good ideas on how to do this.
The simplest solution I think would be to have a tool like:
 dhcp-advertize-hostname --interface=eth0 --hostname=foo

and just invoke it. That tool would just do a DHCP request with the hostname set. I'm just not familiar enough with how DHCP works to say whether we can actually do that. At the very least, it seems we'd risk invalidating the lease.
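
For context, DHCP carries the client's hostname as option 12 (Host Name, RFC 2132), encoded as a code byte, a length byte, and then the name. The sketch below shows only that encoding. It sends no packets and says nothing about whether a standalone request would disturb the lease. `encode_hostname_option` is a hypothetical helper, not part of cloud-init or any DHCP client.

```python
DHCP_OPTION_HOST_NAME = 12  # Host Name option code from RFC 2132


def encode_hostname_option(hostname: str) -> bytes:
    """Encode a hostname as a DHCP option: code byte, length byte, name bytes."""
    name = hostname.encode("ascii")
    if not 1 <= len(name) <= 255:
        raise ValueError("hostname must be between 1 and 255 bytes long")
    return bytes([DHCP_OPTION_HOST_NAME, len(name)]) + name


if __name__ == "__main__":
    # Hostname taken from the example command line above.
    print(encode_hostname_option("foo").hex())  # 0c03666f6f
```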

This code would only be run on azure, so that would minimize the issue.

bouncing the interface like walinuxagent does is really less than ideal.

Scott Moser (smoser) wrote :

fixed in revno 847.

Changed in cloud-init:
status: New → Fix Committed
importance: Undecided → Medium
assignee: nobody → Scott Moser (smoser)
Changed in cloud-init (Ubuntu):
assignee: nobody → Scott Moser (smoser)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.3~bzr849-0ubuntu2

---------------
cloud-init (0.7.3~bzr849-0ubuntu2) saucy; urgency=low

  * debian/control: fix bad dependency on python-jsonpatch
    by build-depending on python-json-patch, so dh_python2
    can find the right package (LP: #1205358).
 -- Scott Moser <email address hidden> Fri, 26 Jul 2013 10:47:59 -0400

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Released
Scott Moser (smoser) wrote :

ok. just verified as fixed inside saucy daily image on azure.

I'm not sure if this is an outlier or not, but the cloud-init log shows that bouncing the interface took 22 seconds.

| 2013-07-28 23:13:45,164 - DataSourceAzure.py[DEBUG]: pubhname: publishing hostname [phostname=ubuntu hostname=smoser0728s policy=True interface=eth0]
| 2013-07-28 23:13:45,164 - util.py[DEBUG]: Running command ['sh', '-xc', 'i=$interface; x=0; ifdown $i || x=$?; ifup $i || x=$?; exit $x'] with allowed return codes [0] (shell=False, capture=True)
| 2013-07-28 23:14:07,412 - DataSourceAzure.py[DEBUG]: output: . err: + i=eth0
| + x=0
| + ifdown eth0
| Internet Systems Consortium DHCP Client 4.2.4
| Copyright 2004-2012 Internet Systems Consortium.
| All rights reserved.
| For info, please visit https://www.isc.org/software/dhcp/
|
| Listening on LPF/eth0/00:15:5d:47:7a:14
| Sending on LPF/eth0/00:15:5d:47:7a:14
| Sending on Socket/fallback
| DHCPRELEASE on eth0 to 10.74.0.146 port 67 (xid=0x1646c8d7)
| + ifup eth0
| Cannot change scatter-gather
| Internet Systems Consortium DHCP Client 4.2.4
| Copyright 2004-2012 Internet Systems Consortium.
| All rights reserved.
| For info, please visit https://www.isc.org/software/dhcp/
|
| Listening on LPF/eth0/00:15:5d:47:7a:14
| Sending on LPF/eth0/00:15:5d:47:7a:14
| Sending on Socket/fallback
| DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3 (xid=0x1cbb3cc0)
| DHCPREQUEST of 10.74.52.4 on eth0 to 255.255.255.255 port 67 (xid=0x1cbb3cc0)
| DHCPOFFER of 10.74.52.4 from 10.74.52.1
| DHCPACK of 10.74.52.4 from 10.74.52.1
| bound to 10.74.52.4 -- renewal in 4294967295 seconds.
| + exit 0
|
| 2013-07-28 23:14:07,413 - DataSourceAzure.py[DEBUG]: invoking agent: ['service', 'walinuxagent', 'start']

I'm not really sure what to think about that. We basically added 22 seconds to boot in order to publish the hostname :-(.
I wonder if there is any other way to do this.
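
The duration of the bounce can be checked from the two DataSourceAzure.py timestamps that bracket the command in the log above. This is a small sketch; `elapsed_seconds` is a hypothetical helper name, and the format string assumes the `YYYY-MM-DD HH:MM:SS,mmm` layout seen in those log lines.

```python
from datetime import datetime

# Timestamp layout used in the cloud-init log lines quoted above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two cloud-init log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


# "Running command" line versus the line that logs the command's output.
bounce_seconds = elapsed_seconds("2013-07-28 23:13:45,164", "2013-07-28 23:14:07,412")
print(f"{bounce_seconds:.3f}")  # 22.248
```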

Ben Howard (darkmuggle-deactivatedaccount) wrote :

Ugh. Bouncing the interface seems ripe for problems, although that should be mitigated by the fact that services won't be up. I will pursue talking to MS about seeing if there is a better way to do this.

Scott Moser (smoser) wrote :

Interesting, I launched 10 instances and grepped cloud-init timestamps.
It seems pretty reliable that the ifdown/ifup cycle is reported to take between 12 and 22 seconds.

$ cat /etc/cloud/cloud.cfg.d/my.cfg
datasource:
 Azure:
  hostname_bounce:
   # policy can be 'on', 'off' or 'force'
   policy: force
   # the command used to 'bounce' the interface.
   command: ["bash", "-xc", 'date; cat /proc/uptime; time echo ifdown $interface; cat /proc/uptime; date; cat /proc/uptime; time echo ifup $interface; cat /proc/uptime; date']

I put the above in that file, and then ran:
echo "ubuntu" | time sudo sh -c 'tee /etc/hostname; hostname -F /etc/hostname; ifdown eth0; ifup eth0;' ; sudo rm -Rf /var/lib/cloud/ /var/log/cloud-init.log

and reboot.
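
One way to read the three policy values from the config above is sketched below. The semantics here are an assumption made for illustration ('off' never bounces, 'force' always bounces, 'on' bounces only when the hostname changed), not a copy of cloud-init's code, and `should_bounce` is a hypothetical name. Unquoted `on` and `off` load as booleans under YAML 1.1, which would match the `policy=True` seen in the log earlier in this report, so both forms are accepted.

```python
def should_bounce(policy, previous_hostname: str, hostname: str) -> bool:
    """Decide whether to bounce the interface for a given hostname_bounce policy.

    Assumed semantics (not confirmed by this report):
      off   -> never bounce
      force -> always bounce
      on    -> bounce only if the hostname actually changed
    """
    if policy in (False, "off"):
        return False
    if policy == "force":
        return True
    if policy in (True, "on"):
        return previous_hostname != hostname
    raise ValueError(f"unknown hostname_bounce policy: {policy!r}")


if __name__ == "__main__":
    # Values taken from the 'publishing hostname' log line earlier in the report.
    print(should_bounce(True, "ubuntu", "smoser0728s"))  # True
```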

The issue seems to be ntpdate somehow getting in the way, although I'm not exactly sure how.
If I put the .cfg file above in place, and then do:

$ sudo chmod ugo-x /etc/network/if-up.d/ntpdate
$ sudo rm /var/lib/cloud/instance/obj.pkl ; time sudo cloud-init init
...
real 0m1.273s
user 0m0.791s
sys 0m0.257s

then this takes ~1.2 seconds of wall-clock time.
Doing the same with execute permissions on that file takes much longer:
$ sudo chmod ugo+x /etc/network/if-up.d/ntpdate; sudo rm /var/lib/cloud/instance/obj.pkl ; time sudo cloud-init init
...
real 0m7.689s
user 0m0.804s
sys 0m0.267s

Just for reference, actually counting in my head, there is a measurable difference in time (i.e., it's not just a clock-getting-set-while-counting thing).

You can run the above without rebooting too, and you'll see the issue.

Scott Moser (smoser) wrote :

I'm going to fix this issue simply by not capturing output. I had captured it initially because there is no /dev/console output on azure. However, if users want to see output, they should just make their supplied command 'sh' and redirect to a file.
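
The thread does not spell out why capturing output made the bounce slow. One common cause, offered here only as an assumption, is that a hook script starts a background process that inherits the capture pipe, so the caller keeps reading until that process exits. The sketch below reproduces that effect with a stand-in command. `timed_run` and `COMMAND` are hypothetical names, and the backgrounded `sleep` merely plays the role of such a helper.

```python
import subprocess
import time

# The backgrounded 'sleep' stands in for a long-running helper started by a hook.
COMMAND = ["sh", "-c", "sleep 2 & echo started"]


def timed_run(capture: bool) -> float:
    """Run COMMAND with or without captured output and return elapsed seconds."""
    # DEVNULL stands in for "not captured": nothing waits on the child's output.
    target = subprocess.PIPE if capture else subprocess.DEVNULL
    start = time.monotonic()
    subprocess.run(COMMAND, stdout=target, stderr=target, check=True)
    return time.monotonic() - start


if __name__ == "__main__":
    # Capturing waits for the background process to close the pipe (about 2s);
    # without capture the call returns as soon as 'sh' itself exits.
    print(f"captured:   {timed_run(True):.1f}s")
    print(f"uncaptured: {timed_run(False):.1f}s")
```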

Scott Moser (smoser) wrote :

fixed in 0.7.3

Changed in cloud-init:
status: Fix Committed → Fix Released