More graceful error needed when using flat mode and --flat_injected with incompatible guests

Bug #678395 reported by guanxiaohua2k6
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Low
Assigned to: Unassigned

Bug Description

When I start an instance, the following error happened.

2010-11-21 21:31:51-0800 [-] (root): ERROR instance instance-2147483647: Failed to spawn
2010-11-21 21:31:51-0800 [-] Traceback (most recent call last):
2010-11-21 21:31:51-0800 [-] File "/usr/lib/pymodules/python2.6/nova/compute/manager.py", line 135, in run_instance
2010-11-21 21:31:51-0800 [-] yield self.driver.spawn(instance_ref)
2010-11-21 21:31:51-0800 [-] File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 891, in _inlineCallbacks
2010-11-21 21:31:51-0800 [-] result = result.throwExceptionIntoGenerator(g)
2010-11-21 21:31:51-0800 [-] File "/usr/lib/python2.6/dist-packages/twisted/python/failure.py", line 338, in throwExceptionIntoGenerator
2010-11-21 21:31:51-0800 [-] return g.throw(self.type, self.value, self.tb)
2010-11-21 21:31:51-0800 [-] File "/usr/lib/pymodules/python2.6/nova/virt/libvirt_conn.py", line 329, in spawn
2010-11-21 21:31:51-0800 [-] yield self._create_image(instance, xml)
2010-11-21 21:31:51-0800 [-] File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 891, in _inlineCallbacks
2010-11-21 21:31:51-0800 [-] result = result.throwExceptionIntoGenerator(g)
2010-11-21 21:31:51-0800 [-] File "/usr/lib/python2.6/dist-packages/twisted/python/failure.py", line 338, in throwExceptionIntoGenerator
2010-11-21 21:31:51-0800 [-] return g.throw(self.type, self.value, self.tb)
2010-11-21 21:31:51-0800 [-] File "/usr/lib/pymodules/python2.6/nova/virt/libvirt_conn.py", line 466, in _create_image
2010-11-21 21:31:51-0800 [-] execute=execute)
2010-11-21 21:31:51-0800 [-] File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 893, in _inlineCallbacks
2010-11-21 21:31:51-0800 [-] result = g.send(result)
2010-11-21 21:31:51-0800 [-] File "/usr/lib/pymodules/python2.6/nova/compute/disk.py", line 147, in inject_data
2010-11-21 21:31:51-0800 [-] yield _inject_net_into_fs(net, tmpdir, execute=execute)
2010-11-21 21:31:51-0800 [-] ProcessExecutionError: Unexpected error while running command.
2010-11-21 21:31:51-0800 [-] Command: sudo tee /var/lib/nova/tmp/tmpuaJWRG/etc/network/interfaces
2010-11-21 21:31:51-0800 [-] Exit code: 1
2010-11-21 21:31:51-0800 [-] Stdout: '# This file describes the network interfaces available on your system\n# and how to activate them. For more information, see interfaces(5).\n\n# The loopback network interface\nauto lo\niface lo inet loopback\n\n# The primary network interface\nauto eth0\niface eth0 inet static\n address 10.0.0.2\n netmask 255.255.255.240\n broadcast 10.0.0.15\n gateway 10.0.0.1\n dns-nameservers 8.8.4.4\n\n\n'
2010-11-21 21:31:51-0800 [-] Stderr: 'tee: /var/lib/nova/tmp/tmpuaJWRG/etc/network/interfaces: No such file or directory\n'

Looking at the source, the problem appears to be in "/usr/lib/pymodules/python2.6/nova/compute/disk.py". The method _inject_net_into_fs() reads as follows.

def _inject_net_into_fs(net, fs, execute=None):
    netfile = os.path.join(os.path.join(os.path.join(
            fs, 'etc'), 'network'), 'interfaces')
    yield execute('sudo tee %s' % netfile, net)

The directory is not created before the tee command runs, which is why the error "No such file or directory" is reported. The directory therefore needs to be created before tee is called.
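
A minimal sketch of that idea follows. It is not the attached patch: the Twisted `yield` calls are dropped so it runs standalone, the name `inject_net_with_mkdir` is illustrative, and `recording_execute` is a hypothetical stand-in for nova's `execute` helper whose signature is assumed here.

```python
import os


def inject_net_with_mkdir(net, fs, execute):
    """Create <fs>/etc/network before writing the interfaces file.

    `execute(cmd, process_input=None)` is an assumed signature for the
    command runner; the real nova helper may differ.
    """
    netdir = os.path.join(fs, 'etc', 'network')
    execute('sudo mkdir -p %s' % netdir)  # no error if it already exists
    netfile = os.path.join(netdir, 'interfaces')
    execute('sudo tee %s' % netfile, net)


# Hypothetical stand-in that records commands instead of running them.
calls = []


def recording_execute(cmd, process_input=None):
    calls.append((cmd, process_input))


inject_net_with_mkdir('auto lo\n', '/tmp/img', recording_execute)
for cmd, _ in calls:
    print(cmd)
```

This only removes the tee failure. It does not make the injected configuration usable on a guest that ignores /etc/network/interfaces.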

guanxiaohua2k6 (guanxiaohua2k6) wrote :

The version of nova is 2011.1~bzr397-0ubuntu0ppa1~maverick1.

I have also attached a patch that fixes the bug. Please review it.

Soren Hansen (soren) wrote :

I'm not entirely sure what to think about this.

If /etc/network doesn't exist, you're not dealing with a Debian-based distro, so the injected network configuration won't work anyway. Just creating /etc/network might mask the problem, but you'll still end up with an instance you can't access.

guanxiaohua2k6 (guanxiaohua2k6) wrote :

First, this bug is related to https://bugs.launchpad.net/nova/+bug/678393.

After installing nova on multiple machines and creating networks with "nova-manage network create ...", I hit bug 678393, so I updated the bridge column of the networks table manually. When I then tried to start an instance again, it failed with the messages described in this bug.

Having read the source of disk.py, it is clear that the directory "/var/lib/nova/tmp/tmpuaJWRG/etc/network/" is not created in _inject_net_into_fs(), which is why the subsequent "tee" command fails.

In contrast, the method _inject_key_into_fs() just above it creates the corresponding directory before running tee. The code is pasted below for reference.

def _inject_key_into_fs(key, fs, execute=None):
    sshdir = os.path.join(os.path.join(fs, 'root'), '.ssh')
    yield execute('sudo mkdir -p %s' % sshdir) # existing dir doesn't matter
    yield execute('sudo chown root %s' % sshdir)
    yield execute('sudo chmod 700 %s' % sshdir)
    keyfile = os.path.join(sshdir, 'authorized_keys')
    yield execute('sudo tee -a %s' % keyfile, '\n' + key.strip() + '\n')

Soren Hansen (soren) wrote : Re: [Bug 678395] Re: Command "sudo tee /var/lib/nova/tmp/tmpuaJWRG/etc/network/interfaces" failed when starting an instance

2010/11/22 guan <email address hidden>:
> First, this bug is related to
> https://bugs.launchpad.net/nova/+bug/678393.
>
> After installing nova on multiple machines and creating networks with
> "nova-manage network create ...", I hit bug 678393, so I updated the
> bridge column of the networks table manually. When I then tried to
> start an instance again, it failed with the messages described in
> this bug.

This is a completely separate issue. Let's keep them separate.

> Having read the source of disk.py, it is clear that the directory
> "/var/lib/nova/tmp/tmpuaJWRG/etc/network/" is not created in
> _inject_net_into_fs(), which is why the subsequent "tee" command
> fails.

I understand.

> In contrast, the method _inject_key_into_fs() just above it creates
> the corresponding directory before running tee. The code is pasted
> below for reference.

I realise. The key injection code is compatible with every Linux distro
that I know of. The location of SSH keys is widely agreed upon. The
location and format of network configuration is not. It varies with the
Linux distro. If /etc/network doesn't already exist, it means that
the image does not contain a Debian-derived distribution, so injecting a
network configuration that will only work on Debian-derived
distributions will not help you at all. So, I'm not sure what to do
about this problem.

We can either create /etc/network first and inject a network
configuration in there that *will* *not* *work*. Or we can just not
attempt to write the network configuration if the directory doesn't
exist. That will also remove the error, but will also result in a
non-functional network.
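
The second option could be sketched roughly as below. This is an illustration only, not nova code: the name `inject_net_if_supported` is hypothetical, and plain file I/O stands in for the `sudo tee` call so the sketch runs unprivileged against temporary directories.

```python
import os
import tempfile


def inject_net_if_supported(net, fs):
    """Write the interfaces file only if <fs>/etc/network already exists.

    Returns True when the file was written, False when injection was
    skipped because the guest filesystem has no etc/network directory.
    """
    netdir = os.path.join(fs, 'etc', 'network')
    if not os.path.isdir(netdir):
        return False
    with open(os.path.join(netdir, 'interfaces'), 'w') as f:
        f.write(net)
    return True


with tempfile.TemporaryDirectory() as debian_like, \
        tempfile.TemporaryDirectory() as other:
    # Simulate a Debian-style guest filesystem and one without etc/network.
    os.makedirs(os.path.join(debian_like, 'etc', 'network'))
    wrote_debian = inject_net_if_supported('auto lo\n', debian_like)
    wrote_other = inject_net_if_supported('auto lo\n', other)
    print(wrote_debian, wrote_other)
```

Returning a flag at least lets the caller log or report that injection was skipped, although the guest still ends up without a working network.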

The really short summary is that the network injection code is a (IMO
dreadful) hack, and how to deal with its failure conditions isn't
obvious. I'm not saying it shouldn't be handled. I just don't know how.

--
Soren Hansen
Ubuntu Developer    http://www.ubuntu.com/
OpenStack Developer http://www.openstack.org/

Thierry Carrez (ttx) wrote : Re: Command "sudo tee /var/lib/nova/tmp/tmpuaJWRG/etc/network/interfaces" failed when starting an instance

The combination "Flat mode + --flat_injected + guests not supporting /etc/network/" is not supported. I agree we could fail more gracefully in that case, but that won't make it magically work. If you use guests that don't support /etc/network/interfaces, you should probably run your network node with --flat_injected=false or use another network mode.

If you agree, I'll rename this bug so that it's about a more graceful error handling.

Changed in nova:
status: New → Incomplete
Thierry Carrez (ttx)
summary: - Command "sudo tee /var/lib/nova/tmp/tmpuaJWRG/etc/network/interfaces"
- failed when starting an instance
+ More graceful error needed when using flat mode and --flat_injected with
+ incompatible guests
Changed in nova:
importance: Undecided → Low
status: Incomplete → Confirmed
Mark McLoughlin (markmc) wrote :

Creating /etc/network/interfaces if it doesn't exist was fixed long ago; see bug #666554.

This bug morphed from that simple issue into needing to handle the case of people trying to use non-Debian images with flat injected.

A fix might be for _inject_net_into_fs to look at /etc/lsb-release and throw an ImageUnacceptable exception if it's not Debian/Ubuntu.

We also need to make sure this exception is handled correctly. At present the libvirt driver just logs all exceptions from inject_data() as warnings. With the Xen driver, it looks like the exception would make it all the way back to ComputeManager and the instance spawn would fail correctly.
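
A rough sketch of such a check is shown below, under several assumptions: `ImageUnacceptable` is defined locally as a stand-in rather than imported from nova, the helper names are hypothetical, and the set of accepted `DISTRIB_ID` values is chosen only for illustration. An image without /etc/lsb-release is treated as unsupported here, which may be too strict.

```python
import os
import tempfile


class ImageUnacceptable(Exception):
    """Local stand-in for the exception named in the comment above."""


SUPPORTED_IDS = {'debian', 'ubuntu'}  # assumption for this sketch


def read_distrib_id(fs):
    """Return DISTRIB_ID from <fs>/etc/lsb-release, or None if unavailable."""
    path = os.path.join(fs, 'etc', 'lsb-release')
    try:
        with open(path) as f:
            for line in f:
                key, sep, value = line.strip().partition('=')
                if sep and key == 'DISTRIB_ID':
                    return value.strip('"')
    except OSError:
        return None
    return None


def check_image_supports_injection(fs):
    """Raise ImageUnacceptable unless the guest looks like Debian/Ubuntu."""
    distrib_id = read_distrib_id(fs)
    if distrib_id is None or distrib_id.lower() not in SUPPORTED_IDS:
        raise ImageUnacceptable(
            'network injection needs a Debian/Ubuntu guest, found %r'
            % distrib_id)


with tempfile.TemporaryDirectory() as fs:
    os.makedirs(os.path.join(fs, 'etc'))
    with open(os.path.join(fs, 'etc', 'lsb-release'), 'w') as f:
        f.write('DISTRIB_ID=Ubuntu\nDISTRIB_RELEASE=10.10\n')
    check_image_supports_injection(fs)  # passes silently
    accepted = True

with tempfile.TemporaryDirectory() as fs:
    try:
        check_image_supports_injection(fs)  # no lsb-release at all
        rejected = False
    except ImageUnacceptable as exc:
        rejected = True
        print(exc)
```

For the check to have any effect, the caller would have to let the exception propagate and fail the spawn instead of logging it as a warning.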

tags: added: low-hanging-fruit
tags: added: flat-networking
Changed in nova:
assignee: nobody → D LALITHA RANI (deevi-rani)
status: Confirmed → Opinion
status: Opinion → In Progress
Changed in nova:
assignee: D LALITHA RANI (deevi-rani) → nobody
Thierry Carrez (ttx)
Changed in nova:
status: In Progress → Confirmed
Sankha Narayan Guria (sankha93-4) wrote :

What is the status of this bug? Does the code posted above fix it?

I am new to OpenStack and would like to contribute to its development, so if this bug is still open I would like to work on it.

Thierry Carrez (ttx) wrote :

I think this bug should be considered fixed: we no longer have a weird error message on non-supported images. That won't make non Debian/Ubuntu/whatever-supports-etc-network-interfaces magically work.

I'm not a big fan of throwing an ImageUnacceptable exception if lsb_release is not Debian/Ubuntu: that could break images that support /etc/network/interfaces but have overridden lsb_release.

Therefore I think this should be closed as a duplicate of 666554.
