instance not started due to libvirt virNetDevGetIndex error

Bug #947771 reported by Han Li
This bug affects 1 person
Affects                     Status    Importance   Assigned to   Milestone
OpenStack Compute (nova)    Invalid   Undecided    Unassigned

Bug Description

I noticed this bug when I started more than 2 instances on the same compute node at the same time. One is shown as 'error', and another stays 'pending' forever and never starts.

RESERVATION r-p0qdo73p soc-cloud default
INSTANCE i-00000015 ami-00000003 error hli (soc-cloud, soc-2) 0 m1.medium 2012-03-06T06:41:30Z nova aki-00000001 ari-00000002
INSTANCE i-00000016 ami-00000003 pending hli (soc-cloud, soc-2) 1 m1.medium 2012-03-06T06:41:30Z nova aki-00000001 ari-00000002

No error message can be found in /var/log/nova/*.log. However, the libvirtd log file (/var/log/libvirt/libvirtd.log on the compute node) shows the cause:

2012-03-06 06:33:16.380+0000: 17728: error : virNetDevGetIndex:656 : Unable to get index for interface vnet3: No such device
2012-03-06 06:37:29.287+0000: 17729: error : virNetDevGetIndex:656 : Unable to get index for interface vnet3: No such device
2012-03-06 06:37:36.098+0000: 17730: error : virNetDevGetIndex:656 : Unable to get index for interface vnet2: No such device
2012-03-06 06:37:54.953+0000: 17727: error : virNetDevGetIndex:656 : Unable to get index for interface vnet0: No such device
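
To see at a glance which interfaces libvirt failed to find, the log can be summarized per interface. This is only a rough sketch: the function name summarize_vnet_errors is made up for illustration, and it assumes the log lines look exactly like the ones quoted above.

```shell
# Hypothetical helper: count virNetDevGetIndex failures per interface.
# Defaults to the log path mentioned in this report; pass another path as $1.
summarize_vnet_errors() {
    log="${1:-/var/log/libvirt/libvirtd.log}"
    # Keep only the virNetDevGetIndex lines, extract the interface name,
    # then count occurrences (highest count first, ties sorted by name).
    grep 'virNetDevGetIndex' "$log" \
        | sed -n 's/.*Unable to get index for interface \([^:]*\): No such device.*/\1/p' \
        | sort | uniq -c | sort -k1,1nr -k2,2
}
```

Usage: run `summarize_vnet_errors` on the compute node, or `summarize_vnet_errors /path/to/copy.log` against a copied log file.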

I can terminate these two instances. However, after this error occurs, any instance scheduled onto this compute node (I have multiple compute nodes) ends up stuck in "pending".

I restarted the libvirt and nova-compute services, and everything is back to normal again.

I am using the nova package 2012.1~e4-0ubuntu1, and the libvirt-bin is 0.9.8-2ubuntu11.

This problem can be reproduced by launching multiple instances at once:

INSTANCE_NUM=3  # even 4 or 5
euca-run-instances -n $INSTANCE_NUM $IMAGE_ID
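
After launching, the instance states can be tallied to check whether any ended up in 'error' or 'pending'. This is a minimal sketch, not a tested tool: count_instance_states is a made-up name, and it assumes the state is the 4th whitespace-separated field, as in the listing quoted earlier in this report. Other euca2ools versions may print extra columns that shift it.

```shell
# Hypothetical helper: read euca-describe-instances output on stdin and
# print one "state count" line per state.
# ASSUMPTION: the state is field 4 of each INSTANCE line, as in this report.
count_instance_states() {
    awk '$1 == "INSTANCE" { n[$4]++ } END { for (s in n) print s, n[s] }' | sort
}
```

Usage: `euca-describe-instances | count_instance_states`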

Chuck Short (zulcss) wrote :

Is this with kvm or something else?

Regards
chuck

Changed in nova:
status: New → Incomplete
Han Li (li-han-victor) wrote :

Hi Chuck,

I will re-describe the problem I have as follows.

1. Architecture
There are three physical machines: one controller and two compute nodes. On each compute node, only nova-compute and libvirt-bin are installed. All other components run on the controller node.

2. Version
nova-compute 2012.1~rc1-0ubuntu2
libvirt-bin 0.9.8-2ubuntu14

3. Related Config flags
--dhcpbridge=/usr/bin/nova-dhcpbridge
--libvirt_use_virtio_for_bridges
--connection_type=libvirt
--libvirt_type=kvm
--network_manager=nova.network.manager.FlatDHCPManager

4. How to repeat the problem
Start an instance. It works fine at first. Leave it running but idle for a few hours (last time I waited 48 hours), and it can no longer be reached by ping or ssh.
If, however, you start another instance on the same physical machine, the newly started instance still works fine at first.

5. Log messages
The nova log files do not show anything exceptional. However, in the libvirt-bin log (/var/log/libvirt/libvirtd.log), I found this:

2012-03-26 02:43:19.972+0000: 12279: error : virNetDevGetIndex:656 : Unable to get index for interface vnet0: No such device
2012-03-26 11:02:47.463+0000: 12285: error : virNetDevGetIndex:656 : Unable to get index for interface vnet2: No such device

6. A temporary workaround
Restart the nova-compute and libvirt-bin services, and then restart the instance. After that I can ping/ssh the instance again.
However, this is not a long-term solution, because the network fails again every day.
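
For anyone repeating this workaround, it can be written down roughly as below. This is only a sketch under assumptions: the helper names run and apply_workaround are made up, the order of the two service restarts is a guess (the report does not say), and "restart the instance" is taken to mean a reboot through euca-reboot-instances. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
# Hypothetical dry-run wrapper: print the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: restart both services on the compute node, then
# reboot the affected instance. ASSUMPTION: libvirt-bin is restarted first.
apply_workaround() {
    instance_id="$1"
    run service libvirt-bin restart
    run service nova-compute restart
    run euca-reboot-instances "$instance_id"
}
```

Usage: `apply_workaround i-00000015` prints the three commands; `DRY_RUN=0 apply_workaround i-00000015` executes them (the service restarts need root on the compute node).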

Is there any related reported bug? Any suggestions?

Thierry Carrez (ttx)
Changed in nova:
status: Incomplete → New
Peng Yong (ppyy) wrote :

confirmed:

2012-04-05 08:35:18.881+0000: 1551: error : virNetDevGetIndex:656 : Unable to get index for interface vnet23: No such device
2012-04-05 08:35:22.259+0000: 1552: error : virNetDevGetIndex:656 : Unable to get index for interface vnet25: No such device
2012-04-05 08:35:24.520+0000: 1552: error : virNetDevGetIndex:656 : Unable to get index for interface vnet26: No such device
2012-04-05 08:35:26.889+0000: 1552: error : virNetDevGetIndex:656 : Unable to get index for interface vnet18: No such device
2012-04-05 08:35:29.120+0000: 1552: error : virNetDevGetIndex:656 : Unable to get index for interface vnet15: No such device
2012-04-05 08:35:31.420+0000: 1552: error : virNetDevGetIndex:656 : Unable to get index for interface vnet7: No such device

Han Li (li-han-victor) wrote :

Hi all,

This bug is gone when I use the Ubuntu image created by following:
http://docs.openstack.org/cactus/openstack-compute/admin/content/starting-images.html

image="ubuntu1010-UEC-localuser-image.tar.gz"
wget http://c0179148.cdn1.cloudfiles.rackspacecloud.com/ubuntu1010-UEC-localuser-image.tar.gz
uec-publish-tarball $image [bucket-name] [hardware-arch]

I am not sure what the cause is, but I am happy to use the image above. My cloud finally runs smoothly.

Tom Fifield (fifieldt)
Changed in nova:
status: New → Invalid