Instance in state not running after live migration

Bug #947326 reported by Christian Wittwer
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Vish Ishaya

Bug Description

I'm testing the live migration feature in Essex milestone 4. I can successfully migrate an instance once, but after that migration the instance cannot be migrated again.

----------------------------------------------------------------------------------------------------------------------------------------
root@unic-prd-os-controller:~# nova-manage vm live_migration instance-00000007 unic-prd-os-compute6
2012-03-05 18:40:41 INFO nova.rpc.common [req-38c187eb-ca3e-4d4c-b576-0fed30a74e40 None None] Connected to AMQP server on 10.2.30.2:5672
Migration of instance-00000007 initiated.Check its progress using euca-describe-instances.
----------------------------------------------------------------------------------------------------------------------------------------

After the initial migration, the instance is running (checked with ps aux on unic-prd-os-compute6) and I can ping it.

----------------------------------------------------------------------------------------------------------------------------------------
root@unic-prd-os-controller:~# nova show 41dedf0f-9a46-4593-9ae7-15e0f73973ce
+-------------------------------+----------------------------------------------------------+
| Property | Value |
+-------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-SRV-ATTR:host | unic-prd-os-compute6 |
| OS-EXT-SRV-ATTR:instance_name | instance-00000007 |
| OS-EXT-STS:power_state | 8 |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| PrivateCloud network | 10.2.20.34 |
| accessIPv4 | |
| accessIPv6 | |
| config_drive | |
| created | 2012-03-05T16:37:46Z |
| flavor | m1.small |
| hostId | a24295a0fd51ae996ff8af9835bb03b83d10681885b54492fa3e2cf4 |
| id | 41dedf0f-9a46-4593-9ae7-15e0f73973ce |
| image | Centos5x64_v1 |
| key_name | |
| metadata | {} |
| name | testvm5 |
| progress | None |
| status | ACTIVE |
| tenant_id | e552d533c8fc43c1b759d79cb356c15d |
| updated | 2012-03-05T17:40:58Z |
| user_id | f400a2b6e60745f6b704d4aa51969d1b |
+-------------------------------+----------------------------------------------------------+
----------------------------------------------------------------------------------------------------------------------------------------

The state of the instance is active.

----------------------------------------------------------------------------------------------------------------------------------------
root@unic-prd-os-controller:~# nova-manage vm live_migration instance-00000007 unic-prd-os-compute5
2012-03-05 18:46:57 INFO nova.rpc.common [req-d6779fb1-bb74-4d16-a750-9658d207461e None None] Connected to AMQP server on 10.2.30.2:5672
Command failed, please check log for more info
2012-03-05 18:46:57 CRITICAL nova [req-d6779fb1-bb74-4d16-a750-9658d207461e None None] Remote error: InstanceNotRunning Instance i-00000007 is not running.
[u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 250, in _process_data\n rval = node_func(context=ctxt, **node_args)\n', u' File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 98, in _schedule\n self._set_instance_error(method, context, ex, *args, **kwargs)\n', u' File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__\n self.gen.next()\n', u' File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 93, in _schedule\n return real_meth(*args, **kwargs)\n', u' File "/usr/lib/python2.7/dist-packages/nova/scheduler/driver.py", line 202, in schedule_live_migration\n self._live_migration_src_check(context, instance_ref)\n', u' File "/usr/lib/python2.7/dist-packages/nova/scheduler/driver.py", line 241, in _live_migration_src_check\n raise exception.InstanceNotRunning(instance_id=instance_id)\n', u'InstanceNotRunning: Instance i-00000007 is not running.\n'].
----------------------------------------------------------------------------------------------------------------------------------------

The second migration fails because nova reports the instance as not running, which is not true.

Vish Ishaya (vishvananda) wrote :

It is showing power_state=8 which means libvirt is reporting it as failed, so it doesn't look like the power state is being reported properly. Perhaps the old host is updating the power state in the db somewhere and overwriting the power state reported by the new host?
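
To illustrate the mismatch described here, the following is a minimal sketch of a source-side check that trusts only the recorded power state. It is not the actual nova code: the function and dictionary layout are hypothetical, the exception is a stand-in for the one named in the traceback, and the numeric value for "running" is an assumption (only 8 meaning "failed" comes from this report).

```python
# Illustrative sketch only; not the actual nova source.
RUNNING = 1  # assumed value for a running instance
FAILED = 8   # the value shown by `nova show`, described above as "failed"


class InstanceNotRunning(Exception):
    """Stand-in for the exception named in the traceback."""

    def __init__(self, instance_id):
        super().__init__("Instance %s is not running." % instance_id)
        self.instance_id = instance_id


def live_migration_src_check(instance):
    """Refuse to migrate unless the recorded power state is RUNNING.

    The check looks only at the stored power_state, so a stale value
    blocks migration even when vm_state says the instance is active.
    """
    if instance["power_state"] != RUNNING:
        raise InstanceNotRunning(instance_id=instance["name"])


# Hypothetical record mirroring the reported situation.
instance = {"name": "i-00000007", "vm_state": "active", "power_state": FAILED}
try:
    live_migration_src_check(instance)
except InstanceNotRunning as exc:
    print(exc)
```

Under these assumptions the check rejects the instance although vm_state is active, which matches the symptom in the report.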

Changed in nova:
status: New → Triaged
importance: Undecided → Medium
milestone: none → essex-rc1
Kei Masumoto (masumotok) wrote :

I will handle this matter.

It seems that, in the existing implementation, you have to wait until the next periodic_tasks() run begins.
But that might be a problem in some cases, so let's update the power_state of the migrated VM right after live migration finishes.

> Vish
Can you assign this bug to me, if that is convenient for everyone?
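
The difference between waiting for the periodic task and updating the state right after migration can be sketched with a toy model. Everything below is hypothetical (the classes, function names and the update_now flag are invented for illustration and are not nova APIs); the value 1 for "running" is an assumption, and 8 is the "failed" value from the report.

```python
# Toy model, not nova code: every name here is hypothetical.
RUNNING, FAILED = 1, 8  # 8 is "failed" per the report; 1 is assumed


class FakeHypervisor:
    """Pretends to report the real state of each domain on a host."""

    def __init__(self, states):
        self.states = dict(states)

    def get_power_state(self, name):
        return self.states[name]


def periodic_sync(db, hypervisor):
    """What a periodic task might do: copy hypervisor state into the DB."""
    for name, state in hypervisor.states.items():
        db[name] = state


def post_live_migration(db, dest_hypervisor, name, update_now):
    """Finish a migration; optionally refresh power_state immediately."""
    if update_now:
        db[name] = dest_hypervisor.get_power_state(name)


name = "instance-00000007"
dest = FakeHypervisor({name: RUNNING})

# Without an immediate update the stale value survives until the next sync.
db_wait = {name: FAILED}
post_live_migration(db_wait, dest, name, update_now=False)
before_sync = db_wait[name]
periodic_sync(db_wait, dest)
after_sync = db_wait[name]

# With an immediate update the record is correct as soon as migration ends.
db_now = {name: FAILED}
post_live_migration(db_now, dest, name, update_now=True)

print(before_sync, after_sync, db_now[name])
```

In this model a second migration attempted between the end of the first one and the next sync would still see the stale state, which is the window the proposal aims to close.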

Thierry Carrez (ttx)
Changed in nova:
assignee: nobody → Kei Masumoto (masumotok)
Changed in nova:
assignee: Kei Masumoto (masumotok) → Vish Ishaya (vishvananda)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/5038

Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/5038
Committed: http://github.com/openstack/nova/commit/33def9e714fbd13a6dc4b755ade4841c971f7ae5
Submitter: Jenkins
Branch: master

commit 33def9e714fbd13a6dc4b755ade4841c971f7ae5
Author: Vishvananda Ishaya <email address hidden>
Date: Thu Mar 8 12:53:44 2012 -0800

    Fix live-migration in multi_host network

     * call teardown after live migration
     * call update a second time after migration for dhcp
     * moves the instance state update into post_live_migrate
     * completes the fix for bug 939060
     * fixes bug 947326

    Change-Id: I042567573b9bb46381c5447aa08e83cd1916b225

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: essex-rc1 → 2012.1