nova-live-migration job fails during evacuate negative test

Bug #1903979 reported by Balazs Gibizer
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Lee Yarwood

Bug Description

2020-11-12 08:55:30.829939 | controller | + /opt/stack/nova/roles/run-evacuate-hook/files/test_negative_evacuate.sh:evacuate_and_wait_for_error:29 : echo 'Timed out waiting for server evacuate-test to go to ERROR status'

2020-11-12 08:55:30.829994 | controller | Timed out waiting for server evacuate-test to go to ERROR status

2020-11-12 08:55:30.831833 | controller | + /opt/stack/nova/roles/run-evacuate-hook/files/test_negative_evacuate.sh:evacuate_and_wait_for_error:30 : exit 4
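The timeout above comes from a polling loop in the evacuate hook. A hedged sketch of that pattern (not the actual test_negative_evacuate.sh; `get_status` is a hypothetical stand-in for an `openstack server show -f value -c status` call):

```shell
# Poll the server status until it reaches ERROR or the retry budget runs
# out; exit code 4 on timeout matches the failure seen in the job output.
wait_for_error() {
    local server=$1 tries=${2:-30} status
    for _ in $(seq "$tries"); do
        status=$(get_status "$server")   # hypothetical helper
        [ "$status" = "ERROR" ] && return 0
        sleep 1
    done
    echo "Timed out waiting for server $server to go to ERROR status"
    return 4
}
```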

Example run:
https://zuul.opendev.org/t/openstack/build/ca4ccb93df034e4fa32da7bb1a9fc336/log/job-output.txt

Logstash signature:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Timed%20out%20waiting%20for%20server%20evacuate-test%20to%20go%20to%20ERROR%20status%5C%22%20AND%20tags%3Aconsole

Changed in nova:
status: New → Confirmed
importance: Undecided → High
tags: added: evacuate gate-failure testing
Revision history for this message
Lee Yarwood (lyarwood) wrote :

It looks like libvirtd restarted, allowing the negative evacuation test to end with an ACTIVE instance on the controller. The current task stops the following service and its associated sockets:

roles/run-evacuate-hook/tasks/main.yaml

   - name: Stop libvirtd on "{{ inventory_hostname }}"
     become: true
     systemd:
       name: "{{ item }}"
       state: stopped
     with_items:
       - libvirtd.service
       - libvirtd.socket
       - libvirtd-admin.socket
       - libvirtd-ro.socket

I guess I'm missing something here?
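For reference, the task above is roughly equivalent to this shell loop (a sketch only; the hook uses the systemd module, not a script, and `${SYSTEMCTL:-systemctl}` is an indirection added here so the loop can be dry-run):

```shell
# Stop the libvirtd service and its activation sockets, mirroring the
# with_items loop in roles/run-evacuate-hook/tasks/main.yaml.
stop_libvirt_units() {
    local unit
    for unit in libvirtd.service libvirtd.socket \
                libvirtd-admin.socket libvirtd-ro.socket; do
        ${SYSTEMCTL:-systemctl} stop "$unit"
    done
}
```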

Lee Yarwood (lyarwood)
Changed in nova:
assignee: nobody → Lee Yarwood (lyarwood)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/762623

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Andrey Volkov (avolkov) wrote :

What we have here:

1. booting a VM on SUBNODE_HOSTNAME and disabling compute service on SUBNODE_HOSTNAME
openstack --os-compute-api-version 2.74 server create --image ec033028-30f6-497d-936a-4bea98cdafea --flavor 1 --nic net-id=209ff518-ff9f-46bb-ac0a-a20e76b35daa --host ubuntu-focal-rax-iad-0021734807 --wait evacuate-test

...

2. Evacuating that VM to CONTROLLER_HOSTNAME
nova --os-compute-api-version 2.67 evacuate --force evacuate-test ubuntu-focal-rax-iad-0021734805
where it actually becomes ACTIVE instead of going to ERROR.
If the test worked as intended, the evacuation would fail on the controller with "nova.exception.HypervisorUnavailable: Connection to the hypervisor is broken on host".

But indeed libvirtd was started again:

2020-11-12 08:44:00.128+0000: 102594: info : libvirt version: 6.0.0, package: 0ubuntu8.5 (Christian Ehrhardt <email address hidden> Thu, 08 Oct 2020 07:36:06 +0200)
2020-11-12 08:44:00.128+0000: 102594: info : hostname: ubuntu-focal-rax-iad-0021734805
2020-11-12 08:44:00.128+0000: 102594: debug : virNetlinkStartup:124 : Running global netlink initialization

Just before 'Run negative evacuate tests':

2020-11-12 08:44:02.021505 | TASK [run-evacuate-hook : Run negative evacuate tests]

Ansible report:

2020-11-12 08:43:59.147000 | LOOP [run-evacuate-hook : Stop libvirtd on "controller"]
2020-11-12 08:44:00.097319 | controller | changed: "libvirtd.service"
2020-11-12 08:44:00.099003 | controller | changed: All items complete
2020-11-12 08:44:00.099120 |
2020-11-12 08:44:00.714245 | controller | changed: "libvirtd.socket"
2020-11-12 08:44:01.298388 | controller | ok: "libvirtd-admin.socket"
2020-11-12 08:44:01.872071 | controller | ok: "libvirtd-ro.socket"

Syslog report:

Nov 12 08:43:59 ubuntu-focal-rax-iad-0021734805 sudo[102576]: zuul : TTY=unknown ; PWD=/home/zuul ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-ikljghyeytpdnrvynsdgogdmzisxdquq ; /usr/bin/python3
Nov 12 08:43:59 ubuntu-focal-rax-iad-0021734805 sudo[102576]: pam_unix(sudo:session): session opened for user root by (uid=0)
Nov 12 08:43:59 ubuntu-focal-rax-iad-0021734805 sudo[102576]: pam_unix(sudo:session): session closed for user root
Nov 12 08:44:00 ubuntu-focal-rax-iad-0021734805 sudo[102596]: zuul : TTY=unknown ; PWD=/home/zuul ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-qdjdltsjeqgfxczzgvknhrqebzarfpca ; /usr/bin/python3
Nov 12 08:44:00 ubuntu-focal-rax-iad-0021734805 sudo[102596]: pam_unix(sudo:session): session opened for user root by (uid=0)
Nov 12 08:44:00 ubuntu-focal-rax-iad-0021734805 sudo[102596]: pam_unix(sudo:session): session closed for user root
Nov 12 08:44:00 ubuntu-focal-rax-iad-0021734805 sudo[102682]: zuul : TTY=unknown ; PWD=/home/zuul ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-bxmxlpjmzyjadkbullovdfrbmiqvbibg ; /usr/bin/python3
Nov 12 08:44:00 ubuntu-focal-rax-iad-0021734805 sudo[102682]: pam_unix(sudo:session): session opened for user root by (uid=0)
Nov 12 08:44:01 ubuntu-focal-rax-iad-0021734805 kernel: iptables dropped: IN=eth0 OUT= MAC=bc:76:4e:20:17:0d:e4:c7:22:60:5a:c1:08:00 SRC=45.146.165.152 DST=104.130.239.160 LEN=60 TOS=0x00 PREC=0x00 TTL=48 ID=32541 DF PROTO=TCP SPT=38250 DPT=3389...


Revision history for this message
Andrey Volkov (avolkov) wrote :

libvirtd.service has Restart=on-failure set, which would explain why libvirtd came back immediately after being stopped.
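Given that policy, stopping the units is not enough on its own. A hedged sketch of one possible remedy (not necessarily what the proposed fix does): mask each unit before stopping it, so neither Restart=on-failure nor socket activation can bring libvirtd back during the test. `stop_libvirt_masked` is a hypothetical helper; `${SYSTEMCTL:-systemctl}` allows a dry run.

```shell
stop_libvirt_masked() {
    local unit
    for unit in libvirtd.service libvirtd.socket \
                libvirtd-admin.socket libvirtd-ro.socket; do
        # mask first (runtime only, so it does not persist past reboot)
        # to keep systemd from restarting or socket-activating the unit,
        # then stop it
        ${SYSTEMCTL:-systemctl} mask --runtime "$unit"
        ${SYSTEMCTL:-systemctl} stop "$unit"
    done
}
```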

Lee Yarwood (lyarwood)
Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 23.0.0.0rc1

This issue was fixed in the openstack/nova 23.0.0.0rc1 release candidate.
