Revert resize tests are failing in jobs with iptables_hybrid fw driver

Bug #1833902 reported by Slawek Kaplonski
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: sean mooney

Bug Description

Tests:

tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_resize_server_revert_deleted_flavor
tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert
tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert_with_volume_attached

have been failing 100% of the time for about the last 2 days.
This happens only in jobs that use the iptables_hybrid firewall driver, but I don't know whether that is the actual source of the issue or just a red herring.

Logstash query:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_resize_server_revert_deleted_flavor%5C%22%20AND%20message%3A%5C%22FAILED%5C%22

Slawek Kaplonski (slaweq) wrote :

It looks like this is caused by a change in Nova: https://review.opendev.org/#/c/644881/

I proposed a revert of this patch, https://review.opendev.org/#/c/667035/, and a DNM patch in Neutron to check whether that really helps: https://review.opendev.org/#/c/667036/

Changed in neutron:
assignee: nobody → Slawek Kaplonski (slaweq)
Changed in nova:
assignee: nobody → Slawek Kaplonski (slaweq)
status: New → In Progress
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → High
Changed in nova:
assignee: Slawek Kaplonski (slaweq) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem) wrote :
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/667154

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/667035
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=83a5429db447b17110a7fc192a61879e84f5e5a2
Submitter: Zuul
Branch: master

commit 83a5429db447b17110a7fc192a61879e84f5e5a2
Author: Slawek Kaplonski <email address hidden>
Date: Mon Jun 24 07:50:37 2019 +0000

    Revert "Revert resize: wait for events according to hybrid plug"

    This reverts commit 19f9b37721d9bc13bc1ed35a4f368b1d21b10a5b.

    Commit 19f9b37721d9bc13bc1ed35a4f368b1d21b10a5b introduced
    a regression and caused errors on Neutron CI jobs which run
    on a single node with iptables_hybrid firewall driver.
    This is because the nova change made the resize revert flow
    wait for neutron-vif-plugged events in the ComputeManager
    when the port's host binding changes for iptables_hybrid
    ports but for those types of ports, Neutron does not send
    the event if the host does not change - which is the case
    for a same-host resize being reverted.

    Change-Id: I77b3639435c671bacca9cdfc1aa203e44a2fb042
    Closes-Bug: #1833902
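
The condition described in the commit message above can be sketched roughly as follows. This is only an illustration, not Nova's actual code: the Vif class, the function name and the host arguments are hypothetical. It assumes, as the message states, that Neutron sends the vif-plugged event at binding time only for hybrid plug ports and only when the binding host actually changes. "network-vif-plugged" is the name Nova uses for this external event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vif:
    """Hypothetical stand-in for a port; not Nova's VIF model."""
    port_id: str
    hybrid_plug: bool


def events_expected_at_bind_time(vifs, current_host, target_host):
    """Return the (event_name, port_id) pairs worth waiting for
    when the port binding is moved from current_host to target_host."""
    if current_host == target_host:
        # Same-host revert: the binding host does not change, so
        # (per the commit message) Neutron sends no event. Waiting
        # here would only end in a timeout.
        return []
    # Only hybrid plug ports get an event at binding time.
    return [("network-vif-plugged", vif.port_id)
            for vif in vifs if vif.hybrid_plug]
```

With one hybrid plug port and one non-hybrid port, a revert between two different hosts yields a single expected event, and a same-host revert yields none.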

Changed in nova:
status: In Progress → Fix Released
Matt Riedemann (mriedem) wrote :

https://review.opendev.org/#/c/667177/ is the nova fix for the same host resize scenario (the revert of the revert plus the fix).

Changed in nova:
assignee: Matt Riedemann (mriedem) → sean mooney (sean-k-mooney)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.opendev.org/667036

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/667154
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=33e9740d004770d4d58458330e269b8e1ebf4136
Submitter: Zuul
Branch: master

commit 33e9740d004770d4d58458330e269b8e1ebf4136
Author: Matt Riedemann <email address hidden>
Date: Mon Jun 24 12:24:38 2019 -0400

    Add neutron-tempest-iptables_hybrid job to experimental queue

    Due to regression bug 1833902 we should have the option to
    run the neutron job that tests the OVS hybrid plug configuration
    on specific patches.

    We could eventually consider running this in the check queue
    like neutron-tempest-linuxbridge with a strict set of irrelevant
    files but that would require some thought, i.e. we'd want to run
    the job on changes to nova/compute/manager.py since the network
    vif model bind events code is used in the ComputeManager and is
    specific to OVS hybrid plug ports. Rather than get bogged down in
    thinking up what that set is, this change takes the easy path and
    just throws the job into the on-demand experimental queue.

    Change-Id: Id5172d97b349abe59b47d1284e97943091652419
    Related-Bug: #1833902

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/667177
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7a7a223602ca5aa0aca8f65a6ab143f1d8f8ec1b
Submitter: Zuul
Branch: master

commit 7a7a223602ca5aa0aca8f65a6ab143f1d8f8ec1b
Author: Artom Lifshitz <email address hidden>
Date: Wed Mar 20 10:38:12 2019 -0400

    Revert resize: wait for events according to hybrid plug

    Since 4817165fc5938a553fafa1a69c6086f9ebe311af, when reverting a
    resized instance back to the source host, the libvirt driver waits for
    vif-plugged events when spawning the instance. When called from
    finish_revert_resize() in the source compute manager, libvirt's
    finish_revert_migration() does not pass vifs_already_plugged to
    _create_domain_and_network(), making the latter use the default False
    value.

    When the source compute manager calls
    network_api.migrate_instance_finish() in finish_revert_resize(), this
    updates the port binding back to the source host. If Neutron is
    configured to use OVS hybrid plug, it will send the vif-plugged event
    immediately after completing this request. This happens before the
    virt driver's finish_revert_migration() method is called. This causes
    the wait in the libvirt driver to time out because the event is
    received before Nova starts waiting for it.

    The neutron ovs l2 agent sends vif-plugged events when two conditions
    are met. First the port must be bound to the host managed by the
    l2 agent and second, the agent must have completed configuring the
    port on ovs. This involves assigning the port a local VLAN for tenant
    isolation, applying security group rules if required and applying
    QoS policies or other agent extensions like service function chaining.

    During the boot process, we bind the port first to the host
    then plug the interface into ovs which triggers the l2 agent to
    configure it resulting in the emission of the vif-plugged event.
    In the revert case, as noted above, since the vif is already plugged
    on the source node when hybrid-plug is used, binding the port to the
    source node fulfils the second condition to send the vif-plugged event.

    Events sent immediately after port binding update are hereafter known
    as "bind-time" events. For ports that do not use OVS hybrid plug,
    Neutron will continue to send vif-plugged events only when Nova
    actually plugs the VIF. These types of events are hereafter known as
    "plug-time" events. OVS hybrid plug is a per agent setting, so for
    a particular host, bind-time events are an all-or-nothing thing for the
    ovs backend: either all VIF_TYPE=ovs ports have them, or no ovs ports
    have them. In general, a host will only have one network backend.
    The only exception to this is SR-IOV. SR-IOV is commonly deployed on
    the same host as other network backends such as OVS or linuxbridge.
    SR-IOV ports with VNIC_TYPE=direct-physical will always have only
    bind-time events. If an instance mixes OVS ports with hybrid-plug=False
    with direct physical ports, it will have both kinds of events.

    For same host resize reverts we do not upd...

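The ordering problem described in the commit message (the event arrives before anything is waiting for it) can be reproduced with a small toy model. This is a sketch under stated assumptions, not Nova or Neutron code: EventRegistry, update_binding and both wait functions are hypothetical names, and the model assumes that an event with no registered waiter is simply discarded.

```python
import threading


class EventRegistry:
    """Toy registry: an event reaches a waiter only if the waiter
    was registered before the event was delivered."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiters = {}

    def prepare(self, key):
        """Register interest in an event and return a waitable object."""
        waiter = threading.Event()
        with self._lock:
            self._waiters[key] = waiter
        return waiter

    def deliver(self, key):
        """Deliver an event; return False if nobody was waiting."""
        with self._lock:
            waiter = self._waiters.pop(key, None)
        if waiter is None:
            return False  # no waiter yet, so the event is dropped
        waiter.set()
        return True


def update_binding(registry, port_id):
    """Stand-in for the port binding update: in this model the
    event fires immediately (a "bind-time" event)."""
    return registry.deliver(("network-vif-plugged", port_id))


def wait_after_trigger(registry, port_id, timeout=0.1):
    """Problematic order: trigger first, start waiting afterwards."""
    update_binding(registry, port_id)
    waiter = registry.prepare(("network-vif-plugged", port_id))
    return waiter.wait(timeout)  # times out, the event is already gone


def prepare_before_trigger(registry, port_id, timeout=0.1):
    """Safe order: register the waiter, then trigger, then wait."""
    waiter = registry.prepare(("network-vif-plugged", port_id))
    update_binding(registry, port_id)
    return waiter.wait(timeout)
```

In this model wait_after_trigger() returns False after the timeout, while prepare_before_trigger() returns True immediately. That is why events sent at binding time have to be waited for around the binding update rather than later in the virt driver.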

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/670645

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/670648

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/671079

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/671303

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/stein)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/stein
Review: https://review.opendev.org/671079
Reason: No longer needed.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/670645
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7a3a8f325ef6eaa97de1cf74efbaa0079f61a9e6
Submitter: Zuul
Branch: stable/stein

commit 7a3a8f325ef6eaa97de1cf74efbaa0079f61a9e6
Author: Artom Lifshitz <email address hidden>
Date: Wed Mar 20 10:38:12 2019 -0400

    Revert resize: wait for events according to hybrid plug


tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/670648
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d9892abd2f096c05a2885a4902128941a24f5670
Submitter: Zuul
Branch: stable/rocky

commit d9892abd2f096c05a2885a4902128941a24f5670
Author: Artom Lifshitz <email address hidden>
Date: Wed Mar 20 10:38:12 2019 -0400

    Revert resize: wait for events according to hybrid plug


tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/rocky)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/rocky
Review: https://review.opendev.org/671303
Reason: No longer needed.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.2

This issue was fixed in the openstack/nova 19.0.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.2.2

This issue was fixed in the openstack/nova 18.2.2 release.

Matt Riedemann (mriedem)
no longer affects: neutron
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.0.0.0rc1

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.
