Comment 1 for bug 1781710

Matt Riedemann (mriedem) wrote :

This is where we first hit the server group anti-affinity filter during scheduling for this request to create 2 servers in the same anti-affinity group:

http://logs.openstack.org/44/564444/14/check/neutron-tempest-multinode-full/dba40b9/logs/screen-n-sch.txt.gz#_Jul_13_19_53_09_805696

Jul 13 19:53:09.805696 ubuntu-xenial-rax-dfw-0000714118 nova-scheduler[3485]: DEBUG nova.scheduler.filters.affinity_filter [None req-8b191d5b-4c76-47c4-a322-d9c9a5ee6d0b tempest-ServersOnMultiNodesTest-1141629396 tempest-ServersOnMultiNodesTest-1141629396] Group anti-affinity: check if the number of servers from group 243c3452-1fde-41ef-bf5a-1cddf1236a7f on host ubuntu-xenial-rax-dfw-0000714124 is less than 1. {{(pid=5417) host_passes /opt/stack/new/nova/nova/scheduler/filters/affinity_filter.py:122}}
Jul 13 19:53:09.806055 ubuntu-xenial-rax-dfw-0000714118 nova-scheduler[3485]: DEBUG nova.scheduler.filters.affinity_filter [None req-8b191d5b-4c76-47c4-a322-d9c9a5ee6d0b tempest-ServersOnMultiNodesTest-1141629396 tempest-ServersOnMultiNodesTest-1141629396] Group anti-affinity: check if the number of servers from group 243c3452-1fde-41ef-bf5a-1cddf1236a7f on host ubuntu-xenial-rax-dfw-0000714118 is less than 1. {{(pid=5417) host_passes /opt/stack/new/nova/nova/scheduler/filters/affinity_filter.py:122}}
Jul 13 19:53:09.808871 ubuntu-xenial-rax-dfw-0000714118 nova-scheduler[3485]: DEBUG nova.filters [None req-8b191d5b-4c76-47c4-a322-d9c9a5ee6d0b tempest-ServersOnMultiNodesTest-1141629396 tempest-ServersOnMultiNodesTest-1141629396] Filter ServerGroupAntiAffinityFilter returned 2 host(s) {{(pid=5417) get_filtered_objects /opt/stack/new/nova/nova/filters.py:104}}

This is where we hit the server group anti-affinity filter for the second instance:

http://logs.openstack.org/44/564444/14/check/neutron-tempest-multinode-full/dba40b9/logs/screen-n-sch.txt.gz#_Jul_13_19_53_09_917947

Jul 13 19:53:09.917947 ubuntu-xenial-rax-dfw-0000714118 nova-scheduler[3485]: DEBUG nova.scheduler.filters.affinity_filter [None req-8b191d5b-4c76-47c4-a322-d9c9a5ee6d0b tempest-ServersOnMultiNodesTest-1141629396 tempest-ServersOnMultiNodesTest-1141629396] Group anti-affinity: check if the number of servers from group 243c3452-1fde-41ef-bf5a-1cddf1236a7f on host ubuntu-xenial-rax-dfw-0000714118 is less than 1. {{(pid=5417) host_passes /opt/stack/new/nova/nova/scheduler/filters/affinity_filter.py:122}}
Jul 13 19:53:09.918313 ubuntu-xenial-rax-dfw-0000714118 nova-scheduler[3485]: DEBUG nova.scheduler.filters.affinity_filter [None req-8b191d5b-4c76-47c4-a322-d9c9a5ee6d0b tempest-ServersOnMultiNodesTest-1141629396 tempest-ServersOnMultiNodesTest-1141629396] Group anti-affinity: check if the number of servers from group 243c3452-1fde-41ef-bf5a-1cddf1236a7f on host ubuntu-xenial-rax-dfw-0000714124 is less than 1. {{(pid=5417) host_passes /opt/stack/new/nova/nova/scheduler/filters/affinity_filter.py:122}}
Jul 13 19:53:09.918709 ubuntu-xenial-rax-dfw-0000714118 nova-scheduler[3485]: DEBUG nova.filters [None req-8b191d5b-4c76-47c4-a322-d9c9a5ee6d0b tempest-ServersOnMultiNodesTest-1141629396 tempest-ServersOnMultiNodesTest-1141629396] Filter ServerGroupAntiAffinityFilter returned 2 host(s) {{(pid=5417) get_filtered_objects /opt/stack/new/nova/nova/filters.py:104}}
Jul 13 19:53:09.919082 ubuntu-xenial-rax-dfw-0000714118 nova-scheduler[3485]: DEBUG nova.filters [None req-8b191d5b-4c76-47c4-a322-d9c9a5ee6d0b tempest-ServersOnMultiNodesTest-1141629396 tempest-ServersOnMultiNodesTest-1141629396] Filter ServerGroupAffinityFilter returned 2 host(s) {{(pid=5417) get_filtered_objects /opt/stack/new/nova/nova/filters.py:104}}

I don't see an obvious problem there. The expected flow is: pick a host for the first instance, "claim" resources on that host and update the group hosts, then run the filters for the second instance, which should filter out the host selected for the first instance.

https://github.com/openstack/nova/blob/21a368e1a6f22aa576719ec463d13280b9178f10/nova/scheduler/filter_scheduler.py#L324

Specifically this:

if spec_obj.instance_group is not None:
    spec_obj.instance_group.hosts.append(selected_host.host)
    # hosts has to be not part of the updates when saving
    spec_obj.instance_group.obj_reset_changes(['hosts'])

My guess is the problem is in this change which merged on July 12:

https://review.openstack.org/#/c/571166/27/nova/scheduler/filters/affinity_filter.py

And specifically because we removed this check in the filter:

        group_hosts = (spec_obj.instance_group.hosts
                       if spec_obj.instance_group else [])
        LOG.debug("Group anti affinity: check if %(host)s not "
                  "in %(configured)s", {'host': host_state.host,
                                        'configured': group_hosts})
        if group_hosts:
            return host_state.host not in group_hosts

That check is what makes the ServerGroupAntiAffinityFilter work for multiple servers created in the same request, because we update spec_obj.instance_group.hosts for each member of the group after we have selected a host.
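
To illustrate why that matters, here is a minimal standalone sketch of the select-then-update loop. It is not nova code: FakeGroup, anti_affinity_passes and schedule are hypothetical names, the real scheduler weighs hosts rather than taking the first candidate, and only the group_hosts check mirrors the snippet above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeGroup:
    """Hypothetical stand-in for spec_obj.instance_group."""
    hosts: list = field(default_factory=list)


def anti_affinity_passes(host, group):
    """Mirror of the removed check: reject hosts the group already uses."""
    group_hosts = group.hosts if group else []
    if group_hosts:
        return host not in group_hosts
    return True


def schedule(num_instances, all_hosts, group, update_group_hosts=True):
    """Pick one host per instance, running the filter before each pick.

    Assumption: the first passing host is chosen; real weighing is omitted.
    """
    selected = []
    for _ in range(num_instances):
        candidates = [h for h in all_hosts if anti_affinity_passes(h, group)]
        if not candidates:
            raise RuntimeError("No valid host found")
        chosen = candidates[0]
        selected.append(chosen)
        if update_group_hosts:
            # Equivalent in spirit to instance_group.hosts.append(...)
            group.hosts.append(chosen)
    return selected
```

With update_group_hosts=True, two instances land on different hosts. With it set to False (or with the filter no longer consulting group.hosts), both can land on the same host, which is the kind of failure suspected here.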

But when this fails, I don't think https://review.openstack.org/#/c/571166/27/nova/scheduler/filters/affinity_filter.py is in play, because this isn't in the logs:

        LOG.debug("Group anti-affinity: check if the number of servers from "
                  "group %(group_uuid)s on host %(host)s is less than "
                  "%(max_server)s.",
                  {'group_uuid': group_uuid,
                   'host': host_state.host,
                   'max_server': max_server_per_host})