nova-scheduler does not log the weight of each compute_node

Bug #1816360 reported by yangjie
This bug affects 4 people
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   Medium      Matt Riedemann
Pike                      Fix Committed  Medium      Matt Riedemann
Queens                    Fix Committed  Medium      Matt Riedemann
Rocky                     Fix Committed  Medium      Matt Riedemann

Bug Description

Description
===========

nova-scheduler does not log the weight of each compute_node, even with "debug=true" configured.
This is all that appears in nova-scheduler.log (Rocky version):

2019-02-18 15:02:56.918 18716 DEBUG nova.scheduler.filter_scheduler [req-242d0408-395d-4dc2-a237-e3f2b55c2ba8 8fdccd78f9404ccbb427b0b798f46f67 d8706f56f2314bbb8e62463ba833bb1e - default default] Weighed [(nail1, nail1) ram: 27527MB disk: 226304MB io_ops: 0 instances: 2, (Shelf1Slot3SBCR, Shelf1Slot3SBCR) ram: 12743MB disk: 112640MB io_ops: 0 instances: 3, (nail2, nail2) ram: 19919MB disk: 120832MB io_ops: 0 instances: 0] _get_sorted_hosts /usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py:455

But in Kilo OpenStack, we can see:

2019-02-18 15:31:07.418 24797 DEBUG nova.scheduler.filter_scheduler [req-9449a23f-643d-45a1-aed7-9d62639d874d 8228476c4baf4a819f2c7b890069c5d1 7240ab9c4351484095c15ae33e0abd0b - - -] Weighed [WeighedHost [host: (computer16-02, computer16-02) ram:45980 disk:69632 io_ops:0 instances:11, weight: 1.0], WeighedHost [host: (computer16-08, computer16-08) ram:45980 disk:73728 io_ops:0 instances:15, weight: 1.0], WeighedHost [host: (computer16-03, computer16-03) ram:43932 disk:117760 io_ops:0 instances:10, weight: 0.955458895172], WeighedHost [host: (computer16-07, computer16-07) ram:43932 disk:267264 io_ops:0 instances:11, weight: 0.955458895172], WeighedHost [host: (computer16-15, computer16-15) ram:41884 disk:-114688 io_ops:0 instances:15, weight: 0.910917790344], WeighedHost [host: (computer16-16, computer16-16) ram:35740 disk:967680 io_ops:0 instances:10, weight: 0.777294475859], WeighedHost [host: (computer16-12, computer16-12) ram:31644 disk:-301056 io_ops:0 instances:13, weight: 0.688212266203], WeighedHost [host: (computer16-05, computer16-05) ram:25500 disk:-316416 io_ops:0 instances:13, weight: 0.554588951718], WeighedHost [host: (computer16-06, computer16-06) ram:17308 disk:-66560 io_ops:0 instances:12, weight: 0.376424532405]] _schedule /usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py:149

The weight value for each compute_node is no longer logged.

Environment
===========

[root@nail1 ~]# rpm -qi openstack-nova-api
Name : openstack-nova-api
Epoch : 1
Version : 18.0.2
Release : 1.el7
Architecture: noarch
Install Date: Wed 17 Oct 2018 02:23:03 PM CST
Group : Unspecified
Size : 5595
License : ASL 2.0
Signature : RSA/SHA1, Mon 15 Oct 2018 05:02:18 PM CST, Key ID f9b9fee7764429e6
Source RPM : openstack-nova-18.0.2-1.el7.src.rpm
Build Date : Tue 09 Oct 2018 05:54:47 PM CST
Build Host : p8le01.rdu2.centos.org
Relocations : (not relocatable)
Packager : CBS <email address hidden>
Vendor : CentOS
URL : http://openstack.org/projects/compute/
Summary : OpenStack Nova API services

Revision history for this message
yangjie (yang.jie) wrote :

The fix for this bug is extremely simple.
In nova/scheduler/filter_scheduler.py,

        weighed_hosts = self.host_manager.get_weighed_hosts(filtered_hosts,
            spec_obj)
        # Strip off the WeighedHost wrapper class...
        weighed_hosts = [h.obj for h in weighed_hosts]

        LOG.debug("Weighed %(hosts)s", {'hosts': weighed_hosts})

Swap the last two statements so that weighed_hosts is logged before the WeighedHost wrapper class is stripped off.
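
As a rough illustration of why the order matters, here is a minimal standalone sketch. The HostState and WeighedHost classes below are simplified stand-ins written for this example, not nova's real classes, and get_sorted_hosts is a hypothetical helper that only mirrors the two statements in question:

```python
import logging

LOG = logging.getLogger(__name__)


class HostState(object):
    """Simplified stand-in for a host state object (assumption)."""

    def __init__(self, host, ram_mb):
        self.host = host
        self.ram_mb = ram_mb

    def __repr__(self):
        return "(%s, %s) ram: %sMB" % (self.host, self.host, self.ram_mb)


class WeighedHost(object):
    """Simplified stand-in for the wrapper that carries the weight."""

    def __init__(self, obj, weight):
        self.obj = obj
        self.weight = weight

    def __repr__(self):
        return "WeighedHost [host: %r, weight: %s]" % (self.obj, self.weight)


def get_sorted_hosts(weighed_hosts):
    """Hypothetical helper mirroring the corrected statement order."""
    # Log first, while each entry still carries its weight...
    LOG.debug("Weighed %(hosts)s", {'hosts': weighed_hosts})
    # ...then strip off the WeighedHost wrapper class.
    return [h.obj for h in weighed_hosts]
```

With the statements in the original order, the list passed to LOG.debug would already hold the unwrapped host objects, whose repr has no weight field.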

The log then becomes:
2019-02-18 15:08:50.710 19828 DEBUG nova.scheduler.filter_scheduler [req-26963753-81af-4742-a0cd-b8279bb4905a 8fdccd78f9404ccbb427b0b798f46f67 d8706f56f2314bbb8e62463ba833bb1e - default default] Weighed [WeighedHost [host: (Shelf1Slot3SBCR, Shelf1Slot3SBCR) ram: 12743MB disk: 112640MB io_ops: 0 instances: 0, weight: 1.71508529652], WeighedHost [host: (nail2, nail2) ram: 19919MB disk: 120832MB io_ops: 0 instances: 0, weight: 1.63476368028], WeighedHost [host: (nail1, nail1) ram: 27527MB disk: 226304MB io_ops: 0 instances: 2, weight: -999997.0]] _get_sorted_hosts /usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py:453

Revision history for this message
Matt Riedemann (mriedem) wrote :

Yeah looks like it was an accidental regression in Pike:

https://review.openstack.org/#/c/483564/

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
tags: added: low-hanging-fruit scheduler serviceability
Revision history for this message
Matt Riedemann (mriedem) wrote :
Changed in nova:
status: Confirmed → In Progress
assignee: nobody → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/641143
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=84533f5eb3c5b4ab7598d7c278b53524acc1c6e0
Submitter: Zuul
Branch: master

commit 84533f5eb3c5b4ab7598d7c278b53524acc1c6e0
Author: Matt Riedemann <email address hidden>
Date: Tue Mar 5 17:16:23 2019 -0500

    Fix WeighedHost logging regression

    Change I8666e0af3f057314f6b06939a108411b8a88d64b in Pike
    refactored some code in the FilterScheduler which accidentally
    changed how the list of weighed hosts are logged, which caused
    the wrapped HostState objects to be logged rather than the
    WeighedHost objects, which contain the actual "weight" attribute
    which is useful for debugging weigher configuration and
    scheduling decisions.

    This fixes the regression by logging the weighed hosts before
    stripping off the WeighedHost wrapper and adds a simple wrinkle
    to an existing test to assert we are logging the correct object.

    Change-Id: I528794b4b6f0007efc1238ad28dc402456664f86
    Closes-Bug: #1816360
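
The commit message mentions asserting that the correct object is logged. The actual test change is not shown in this report, so the following is only a sketch of how such an assertion might look, using unittest.mock from Python 3. WeighedHost, get_sorted_hosts, the logger name, and check_logs_weighed_hosts are all hypothetical names made up for this example:

```python
import logging
from unittest import mock

LOG = logging.getLogger("filter_scheduler_sketch")


class WeighedHost(object):
    """Hypothetical stand-in: wraps a host object and its weight."""

    def __init__(self, obj, weight):
        self.obj = obj
        self.weight = weight


def get_sorted_hosts(weighed_hosts):
    # Same order as the fix: log the wrappers, then unwrap them.
    LOG.debug("Weighed %(hosts)s", {'hosts': weighed_hosts})
    return [h.obj for h in weighed_hosts]


def check_logs_weighed_hosts():
    weighed = [WeighedHost("host1", 2.0), WeighedHost("host2", 1.0)]
    with mock.patch.object(LOG, "debug") as mock_debug:
        result = get_sorted_hosts(weighed)
    # The caller still receives the unwrapped hosts...
    assert result == ["host1", "host2"]
    # ...but the list handed to LOG.debug must hold the wrappers.
    logged = mock_debug.call_args[0][1]['hosts']
    assert all(isinstance(h, WeighedHost) for h in logged)
    return logged
```

Checking the argument passed to the logger, rather than the formatted string, keeps the assertion independent of how the objects render themselves.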

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/641355

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/641359

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/641398

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/641355
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=31b74bfa4063d68d0f5ea9e883cad8cbcb70ab09
Submitter: Zuul
Branch: stable/rocky

commit 31b74bfa4063d68d0f5ea9e883cad8cbcb70ab09
Author: Matt Riedemann <email address hidden>
Date: Tue Mar 5 17:16:23 2019 -0500

    Fix WeighedHost logging regression

    Change I8666e0af3f057314f6b06939a108411b8a88d64b in Pike
    refactored some code in the FilterScheduler which accidentally
    changed how the list of weighed hosts are logged, which caused
    the wrapped HostState objects to be logged rather than the
    WeighedHost objects, which contain the actual "weight" attribute
    which is useful for debugging weigher configuration and
    scheduling decisions.

    This fixes the regression by logging the weighed hosts before
    stripping off the WeighedHost wrapper and adds a simple wrinkle
    to an existing test to assert we are logging the correct object.

    Change-Id: I528794b4b6f0007efc1238ad28dc402456664f86
    Closes-Bug: #1816360
    (cherry picked from commit 84533f5eb3c5b4ab7598d7c278b53524acc1c6e0)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/641359
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bc2c3359b32f452cd772e510d4c139a8e4594900
Submitter: Zuul
Branch: stable/queens

commit bc2c3359b32f452cd772e510d4c139a8e4594900
Author: Matt Riedemann <email address hidden>
Date: Tue Mar 5 17:16:23 2019 -0500

    Fix WeighedHost logging regression

    Change I8666e0af3f057314f6b06939a108411b8a88d64b in Pike
    refactored some code in the FilterScheduler which accidentally
    changed how the list of weighed hosts are logged, which caused
    the wrapped HostState objects to be logged rather than the
    WeighedHost objects, which contain the actual "weight" attribute
    which is useful for debugging weigher configuration and
    scheduling decisions.

    This fixes the regression by logging the weighed hosts before
    stripping off the WeighedHost wrapper and adds a simple wrinkle
    to an existing test to assert we are logging the correct object.

    Change-Id: I528794b4b6f0007efc1238ad28dc402456664f86
    Closes-Bug: #1816360
    (cherry picked from commit 84533f5eb3c5b4ab7598d7c278b53524acc1c6e0)
    (cherry picked from commit 31b74bfa4063d68d0f5ea9e883cad8cbcb70ab09)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.10

This issue was fixed in the openstack/nova 17.0.10 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.2.0

This issue was fixed in the openstack/nova 18.2.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/641398
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2e31f057514fece68b428f9541b3380cdd7f4feb
Submitter: Zuul
Branch: stable/pike

commit 2e31f057514fece68b428f9541b3380cdd7f4feb
Author: Matt Riedemann <email address hidden>
Date: Tue Mar 5 17:16:23 2019 -0500

    Fix WeighedHost logging regression

    Change I8666e0af3f057314f6b06939a108411b8a88d64b in Pike
    refactored some code in the FilterScheduler which accidentally
    changed how the list of weighed hosts are logged, which caused
    the wrapped HostState objects to be logged rather than the
    WeighedHost objects, which contain the actual "weight" attribute
    which is useful for debugging weigher configuration and
    scheduling decisions.

    This fixes the regression by logging the weighed hosts before
    stripping off the WeighedHost wrapper and adds a simple wrinkle
    to an existing test to assert we are logging the correct object.

    Conflicts:
          nova/scheduler/filter_scheduler.py

    NOTE(mriedem): The conflict is due to not having change
    Icee137e15f264da59a1bdc1dc1ecfeaac82b98c6 in Pike.

    Change-Id: I528794b4b6f0007efc1238ad28dc402456664f86
    Closes-Bug: #1816360
    (cherry picked from commit 84533f5eb3c5b4ab7598d7c278b53524acc1c6e0)
    (cherry picked from commit 31b74bfa4063d68d0f5ea9e883cad8cbcb70ab09)
    (cherry picked from commit bc2c3359b32f452cd772e510d4c139a8e4594900)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.8

This issue was fixed in the openstack/nova 16.1.8 release.
