CI / promotion: Nova isn't aware of the nodes that were registered with Ironic

Bug #1674236 reported by Sagi (Sergey) Shnaidman
This bug affects 1 person
Affects                   Status        Importance  Assigned to      Milestone
OpenStack Compute (nova)  Fix Released  High        Jim Rollenhagen
tripleo                   Fix Released  Critical    Emilien Macchi

Bug Description

All CI periodic jobs fail with "No valid host" error:

http://logs.openstack.org/periodic/periodic-tripleo-ci-centos-7-ovb-ha/6504587/
http://logs.openstack.org/periodic/periodic-tripleo-ci-centos-7-ovb-nonha/12d034e/

Hosts are not deployed:
http://logs.openstack.org/periodic/periodic-tripleo-ci-centos-7-ovb-nonha/12d034e/logs/postci.txt.gz#_2017-03-19_07_22_10_000
2017-03-19 07:22:10.000 | +--------------------------------------+-------------------------+--------+------------+-------------+----------+
2017-03-19 07:22:10.000 | | ID | Name | Status | Task State | Power State | Networks |
2017-03-19 07:22:10.000 | +--------------------------------------+-------------------------+--------+------------+-------------+----------+
2017-03-19 07:22:10.000 | | 96e8d6bc-0ff4-46ad-a274-7bf554cdaf1a | overcloud-cephstorage-0 | ERROR | - | NOSTATE | |
2017-03-19 07:22:10.000 | | 56266ef5-7483-4052-8698-37efe14bc1c6 | overcloud-novacompute-0 | ERROR | - | NOSTATE | |
2017-03-19 07:22:10.000 | +--------------------------------------+-------------------------+--------+------------+-------------+----------+

ironic node-list
+--------------------------------------+----------------------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------------------+---------------+-------------+--------------------+-------------+
| b4444285-e40e-4068-abd8-7edeeb255cef | baremetal-periodic-0 | None | power off | available | False |
| 102deb76-7f12-49a1-9c3c-53472a1d0f3e | baremetal-periodic-1 | None | power off | available | False |
| 8afea687-4d29-4eed-97f3-57ba449eed14 | baremetal-periodic-2 | None | power off | available | False |
+--------------------------------------+----------------------+---------------+-------------+--------------------+-------------+

Changed in tripleo:
importance: Undecided → High
status: New → Triaged
Changed in tripleo:
milestone: none → pike-1
importance: High → Critical
Derek Higgins (derekh) wrote :

It looks like Nova isn't aware of the nodes that were registered with ironic.

Introspection passed, and the Ironic logs show the nodes being powered on and off without problems.

But when Nova tries to schedule a node to boot, the scheduler has no nodes to choose from:

2017-03-19 07:19:34.157 13121 DEBUG nova.filters [req-702e96af-318d-4611-af19-d6a1f3321c6a 7727004211f045abb45f22d0773242e3 c8da1df6de9841d5a922d5da5b69ff92 - - -] Starting with 0 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:70

"Starting with 0 host(s)"

I would expect that "0" to be a "3", one for each node in the ironic node-list output.

summary: - CI: periodics jobs fail with "No valid host" error
+ CI / promotion: Nova isn't aware of the nodes that were registered with Ironic
Jim Rollenhagen (jim-rollenhagen) wrote :

So the new code sets a min_unit equal to the max_unit:

2017-03-21 03:54:38.198 9771 INFO nova.scheduler.client.report [req-1bbd4b23-5c6d-4644-a82b-3fd15c3d2e10 - - - - -] Inventory data after processing custom resource classes for provider e717ea77-1bd5-4122-ba13-60cd4d1342e4: {'VCPU': {'allocation_ratio': 1.0, 'total': 4, 'reserved': 0, 'step_size': 1, 'min_unit': 4, 'max_unit': 4}, 'MEMORY_MB': {'allocation_ratio': 1.0, 'total': 8192, 'reserved': 0, 'step_size': 1, 'min_unit': 8192, 'max_unit': 8192}, 'DISK_GB': {'allocation_ratio': 1.0, 'total': 40, 'reserved': 0, 'step_size': 1, 'min_unit': 40, 'max_unit': 40}}

Since the tripleo flavors undersubscribe (e.g. request 1 vCPU, which is below the min_unit of 4 shown above), the resource providers here don't match. This previously worked because tripleo doesn't use the exact match filters. I think to maintain the old behavior, we want to:

* set a smaller (1?) min_unit for these properties
* allocate all resources for baremetal resources, even if the flavor is smaller than the resource (this may already be done, I'm not sure)

Folks who want to continue using the exact match filters can do so. The transition to custom resource classes will eliminate the discussion, as we won't be using cpu/ram/disk anymore.
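
To illustrate why a 1-vCPU flavor cannot match the inventory in the log above, here is a minimal sketch of a per-resource capacity check. It is a simplified model built only from the inventory fields shown in that log line (total, reserved, allocation_ratio, step_size, min_unit, max_unit); it is not the actual placement or scheduler code, and the function name `request_fits` is hypothetical.

```python
def request_fits(inventory, amount, used=0):
    """Return True if `amount` of one resource could be allocated.

    Simplified model: the amount must lie within [min_unit, max_unit],
    be a multiple of step_size, and fit in the remaining capacity.
    """
    capacity = (inventory['total'] - inventory['reserved']) * inventory['allocation_ratio']
    return (
        inventory['min_unit'] <= amount <= inventory['max_unit']
        and amount % inventory['step_size'] == 0
        and used + amount <= capacity
    )


# VCPU inventory as reported in the log line above
vcpu = {'allocation_ratio': 1.0, 'total': 4, 'reserved': 0,
        'step_size': 1, 'min_unit': 4, 'max_unit': 4}

print(request_fits(vcpu, 1))                    # flavor asks for 1 vCPU
print(request_fits(vcpu, 4))                    # flavor matches the node exactly
print(request_fits(dict(vcpu, min_unit=1), 1))  # same request with min_unit lowered to 1
```

Under this model, the 1-vCPU request is rejected only because it falls below min_unit; lowering min_unit to 1 lets the undersubscribing flavor through while an exact-size request keeps working.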

Jim Rollenhagen (jim-rollenhagen) wrote :

I'll have a patch up for this today sometime, btw.

Changed in nova:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Jim Rollenhagen (jim-rollenhagen)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/448098

Changed in nova:
status: Confirmed → In Progress
Changed in tripleo:
assignee: nobody → Emilien Macchi (emilienm)
Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/448098
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fe8415060ca452990d7019a03eaaa4b92aadfe8b
Submitter: Jenkins
Branch: master

commit fe8415060ca452990d7019a03eaaa4b92aadfe8b
Author: Jim Rollenhagen <email address hidden>
Date: Tue Mar 21 13:32:35 2017 +0000

    Ironic: hardcode min_unit for standard resources to 1

    We've always left users a choice whether to do exact matching or
    "at least" matching for baremetal flavors, by installing the
    exact match scheduler filters. The patch to add get_inventory
    broke this by setting min_unit and max_unit to be equal for
    baremetal resources.

    Set min_unit to 1 for these resources so that deployers can continue
    to use the exact match filters to decide how they want baremetal
    flavors to be matched.

    Change-Id: I04fdcb73674eb7193e82a61d856747d7985a2b65
    Closes-Bug: #1674236
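
As a rough illustration of what the commit message describes, the sketch below builds an inventory for one baremetal node with min_unit fixed at 1 and max_unit equal to the node's total. This is an assumption-laden sketch, not the code from the merged change: the helper name `build_inventory` and its parameters are hypothetical, and only the dictionary keys are taken from the log output earlier in this report.

```python
def build_inventory(vcpus, memory_mb, local_gb):
    """Build a placement-style inventory dict for one baremetal node (illustrative)."""
    def entry(total):
        return {
            'total': total,
            'reserved': 0,
            'min_unit': 1,       # allow flavors that request less than the whole node
            'max_unit': total,   # a single request may still take the whole node
            'step_size': 1,
            'allocation_ratio': 1.0,
        }

    return {
        'VCPU': entry(vcpus),
        'MEMORY_MB': entry(memory_mb),
        'DISK_GB': entry(local_gb),
    }


# Node sizes taken from the inventory shown in the log above
inv = build_inventory(4, 8192, 40)
print(inv['VCPU'])
```

With min_unit at 1, whether a flavor must match the node exactly is again decided by the choice of scheduler filters rather than by the inventory itself.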

Changed in nova:
status: In Progress → Fix Released
Changed in tripleo:
status: In Progress → Fix Released
tags: removed: alert ci promotion-blocker
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.0.0b1

This issue was fixed in the openstack/nova 16.0.0.0b1 development milestone.
