“UNKNOWN” host_status notification may cause unsafe evacuation

Bug #1858762 reported by Toshikazu Ichikawa
This bug affects 1 person
Affects: masakari
Status: Fix Released
Importance: Critical
Assigned to: Shilpa Devharakar
Milestone: (none listed)

Bug Description

Currently, masakari-hostmonitor sends a notification with “NORMAL” host_status when it confirms that a failed host has been powered off; otherwise, it sends a notification with “UNKNOWN” host_status. However, when masakari-api receives a notification, it executes the failover process regardless of the value of host_status.
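The monitor-side behaviour described above can be sketched roughly as follows. This is only an illustration, not the masakari-hostmonitor source: the helper name `build_host_notification` is hypothetical, and the surrounding fields (`type`, `event`, `cluster_status`, `generated_time`) are assumptions about the notification layout. Only the NORMAL/UNKNOWN choice of host_status comes from this report.

```python
from datetime import datetime, timezone


def build_host_notification(hostname: str, power_off_confirmed: bool) -> dict:
    """Build a host-failure notification body (hypothetical helper)."""
    # From the bug description: NORMAL only when power-off was confirmed,
    # UNKNOWN in every other case.
    host_status = "NORMAL" if power_off_confirmed else "UNKNOWN"
    return {
        # The fields below are assumed for illustration.
        "type": "COMPUTE_HOST",
        "hostname": hostname,
        "generated_time": datetime.now(timezone.utc).isoformat(),
        "payload": {
            "event": "STOPPED",
            "cluster_status": "OFFLINE",
            "host_status": host_status,
        },
    }


if __name__ == "__main__":
    body = build_host_notification("compute-1", power_off_confirmed=False)
    print(body["payload"]["host_status"])
```

The point of the sketch is that “UNKNOWN” means the monitor could not confirm fencing, so the receiver has no guarantee that the instances on the host have stopped.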

If a failed host remains powered on for some reason and VM instances are still running on it, evacuation of those instances is still requested through the Nova API. Moreover, if nova-compute is in ‘down’ status and corosync is in ‘offline’ status because of, for example, network infrastructure instability, failover of the VM instances is executed even though the existing instances have not been fenced. This is dangerous and may cause instance unavailability and data loss.

To avoid unsafe failover, the failover process must not be triggered by a notification with “UNKNOWN” host_status.
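A minimal sketch of the kind of guard this proposal implies, assuming the notification is available as a dict with a nested `payload`. The function name `should_start_host_failover` and the constant `UNSAFE_HOST_STATUSES` are hypothetical and are not taken from the masakari code base or from the merged patch.

```python
# Statuses for which host failover must not be started (per this report).
UNSAFE_HOST_STATUSES = frozenset({"UNKNOWN"})


def should_start_host_failover(notification: dict) -> bool:
    """Return False when the notification reports an unfenced host."""
    payload = notification.get("payload") or {}
    host_status = payload.get("host_status")
    # Only UNKNOWN is rejected here, matching the request in this report.
    # How a missing host_status should be handled is left open.
    return host_status not in UNSAFE_HOST_STATUSES


if __name__ == "__main__":
    fenced = {"payload": {"host_status": "NORMAL"}}
    unfenced = {"payload": {"host_status": "UNKNOWN"}}
    print(should_start_host_failover(fenced))
    print(should_start_host_failover(unfenced))
```

With a check like this in front of the recovery workflow, a notification for a host that may still be running its instances would be ignored instead of triggering evacuation.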

Changed in masakari:
status: New → Confirmed
Tushar Patil (tpatil)
Changed in masakari:
importance: Undecided → Critical
Changed in masakari:
assignee: nobody → Shilpa Devharakar (shilpasd)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to masakari (master)

Fix proposed to branch: master
Review: https://review.opendev.org/714573

Changed in masakari:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to masakari (master)

Reviewed: https://review.opendev.org/714573
Committed: https://git.openstack.org/cgit/openstack/masakari/commit/?id=115aa6c49ac0bad7c3c57c466192440d166e3b5f
Submitter: Zuul
Branch: master

commit 115aa6c49ac0bad7c3c57c466192440d166e3b5f
Author: Shilpa <email address hidden>
Date: Mon Mar 23 19:36:53 2020 +0530

    Ignoring host recovery if host_status is `UNKNOWN`

    This patch adds validation to ignore host recovery if received
    host_status as `UNKNOWN` in notification payload.

    Change-Id: Ie32f6c0933e15edc78a86687742a4e87e3623a85
    Closes-Bug: #1858762

Changed in masakari:
status: In Progress → Fix Released