I looked at this for a while trying to find whether there's some way we could reduce the number of leaked threads, since killing them (naked greenlets) is not always possible, as evidenced by gibi's abandoned patch [1] and some similar things I have tried locally since.
I noticed that when the nova-compute service is stopped by calling stop(), the part that cleans up live migrations in the thread pool uses wait=False [2]:
def cleanup_host(self):
    self.driver.register_event_listener(None)
    self.instance_events.cancel_all_events()
    self.driver.cleanup_host(host=self.host)
    self._cleanup_live_migrations_in_pool()

def _cleanup_live_migrations_in_pool(self):
    # Shutdown the pool so we don't get new requests.
    self._live_migration_executor.shutdown(wait=False)
The docs for shutdown() say of the wait parameter: "If True then shutdown will not return until all running futures have finished executing and the resources used by the executor have been reclaimed." [3] So with wait=False, stop() moves on even while running futures are still executing.
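The difference is easy to see with the stdlib ThreadPoolExecutor, whose shutdown() has the same wait semantics that futurist's executors mirror (this is a standalone sketch, not nova code):

```python
import time
from concurrent.futures import ThreadPoolExecutor

done = []

def slow_task():
    time.sleep(0.2)
    done.append(True)

# wait=False: shutdown() returns immediately, leaving the task running.
ex = ThreadPoolExecutor(max_workers=1)
ex.submit(slow_task)
ex.shutdown(wait=False)
still_running = not done  # True: the worker is still mid-sleep here

# wait=True: shutdown() blocks until the submitted task has finished.
done2 = []
ex2 = ThreadPoolExecutor(max_workers=1)
ex2.submit(lambda: (time.sleep(0.2), done2.append(True)))
ex2.shutdown(wait=True)
assert still_running
assert done2 == [True]
```

With wait=False the first worker thread is still alive after shutdown() returns, which is exactly the "leaked thread outliving the test" situation described above.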
I wondered if a patch like the following might reduce the chances of a leaked greenlet from a previous test running during a new test:
diff --git a/nova/test.py b/nova/test.py
index a6449c01f0..69b9d06cdc 100644
--- a/nova/test.py
+++ b/nova/test.py
@@ -37,6 +37,7 @@ import pprint
 import sys

 import fixtures
+import futurist
 import mock
 from oslo_cache import core as cache
 from oslo_concurrency import lockutils
@@ -290,6 +291,15 @@ class TestCase(base.BaseTestCase):
         # instead of only once initialized for test worker
         wsgi_app.init_global_data.reset()

+        orig_shutdown = futurist.GreenThreadPoolExecutor.shutdown
+
+        def wrap_shutdown(*a, **kw):
+            kw['wait'] = True
+            return orig_shutdown(*a, **kw)
+
+        self.useFixture(fixtures.MonkeyPatch(
+            'futurist.GreenThreadPoolExecutor.shutdown', wrap_shutdown))
+
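The wrapping pattern the patch uses (save the original method, force the kwarg, delegate) can be sketched outside of nova with unittest.mock and the stdlib ThreadPoolExecutor standing in for futurist's GreenThreadPoolExecutor:

```python
from unittest import mock
from concurrent.futures import ThreadPoolExecutor

# Save the unpatched method so the wrapper can delegate to it.
orig_shutdown = ThreadPoolExecutor.shutdown
forced = []

def wrap_shutdown(*a, **kw):
    # Force a blocking shutdown no matter what the caller passed.
    kw['wait'] = True
    forced.append(kw['wait'])
    return orig_shutdown(*a, **kw)

with mock.patch.object(ThreadPoolExecutor, 'shutdown', wrap_shutdown):
    ex = ThreadPoolExecutor(max_workers=1)
    ex.shutdown(wait=False)  # intercepted; actually runs with wait=True

assert forced == [True]
```

fixtures.MonkeyPatch in the diff does the same job as mock.patch.object here, with the restore tied to the test fixture's lifetime instead of a with-block.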
but I haven't been able to reproduce the failure to tell if it might help.
[1] https://review.opendev.org/c/openstack/nova/+/815017
[2] https://github.com/openstack/nova/blob/b8cc5704558d3c08fda9db2f1bb7fecb2bcd985d/nova/compute/manager.py#L1627
[3] https://docs.openstack.org/futurist/latest/reference/index.html#futurist.GreenThreadPoolExecutor.shutdown