rally leaving stale opened files in ssh module
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Rally | Fix Released | Undecided | Unassigned | | |
Bug Description
We create an instance of the SSH class and call the _wait_for_ssh function, i.e.
ssh = sshutils.SSH(user, ip, password=password)
self._wait_
and this succeeds after several attempts:
2022-01-10 07:30:28.951 98870 RALLYDEBUG rally.utils.
2022-01-10 07:30:34.957 98870 RALLYDEBUG rally.utils.
...
...
...
2022-01-10 07:33:36.220 98870 RALLYDEBUG rally.utils.
However, lsof shows a very large number of FIFO files opened by rally:
cat log_lsof | grep rally | wc -l
4440581
cat log_lsof | grep rally | grep FIFO | wc -l
2564483
cat log_lsof | grep rally | grep IPv4 | wc -l
1252151
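For anyone trying to reproduce the counts above from inside a single process, the following is a minimal sketch, assuming a Linux host where /proc/self/fd is available. The function name count_open_fds is hypothetical and is not part of Rally; it only classifies the current process's descriptors in the same spirit as the lsof output.

```python
import os
import stat
from collections import Counter


def count_open_fds(fd_dir="/proc/self/fd"):
    """Count this process's open descriptors by kind (assumes Linux /proc)."""
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            mode = os.fstat(int(name)).st_mode
        except OSError:
            # The descriptor listdir() used to read the directory is
            # already closed by the time we get here; skip it.
            continue
        if stat.S_ISFIFO(mode):
            counts["FIFO"] += 1
        elif stat.S_ISSOCK(mode):
            counts["socket"] += 1
        elif stat.S_ISREG(mode):
            counts["regular"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    print(dict(count_open_fds()))
```

Calling this before and after a batch of SSH attempts should show whether the FIFO and socket counts keep growing.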
Because of this, our workloads are unable to connect to OSP endpoints and fail with the errors below:
2022-01-10 09:12:48.903 98897 ERROR rally.task.runner [-] Iteration 257 raised Exception: keystoneauth1.
2022-01-10 09:12:48.903 98897 ERROR rally.task.runner Traceback (most recent call last):
2022-01-10 09:12:48.903 98897 ERROR rally.task.runner File "/home/
2022-01-10 09:12:48.903 98897 ERROR rally.task.runner File "/home/
2022-01-10 09:12:48.903 98897 ERROR rally.task.runner File "/home/
2022-01-10 09:12:48.903 98897 ERROR rally.task.runner File "/usr/lib64/
2022-01-10 09:12:48.903 98897 ERROR rally.task.runner OSError: [Errno 24] Too many open files
We observed that the "def execute" [1] method opens stdout and stderr and never closes them. The comments [2] in the function say it returns stdout and stderr, but it actually returns stdout.read() and stderr.read(), so the caller cannot call stdout.close() or stderr.close().
If an exception happens in run() [3] (as happened in our case; see the log messages above), the caller can never close stdin and stdout, leaving them stale.
[1] https:/
[2] https:/
[3] https:/
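The general pattern that avoids this kind of leak can be sketched as follows. This is only an illustration under stated assumptions, not the change in the proposed fix and not Rally's actual code: FakeConnection and this execute function are hypothetical stand-ins, and the point is simply that the buffers and the connection are released on both the success path and the exception path.

```python
import io


class FakeConnection:
    """Hypothetical stand-in for an SSH connection; not Rally's SSH class."""

    def __init__(self):
        self.closed = False

    def run(self, cmd, stdout, stderr):
        if cmd == "fail":
            raise RuntimeError("simulated connection error")
        stdout.write("ok\n")
        return 0

    def close(self):
        self.closed = True


def execute(conn, cmd):
    """Run cmd and return (status, out, err), releasing resources on all paths."""
    # The with-block closes both buffers even if run() raises.
    with io.StringIO() as stdout, io.StringIO() as stderr:
        try:
            status = conn.run(cmd, stdout=stdout, stderr=stderr)
            return status, stdout.getvalue(), stderr.getvalue()
        finally:
            # Close the connection whether run() succeeded or raised.
            conn.close()
```

With this shape the caller never needs access to the stream objects, since they are already closed by the time the strings are returned.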
Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/rally/+/823996