rally leaving stale opened files in ssh module

Bug #1956956 reported by venkata anil
This bug affects 1 person
Affects: Rally
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

We create an instance of the SSH class and call the _wait_for_ssh function, i.e.:

ssh = sshutils.SSH(user, ip, password=password)
self._wait_for_ssh(ssh, timeout=timeout, interval=interval)

The connection succeeds after several attempts:

2022-01-10 07:30:28.951 98870 RALLYDEBUG rally.utils.sshutils [-] Ssh is still unavailable: SSHError("Exception <class 'socket.timeout'> was raised during connect to root@172.31.11.67:22. Exception value is: timeout('timed out',)",)
2022-01-10 07:30:34.957 98870 RALLYDEBUG rally.utils.sshutils [-] Ssh is still unavailable: SSHError("Exception <class 'socket.timeout'> was raised during connect to root@172.31.11.67:22. Exception value is: timeout('timed out',)",)
...
...
...
2022-01-10 07:33:36.220 98870 RALLYDEBUG rally.utils.sshutils [-] stdout: b'Linux\n'
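
For readers unfamiliar with this polling pattern, here is a minimal sketch of a retry loop of the kind the log above suggests. It is not Rally's actual code: `wait_for`, its parameters, and the injected `sleep` and `clock` arguments are hypothetical names chosen for illustration. The point is that the check runs once per attempt, so anything an attempt opens and does not close accumulates with every retry.

```python
import time


def wait_for(check, timeout=120.0, interval=1.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call check() until it succeeds or the timeout expires.

    Hypothetical helper, not part of Rally. Every failed attempt
    runs check() again, so per-attempt resources must be released
    inside check() itself.
    """
    deadline = clock() + timeout
    while True:
        try:
            return check()
        except Exception as exc:
            if clock() >= deadline:
                raise TimeoutError("check did not succeed in time") from exc
            sleep(interval)
```

With a 6 second interval, as in the timestamps above, a host that takes three minutes to come up is probed roughly thirty times.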

However, lsof shows a very large number of FIFO files opened by rally:
cat log_lsof | grep rally | wc -l
4440581
cat log_lsof | grep rally | grep FIFO | wc -l
2564483
cat log_lsof | grep rally | grep IPv4 | wc -l
1252151
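
The three grep pipelines above can also be done in a single pass over the saved lsof output. The following is a rough sketch that mirrors the same substring matching; the function name and the sample lines are made up for illustration and are not taken from the real log_lsof file.

```python
from collections import Counter


def count_rally_lines(lines, keywords=("FIFO", "IPv4")):
    """Count lines mentioning rally, in total and per keyword.

    Uses plain substring matching, like the grep commands above.
    """
    counts = Counter()
    for line in lines:
        if "rally" not in line:
            continue
        counts["total"] += 1
        for keyword in keywords:
            if keyword in line:
                counts[keyword] += 1
    return counts


# Hypothetical sample lines in an lsof-like layout.
sample = [
    "rally 98870 stack 12r FIFO 0,13 0t0 123456 pipe",
    "rally 98870 stack 13u IPv4 654321 0t0 TCP host:1->host:22 (ESTABLISHED)",
    "sshd 1200 root 3u IPv4 111 0t0 TCP *:22 (LISTEN)",
]
print(count_rally_lines(sample))
```

For the real file, something like `count_rally_lines(open("log_lsof"))` would read it line by line without loading it all into memory.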

Because of this, our workloads are unable to connect to OSP endpoints and fail with the errors below:

2022-01-10 09:12:48.903 98897 ERROR rally.task.runner [-] Iteration 257 raised Exception: keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to http://172.18.0.111:9696/v2.0/routers: HTTPConnectionPool(host='172.18.0.111', port=9696): Max retries exceeded with url: /v2.0/routers (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4ea85ed9b0>: Failed to establish a new connection: [Errno 24] Too many open files',))
2022-01-10 09:12:48.903 98897 ERROR rally.task.runner Traceback (most recent call last):
2022-01-10 09:12:48.903 98897 ERROR rally.task.runner File "/home/stack/browbeat/.rally-venv/lib/python3.6/site-packages/urllib3/connection.py", line 175, in _new_conn
2022-01-10 09:12:48.903 98897 ERROR rally.task.runner File "/home/stack/browbeat/.rally-venv/lib/python3.6/site-packages/urllib3/util/connection.py", line 96, in create_connection
2022-01-10 09:12:48.903 98897 ERROR rally.task.runner File "/home/stack/browbeat/.rally-venv/lib/python3.6/site-packages/urllib3/util/connection.py", line 77, in create_connection
2022-01-10 09:12:48.903 98897 ERROR rally.task.runner File "/usr/lib64/python3.6/socket.py", line 144, in __init__
2022-01-10 09:12:48.903 98897 ERROR rally.task.runner OSError: [Errno 24] Too many open files
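
Errno 24 means the process has reached its limit on open file descriptors. As a quick check, a process can compare its own descriptor count with that limit. The sketch below assumes Linux (it reads /proc/self/fd); `fd_usage` is a hypothetical helper, not something Rally provides.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for this process.

    Assumes Linux: /proc/self/fd has one entry per open descriptor.
    The count includes the descriptor that listdir() opens briefly.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft, hard


open_fds, soft, hard = fd_usage()
print(f"open descriptors: {open_fds}, soft limit: {soft}, hard limit: {hard}")
```

Logging this value periodically during a long run would show whether the count climbs steadily toward the soft limit.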

We observed that the execute method [1] opens stdout and stderr and never closes them. The comment [2] in the function says it returns stdout and stderr, but it actually returns stdout.read() and stderr.read(), so the caller cannot call stdout.close() or stderr.close().

If an exception happens in run() [3] (as it did in our case; see the log messages above), the user can never close stdin and stdout, leaving them stale.

[1] https://github.com/openstack/rally/blob/master/rally/utils/sshutils.py#L240
[2] https://github.com/openstack/rally/blob/master/rally/utils/sshutils.py#L247
[3] https://github.com/openstack/rally/blob/master/rally/utils/sshutils.py#L252
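
To illustrate the concern described above, here is a minimal sketch of an execute-style function that closes its buffers on every path, including when run() raises. This is not the code from the linked file and not necessarily what the eventual fix does: `run` is a hypothetical stand-in for the SSH call, and the `fail` flag exists only to simulate an error.

```python
import io


def run(cmd, stdout, stderr, fail=False):
    """Hypothetical stand-in for the real SSH run(); writes into the buffers."""
    stdout.write("Linux\n")
    if fail:
        raise RuntimeError("connection lost")
    return 0


def execute(cmd, fail=False):
    """Return (exit_status, stdout_text, stderr_text).

    The with-statement closes both StringIO buffers on normal return
    and also when run() raises, so the caller never has to.
    """
    with io.StringIO() as stdout, io.StringIO() as stderr:
        exit_status = run(cmd, stdout, stderr, fail=fail)
        return exit_status, stdout.getvalue(), stderr.getvalue()


print(execute("uname"))
```

Because the text is copied out with getvalue() before the buffers close, the caller still receives plain strings and has nothing left to clean up.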

OpenStack Infra (hudson-openstack) wrote : Fix proposed to rally (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/rally/+/823996

Changed in rally:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to rally (master)

Reviewed: https://review.opendev.org/c/openstack/rally/+/823996
Committed: https://opendev.org/openstack/rally/commit/2a77c63071abcf2e2088b72f1a81b3ffc69be7c7
Submitter: "Zuul (22348)"
Branch: master

commit 2a77c63071abcf2e2088b72f1a81b3ffc69be7c7
Author: Andrey Kurilin <email address hidden>
Date: Mon Jan 10 15:38:11 2022 +0200

    Close stringio objects at sshutils

    Closes-Bug: #1956956
    Change-Id: I94f597d99951459b12f0f0211ec73f2ae7fa908d

Changed in rally:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/rally 3.4.0

This issue was fixed in the openstack/rally 3.4.0 release.
