mysql OOM: DBConnectionError while validating tokens in CI runs

Bug #1656850 reported by Matt Riedemann
This bug report is a duplicate of: Bug #1656386: OOM issues in the gate.
This bug affects 1 person
Affects: OpenStack Identity (keystone)
Status: Confirmed
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

Seeing this in CI runs:

http://logs.openstack.org/61/420261/1/check/gate-tempest-dsvm-cells-ubuntu-xenial/2b44ec1/logs/apache/keystone.txt.gz?level=TRACE#_2017-01-16_04_12_53_583

2017-01-16 04:12:53.583 28905 ERROR keystone.common.wsgi [req-eb4c37b4-f26e-41bf-a690-57d608418632 7276828e02c2484f87807d60adae18ba 2dad6c14dcf748c585d91a9026c1b539 - default default] (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 111] Connection refused)") [SQL: u'SELECT 1']
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi Traceback (most recent call last):
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/opt/stack/new/keystone/keystone/common/wsgi.py", line 373, in _inner
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi return method(self, request)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/opt/stack/new/keystone/keystone/middleware/auth.py", line 141, in process_request
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi resp = super(AuthContextMiddleware, self).process_request(request)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/keystonemiddleware/auth_token/__init__.py", line 405, in process_request
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi allow_expired=allow_expired)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/keystonemiddleware/auth_token/__init__.py", line 435, in _do_fetch_token
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi data = self.fetch_token(token, **kwargs)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/opt/stack/new/keystone/keystone/middleware/auth.py", line 50, in fetch_token
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi return self.token_provider_api.validate_token(token)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/opt/stack/new/keystone/keystone/common/manager.py", line 123, in wrapped
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi __ret_val = __f(*args, **kwargs)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/opt/stack/new/keystone/keystone/token/provider.py", line 174, in validate_token
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi self._is_valid_token(token_ref, window_seconds=window_seconds)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/opt/stack/new/keystone/keystone/token/provider.py", line 209, in _is_valid_token
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi self.check_revocation(token)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/opt/stack/new/keystone/keystone/common/manager.py", line 123, in wrapped
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi __ret_val = __f(*args, **kwargs)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/opt/stack/new/keystone/keystone/token/provider.py", line 157, in check_revocation
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi return self.check_revocation_v3(token)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/opt/stack/new/keystone/keystone/common/manager.py", line 123, in wrapped
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi __ret_val = __f(*args, **kwargs)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/opt/stack/new/keystone/keystone/token/provider.py", line 150, in check_revocation_v3
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi self.revoke_api.check_token(token_values)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/opt/stack/new/keystone/keystone/common/manager.py", line 123, in wrapped
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi __ret_val = __f(*args, **kwargs)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/opt/stack/new/keystone/keystone/revoke/core.py", line 203, in check_token
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi if revoke_model.is_revoked(self.driver.list_events(token=token),
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/opt/stack/new/keystone/keystone/revoke/backends/sql.py", line 129, in list_events
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi return self._list_token_events(token)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/opt/stack/new/keystone/keystone/revoke/backends/sql.py", line 113, in _list_token_events
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi events = [revoke_model.RevokeEvent(**e.to_dict()) for e in query]
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2761, in __iter__
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi return self._execute_and_instances(context)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2774, in _execute_and_instances
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi close_with_result=True)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2765, in _connection_from_session
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi **kw)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 956, in connection
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi execution_options=execution_options)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 961, in _connection_for_bind
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi engine, execution_options)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 397, in _connection_for_bind
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi conn = bind.contextual_connect()
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2041, in contextual_connect
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi **kwargs)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 92, in __init__
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi self.dispatch.engine_connect(self, self.__branch)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/event/attr.py", line 256, in __call__
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi fn(*args, **kw)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py", line 80, in _connect_ping_listener
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi connection.scalar(select([1]))
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 844, in scalar
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi return self.execute(object, *multiparams, **params).scalar()
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 914, in execute
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi return meth(self, multiparams, params)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi return connection._execute_clauseelement(self, multiparams, params)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi compiled_sql, distilled_params
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1078, in _execute_context
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi None, None)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1337, in _handle_dbapi_exception
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi util.raise_from_cause(newraise, exc_info)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 202, in raise_from_cause
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi reraise(type(exception), exception, tb=exc_tb, cause=cause)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1071, in _execute_context
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi conn = self._revalidate_connection()
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 393, in _revalidate_connection
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi self.__connection = self.engine.raw_connection(_connection=self)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2104, in raw_connection
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi self.pool.unique_connection, _connection)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2080, in _wrap_pool_connect
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi util.reraise(*sys.exc_info())
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2074, in _wrap_pool_connect
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi return fn()
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 318, in unique_connection
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi return _ConnectionFairy._checkout(self)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 713, in _checkout
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi fairy = _ConnectionRecord.checkout(pool)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 485, in checkout
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi rec.checkin()
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi compat.reraise(exc_type, exc_value, exc_tb)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 482, in checkout
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi dbapi_connection = rec.get_connection()
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 563, in get_connection
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi self.connection = self.__connect()
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 607, in __connect
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi connection = self.__pool._invoke_creator(self)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/strategies.py", line 97, in connect
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi return dialect.connect(*cargs, **cparams)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 385, in connect
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi return self.dbapi.connect(*cargs, **cparams)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/pymysql/__init__.py", line 90, in Connect
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi return Connection(*args, **kwargs)
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 688, in __init__
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi self.connect()
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 937, in connect
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi raise exc
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 111] Connection refused)") [SQL: u'SELECT 1']
2017-01-16 04:12:53.583 28905 TRACE keystone.common.wsgi

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22(pymysql.err.OperationalError)%20(2003%2C%20%5C%5C%5C%22Can't%20connect%20to%20MySQL%20server%20on%20'127.0.0.1'%20(%5BErrno%20111%5D%20Connection%20refused)%5C%5C%5C%22)%20%5BSQL%3A%20u'SELECT%201'%5D%5C%22%20AND%20message%3A%5C%22check_revocation_v3%5C%22%20AND%20tags%3A%5C%22keystone.txt%5C%22%20AND%20voting%3A1%20AND%20(build_queue%3A%5C%22check%5C%22%20OR%20build_queue%3A%5C%22gate%5C%22)&from=7d

104 hits in the check and gate queues in the last 7 days, 99% of them failures. 79% of the failures are on rax-ord nodes.
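
The "SELECT 1" in the error comes from the ping-on-connect listener visible in the traceback (oslo_db/sqlalchemy/engines.py, _connect_ping_listener): every time SQLAlchemy establishes a connection, oslo_db first issues a trivial SELECT 1 to verify the server is reachable, and that probe is what fails once mysqld is gone. A minimal sketch of the pattern, assuming SQLAlchemy 1.x and a made-up connection URL (not keystone's actual configuration):

# Minimal sketch of the ping-on-connect pattern seen in the traceback above
# (oslo_db/sqlalchemy/engines.py, _connect_ping_listener). Illustrative only;
# the URL and the error handling are assumptions, not keystone's real setup.
from sqlalchemy import create_engine, event, exc, select

engine = create_engine("mysql+pymysql://keystone:secret@127.0.0.1/keystone")

@event.listens_for(engine, "engine_connect")
def ping_connection(connection, branch):
    if branch:
        # "branch" connections reuse the parent's DBAPI connection; skip them.
        return
    try:
        # Probe the server with a trivial query before handing the
        # connection out to the caller.
        connection.scalar(select([1]))
    except exc.DBAPIError as err:
        if err.connection_invalidated:
            # The server went away since the connection was pooled;
            # retrying runs the probe on a freshly created connection.
            connection.scalar(select([1]))
        else:
            # Anything else (e.g. Errno 111 because mysqld was OOM-killed)
            # propagates as the DBConnectionError seen above.
            raise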

Matt Riedemann (mriedem) wrote (#1):

It looks like this started showing up around 1/10.

Matt Riedemann (mriedem) wrote (#2):

https://github.com/openstack/keystone/commit/d4a890a6c8bd6927e229f4b665a982a51c130073 merged on 1/10, and the failure stacktrace shows revocation events being listed (I think; I'm not familiar with this code or flow). That change is just a policy change that makes things more restrictive, though, so I'm not sure how it would be related.

Matt Riedemann (mriedem) wrote (#3):

https://github.com/openstack-dev/devstack/commit/425ed145cb3d6ea67fbbb15d3307329efa265410 merged in devstack on 1/9 and enables the reseller feature in keystone/tempest for CI testing. I'm not sure what that reseller feature is, though.

Matt Riedemann (mriedem) wrote (#4):

There is an OOM error in syslog when mysql crashes:

http://logs.openstack.org/61/420261/1/check/gate-tempest-dsvm-cells-ubuntu-xenial/2b44ec1/logs/syslog.txt.gz#_Jan_16_04_12_53

Jan 16 04:12:53 ubuntu-xenial-rax-ord-6682904 kernel: Killed process 16686 (mysqld) total-vm:4707732kB, anon-rss:465436kB, file-rss:13016kB

summary: - DBConnectionError while validating tokens in CI runs
+ mysql OOM: DBConnectionError while validating tokens in CI runs
Lance Bragstad (lbragstad) wrote (#5):

To the best of my knowledge, neither of the commits mentioned in comments 2 or 3 could cause connection errors using pymysql. There was a patch merged to keystone that allowed the retrieval of expired tokens [0], but that logic executes prior to keystone asking the database for a list of revocation events.

[0] https://review.openstack.org/#/c/382098/14/keystone/token/provider.py

Matt Riedemann (mriedem)
Changed in keystone:
status: New → Confirmed
importance: Undecided → High
Lance Bragstad (lbragstad) wrote (#6):

After looking through the OOM trace in comment #4, I found that lvmetad seems to have invoked the oom-killer:

lvmetad invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0

Lance Bragstad (lbragstad) wrote (#7):

At the same time, several other processes also invoke the oom-killer (systemd-udevd, neutron-openvsw, ebtables, and kthreadd, to name a few), so I'm not sure that looking at which process invoked it is the most effective way to determine why mysqld gets selected by the oom-killer.

Bogdan Dobrelya (bogdando) wrote (#8):

That's right, @Lance: the oom-killer chooses its victim from among the heaviest memory consumers, which here is most likely mysqld. Killing the heavy process with the highest score frees the most memory, which benefits the OS the most.

The bug also seems closely related to this mailing-list thread: http://lists.openstack.org/pipermail/openstack-dev/2017-February/thread.html#111568
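
For what it's worth, the kernel picks the OOM victim by oom_score (roughly proportional to resident memory, adjusted per process by oom_score_adj), independent of which process happened to trigger the failed allocation. A quick, purely illustrative sketch for listing the likeliest victims on a node, assuming a Linux /proc filesystem (not part of any existing CI tooling):

# List the top OOM-kill candidates on a Linux host by reading
# /proc/<pid>/oom_score; the kernel prefers the highest score.
# Purely illustrative; not part of the CI jobs discussed here.
import os

def oom_candidates(top=10):
    scores = []
    for pid in filter(str.isdigit, os.listdir('/proc')):
        try:
            with open('/proc/%s/oom_score' % pid) as f:
                score = int(f.read())
            with open('/proc/%s/comm' % pid) as f:
                name = f.read().strip()
        except (IOError, OSError):
            continue  # process exited between listdir() and open()
        scores.append((score, pid, name))
    # Highest score first: this is the order the kernel prefers victims in.
    return sorted(scores, reverse=True)[:top]

if __name__ == '__main__':
    for score, pid, name in oom_candidates():
        print('%6d  %6s  %s' % (score, pid, name))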

Silvan Kaiser (2-silvan) wrote (#9):

The bug this one duplicates is priority 'Wishlist' while this one is 'High'. Has anybody requested raising the priority of the duplicate neutron bug?
