Under high load, DB Connection Pool get's exhausted

Bug #1445123 reported by Kiall Mac Innes
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Designate
Fix Released
High
Kiall Mac Innes
Kilo
Fix Released
High
Kiall Mac Innes

Bug Description

Under high load, DB Connection Pool get's exhausted.

No matter how high we set the max pool, we consume all the connections,

2015-04-16 15:34:12.946 27446 ERROR oslo_messaging.rpc.dispatcher [req-d1d98dc1-5162-45fa-891d-34693e9e35d9 d5ce18647f3f421b9719fd11faaf3284 abd7ca91f8474abd935f780b3daa547f - - -] Exception during message handling: QueuePool limit of size 500 overflow 10 reached, connection timed out, timeout 30
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher executor_callback))
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/designate/rpc.py", line 178, in _dispatch
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher return super(RPCDispatcher, self)._dispatch(*args, **kwds)
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher executor_callback)
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/designate/central/service.py", line 2132, in update_status
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher self._update_record_status(context, domain_id, status, serial)
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/designate/central/service.py", line 2176, in _update_record_status
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher records = self.storage.find_records(context, criterion=criterion)
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/designate/storage/impl_sqlalchemy/__init__.py", line 650, in find_records
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher sort_dir=sort_dir)
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/designate/storage/impl_sqlalchemy/__init__.py", line 620, in _find_records
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher sort_key, sort_dir)
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/designate/sqlalchemy/base.py", line 251, in _find
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher resultproxy = self.session.execute(query)
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1021, in execute
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher bind, close_with_result=True).execute(clause, params or {})
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 887, in _connection_for_bind
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher conn = engine.contextual_connect(**kw)
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/compat/handle_error.py", line 208, in contextual_connect
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher self._wrap_pool_connect(self.pool.connect, None),
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/compat/handle_error.py", line 215, in _wrap_pool_connect
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher return fn()
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 338, in connect
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher return _ConnectionFairy._checkout(self)
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 645, in _checkout
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher fairy = _ConnectionRecord.checkout(pool)
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 440, in checkout
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher rec = pool._do_get()
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/venvs/designate/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 960, in _do_get
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher (self.size(), self.overflow(), self._timeout))
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher TimeoutError: QueuePool limit of size 500 overflow 10 reached, connection timed out, timeout 30
2015-04-16 15:34:12.946 27446 TRACE oslo_messaging.rpc.dispatcher

Revision history for this message
Kiall Mac Innes (kiall) wrote :

Relates to bug 1445123

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to designate (master)

Fix proposed to branch: master
Review: https://review.openstack.org/176340

Changed in designate:
assignee: nobody → Kiall Mac Innes (kiall)
status: Confirmed → In Progress
Thierry Carrez (ttx)
tags: removed: kilo-rc-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to designate (master)

Reviewed: https://review.openstack.org/176340
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=ac79df1567fb040067ed59d948e8a39fb90af177
Submitter: Jenkins
Branch: master

commit ac79df1567fb040067ed59d948e8a39fb90af177
Author: Kiall Mac Innes <email address hidden>
Date: Wed Apr 22 15:42:29 2015 +0100

    central.update_status should be transactional

    Central's update status method issues a large number of SQL
    queries for large zones with many recent updates, as a result
    it's checking out many connections from our connection pool one
    after another, contributing to bug 1445123. By wrapping the
    call in a transaction, a single connection will be checked out
    of the pool.

    Partial-Bug: 1445123
    Change-Id: Idfde7578e1a17a6a0585835fe614f787102563fa

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to designate (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/177107

Revision history for this message
Thierry Carrez (ttx) wrote :

Actually fixes the bug, according to Kiall

Changed in designate:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to designate (stable/kilo)

Reviewed: https://review.openstack.org/177107
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=2900cd391efae38be1121e58e72f5519f4a95e90
Submitter: Jenkins
Branch: stable/kilo

commit 2900cd391efae38be1121e58e72f5519f4a95e90
Author: Kiall Mac Innes <email address hidden>
Date: Wed Apr 22 15:42:29 2015 +0100

    central.update_status should be transactional

    Central's update status method issues a large number of SQL
    queries for large zones with many recent updates, as a result
    it's checking out many connections from our connection pool one
    after another, contributing to bug 1445123. By wrapping the
    call in a transaction, a single connection will be checked out
    of the pool.

    Partial-Bug: 1445123
    Change-Id: Idfde7578e1a17a6a0585835fe614f787102563fa
    (cherry picked from commit ac79df1567fb040067ed59d948e8a39fb90af177)

tags: added: in-stable-kilo
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to designate (master)

Fix proposed to branch: master
Review: https://review.openstack.org/179248

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to designate (master)
Download full text (6.6 KiB)

Reviewed: https://review.openstack.org/179248
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=51a4d4dc1aac7b3eee62747b5000c22f6524c3f9
Submitter: Jenkins
Branch: master

commit f6ee98e4ed4529b0d699f3cf2d107a1ed8faa6f0
Author: stanzgy <email address hidden>
Date: Wed Apr 22 13:53:32 2015 +0800

    Remove duplicated index on table 'records' of pDNS backend

    Remove duplicated index `rec_name_index` on table 'records' of pDNS backend.

    Change-Id: I9e8723b464522a588f0e0ef6ff261b71609b0726
    Closes-Bug: 1446980

commit 2900cd391efae38be1121e58e72f5519f4a95e90
Author: Kiall Mac Innes <email address hidden>
Date: Wed Apr 22 15:42:29 2015 +0100

    central.update_status should be transactional

    Central's update status method issues a large number of SQL
    queries for large zones with many recent updates, as a result
    it's checking out many connections from our connection pool one
    after another, contributing to bug 1445123. By wrapping the
    call in a transaction, a single connection will be checked out
    of the pool.

    Partial-Bug: 1445123
    Change-Id: Idfde7578e1a17a6a0585835fe614f787102563fa
    (cherry picked from commit ac79df1567fb040067ed59d948e8a39fb90af177)

commit c7865e2c5e220b6fa9233c4985e657298d9935ae
Author: Graham Hayes <email address hidden>
Date: Thu Apr 23 16:54:04 2015 +0100

    Add the /v2/zones/<id>/nameservers endpoint back

    APIImpact
    Closes-Bug: #1447683

    Change-Id: I80b51e4e04fc7395d8d1f32a3b3d34f09520c75c

commit 56c8877b7738c434470e0867432fdd7c454a126e
Author: Tim Simmons <email address hidden>
Date: Wed Apr 15 21:04:46 2015 +0000

    Update JSON in howtos examples to match the API

    Some of the howto examples haven't been updated to match the removal
    of wrapping resources in the API, this fixes that

    Closes-Bug: 1447663
    Change-Id: I84a349e2848329a63f84b6073ba65ac5fd56877c
    (cherry picked from commit fc2f733cb53bcce7f03d5c932d99caebb2fe2ce9)

commit b791937b717789d6d0bc0011272ac345b18c46a0
Author: Kiall Mac Innes <email address hidden>
Date: Tue Apr 21 13:17:16 2015 +0100

    Service ThreadPool size should be configurable

    Change-Id: Icbf8873723f61990a76dae5f698b3cc370a48ff0
    Closes-Bug: 1444901
    (cherry picked from commit ad922201989ff0fde6e449e3c06a0208d809886d)

commit 4e937688a4051860ad92349e82cab557c8672958
Author: Kiall Mac Innes <email address hidden>
Date: Wed Apr 22 13:02:23 2015 +0100

    Handle socket.error's in mDNS

    This ensures we don't log stacktraces for conditions outside
    of our control, like a remote client closing the connection
    abruptly.

    Closes-Bug: 1437699
    Change-Id: Idd54ec4ef0dce1a731a0c6b0e77375bf5f42080c
    (cherry picked from commit fb58ff9f7b6ffe8bd8d3b9129091f34dd47b1128)

commit 4cefc818be22ebe097ab6f08569933f98ea33e3c
Author: Graham Hayes <email address hidden>
Date: Thu Apr 16 18:48:42 2015 +0200

    Move the Central rpcapi update_status to cast

    Moves the update_status call (used by pool manager) to a cast.

    Pool Manager doesn't actually do anything with the return value,
    and causes another choke point.

    Cl...

Read more...

Thierry Carrez (ttx)
Changed in designate:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in designate:
milestone: liberty-1 → 1.0.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.