Bug #1554332 “neutron agents are too aggressive under server loa...” : Bugs : neutron

Kevin Benton (kevinbenton) on 2016-03-08

Changed in neutron:
assignee:	nobody → Kevin Benton (kevinbenton)
milestone:	none → mitaka-rc1

OpenStack Infra (hudson-openstack) on 2016-03-08

Changed in neutron:
status:	New → In Progress

Kevin Benton (kevinbenton) on 2016-03-08

Changed in neutron:
importance:	Undecided → High

Ihar Hrachyshka (ihar-hrachyshka) on 2016-03-09

tags:

added: loadimpact

Revision history for this message

Ihar Hrachyshka (ihar-hrachyshka) wrote on 2016-03-13:

#1

Related: https://review.openstack.org/#/c/280595/

Armando Migliaccio (armando-migliaccio) on 2016-03-14

Changed in neutron:
milestone:	mitaka-rc1 → newton-1

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-05-06: Fix merged to neutron (master)

#2

Reviewed: https://review.openstack.org/280595
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3e668b6a3720c1509ffef4ad5b91b4242dfd47b3
Submitter: Jenkins
Branch: master

commit 3e668b6a3720c1509ffef4ad5b91b4242dfd47b3
Author: Kevin Benton <email address hidden>
Date: Tue Feb 16 01:50:23 2016 -0800

Add exponential back-off RPC client

    This adds an exponential backoff mechanism for timeout values
    on any RPC calls in Neutron that don't explicitly request a timeout
    value. This will prevent the clients from DDoSing the server by
    giving up on requests and retrying them before they are fulfilled.

    Each RPC call method in each namespace gets its own timeout value since
    some calls are expected to be much more expensive than others and we
    don't want to modify the timeouts of cheap calls.

    The backoff currently has no reduction mechanism under the assumption
    that timeouts not legitimately caused by heavy system load
    (i.e. messages completely dropped by AMQP) are rare enough that the
    cost of shrinking the timeout back down and potentially causing
    another server timeout isn't worth it. The timeout does have a ceiling
    of 10 times the configured default timeout value.

    Whenever a timeout exception occurs, the client will also sleep for a
    random value between 0 and the configured default timeout value to
    introduce a splay across all of the agents that may be trying to
    communicate with the server.

    This patch is intended to be uninvasive for candidacy to be
    back-ported. A larger refactor of delivering data to the agents
    is being discussed in I3af200ad84483e6e1fe619d516ff20bc87041f7c.

Closes-Bug: #1554332
Change-Id: I923e415c1b8e9a431be89221c78c14f39c42c80f

Changed in neutron:
status:	In Progress → Fix Released

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-05-09: Fix proposed to neutron (stable/mitaka)

#3

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/314317

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-05-09: Fix proposed to neutron (stable/liberty)

#4

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/314319

Sheena Conant (sheena-conant) on 2016-05-11

tags:

added: scale

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-05-14: Fix merged to neutron (stable/liberty)

#5

Reviewed: https://review.openstack.org/314319
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0ffab0930cb8a4e283fa3920fc704aab94385e26
Submitter: Jenkins
Branch: stable/liberty

commit 0ffab0930cb8a4e283fa3920fc704aab94385e26
Author: Kevin Benton <email address hidden>
Date: Tue Feb 16 01:50:23 2016 -0800

Add exponential back-off RPC client

    This adds an exponential backoff mechanism for timeout values
    on any RPC calls in Neutron that don't explicitly request a timeout
    value. This will prevent the clients from DDoSing the server by
    giving up on requests and retrying them before they are fulfilled.

    Each RPC call method in each namespace gets its own timeout value since
    some calls are expected to be much more expensive than others and we
    don't want to modify the timeouts of cheap calls.

    The backoff currently has no reduction mechanism under the assumption
    that timeouts not legitimately caused by heavy system load
    (i.e. messages completely dropped by AMQP) are rare enough that the
    cost of shrinking the timeout back down and potentially causing
    another server timeout isn't worth it. The timeout does have a ceiling
    of 10 times the configured default timeout value.

    Whenever a timeout exception occurs, the client will also sleep for a
    random value between 0 and the configured default timeout value to
    introduce a splay across all of the agents that may be trying to
    communicate with the server.

    This patch is intended to be uninvasive for candidacy to be
    back-ported. A larger refactor of delivering data to the agents
    is being discussed in I3af200ad84483e6e1fe619d516ff20bc87041f7c.

    Closes-Bug: #1554332
    Change-Id: I923e415c1b8e9a431be89221c78c14f39c42c80f
    (cherry picked from commit 3e668b6a3720c1509ffef4ad5b91b4242dfd47b3)

tags:

added: in-stable-liberty

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-05-17: Fix merged to neutron (stable/mitaka)

#6

Reviewed: https://review.openstack.org/314317
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3ab2ada748a5f66c3f1dd717223a2d180f686c89
Submitter: Jenkins
Branch: stable/mitaka

commit 3ab2ada748a5f66c3f1dd717223a2d180f686c89
Author: Kevin Benton <email address hidden>
Date: Tue Feb 16 01:50:23 2016 -0800

Add exponential back-off RPC client

    This adds an exponential backoff mechanism for timeout values
    on any RPC calls in Neutron that don't explicitly request a timeout
    value. This will prevent the clients from DDoSing the server by
    giving up on requests and retrying them before they are fulfilled.

    Each RPC call method in each namespace gets its own timeout value since
    some calls are expected to be much more expensive than others and we
    don't want to modify the timeouts of cheap calls.

    The backoff currently has no reduction mechanism under the assumption
    that timeouts not legitimately caused by heavy system load
    (i.e. messages completely dropped by AMQP) are rare enough that the
    cost of shrinking the timeout back down and potentially causing
    another server timeout isn't worth it. The timeout does have a ceiling
    of 10 times the configured default timeout value.

    Whenever a timeout exception occurs, the client will also sleep for a
    random value between 0 and the configured default timeout value to
    introduce a splay across all of the agents that may be trying to
    communicate with the server.

    This patch is intended to be uninvasive for candidacy to be
    back-ported. A larger refactor of delivering data to the agents
    is being discussed in I3af200ad84483e6e1fe619d516ff20bc87041f7c.

    Closes-Bug: #1554332
    Change-Id: I923e415c1b8e9a431be89221c78c14f39c42c80f
    (cherry picked from commit 3e668b6a3720c1509ffef4ad5b91b4242dfd47b3)

tags:

added: in-stable-mitaka

Revision history for this message

Thierry Carrez (ttx) wrote on 2016-06-01: Fix included in openstack/neutron 8.1.1

#7

This issue was fixed in the openstack/neutron 8.1.1 release.

Revision history for this message

Thierry Carrez (ttx) wrote on 2016-06-01: Fix included in openstack/neutron 7.1.0

#8

This issue was fixed in the openstack/neutron 7.1.0 release.

Revision history for this message

Doug Hellmann (doug-hellmann) wrote on 2016-06-03: Fix included in openstack/neutron 9.0.0.0b1

#9

This issue was fixed in the openstack/neutron 9.0.0.0b1 development milestone.

neutron

neutron agents are too aggressive under server load

Bug Description

Other bug subscribers

Remote bug watches