During a VIP failover, services colocated with the VIP are slow to recover

Bug #1643487 reported by Damien Ciabrini
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Undecided
Assigned to: Damien Ciabrini

Bug Description

In HA OpenStack deployments, OpenStack services on controller nodes access the database via a virtual IP (VIP). HAProxy listens on the VIP and forwards traffic to the Galera nodes located on the controllers.

When both the OpenStack service and the VIP it connects to are located on the same node, a connection to the VIP results in a TCP socket whose source IP and destination IP are both bound to the VIP. This causes an issue when the VIP is failed over to another controller node _while_ there are packets in the socket's Send-Q at kernel level. Keepalive doesn't apply; rather, the persist timer kicks in. Eventually the kernel returns a "connection timed out" error to the OpenStack service, but only after a very long time (by default more than 10 minutes). During this period, the OpenStack service won't create a new connection and will be marked as "down" on the controller.
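
The "more than 10 minutes" figure is consistent with how Linux backs off when data goes unacknowledged. The sketch below is only a rough estimate, and it assumes values that are not stated in this report: the kernel default `net.ipv4.tcp_retries2 = 15`, a 200 ms minimum and 120 s maximum timeout, and a persist timer that follows the same exponential backoff.

```python
def backoff_timeout(retries=15, rto_min=0.2, rto_max=120.0):
    """Sum the exponentially backed-off waits, each capped at rto_max.

    Counts the initial wait plus one wait per retry (retries + 1 terms).
    All defaults are assumed Linux values, not taken from the bug report.
    """
    return sum(min(rto_min * 2 ** k, rto_max) for k in range(retries + 1))


total = backoff_timeout()
print(f"{total:.1f} s (~{total / 60:.1f} min)")  # 924.6 s (~15.4 min)
```

The real delay depends on the measured round-trip time, so this number is best read as an order of magnitude rather than an exact value.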

In order to prevent such a socket connection from being created, tripleo should configure the DB settings to bind the source address to the controller's network NIC. This is possible in the latest upstream version of PyMySQL.
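
At the socket level, binding the source address could look like the sketch below. It uses only the Python standard library and a stand-in loopback server so that it runs without a database; `connect_from` is a hypothetical helper, and in a real deployment the source would be the controller's own IP on the MySQL network while the destination would be the VIP. This is roughly what a client-side `bind_address` option amounts to, not the PyMySQL implementation itself.

```python
import socket


def connect_from(source_ip, dest):
    """Open a TCP connection to dest with the source address pinned."""
    # Port 0 lets the kernel pick an ephemeral source port.
    return socket.create_connection(dest, timeout=5,
                                    source_address=(source_ip, 0))


# Stand-in server on loopback; a real setup would target the database VIP.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

client = connect_from("127.0.0.1", server.getsockname())
src_ip = client.getsockname()[0]  # the address the socket is bound to
print(src_ip)

client.close()
server.close()
```

With the source pinned to a host-local address, the socket's source IP stays on the node when the VIP moves to another controller.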

Changed in tripleo:
assignee: nobody → Damien Ciabrini (dciabrin)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/414629

Changed in tripleo:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/414629
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=56ebc7e58d117743363c4a251395d710bf511a2c
Submitter: Jenkins
Branch: master

commit 56ebc7e58d117743363c4a251395d710bf511a2c
Author: Damien Ciabrini <email address hidden>
Date: Fri Dec 23 17:57:48 2016 +0100

    DB connection: prevent src address from binding to a VIP

    When a service connects to the database VIP from the node hosting this
    VIP, the resulting TCP socket has a src address which is by default
    bound to the VIP as well. If the VIP is failed over to another node
    while the socket's Send-Q is not empty, TCP keepalive won't engage and
    the service will become unavailable for a very long time (by default
    more than 10m).

    To prevent failover issues, DB connections should have the src address
    of their TCP socket bound to the IP of the network interface used for
    MySQL traffic. This is achieved by passing a new option to the
    database connection URIs. This option is available starting from
    PyMySQL 0.7.9-2.

    We use a new intermediate variable in hiera to hold the IP to be used
    as a source address for all DB connections. All services adapt their
    database URI accordingly.

    Moreover, a new YAML validation check is added to guarantee that new
    services will construct their database URI appropriately.

    Change-Id: Ic69de63acbfb992314ea30a3a9b17c0b5341c035
    Closes-Bug: #1643487
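
For illustration, a database URI carrying the new option might be built as in the sketch below. The helper name `db_uri`, the `mysql+pymysql` scheme, the credentials and the IP addresses are all made-up examples; only the `bind_address` option name comes from this bug's follow-up discussion.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def db_uri(user, password, vip, database, bind_address):
    """Build a DB URI whose query string pins the client source address."""
    query = urlencode({"bind_address": bind_address})
    # Quote the credentials so special characters cannot break the URI.
    return (f"mysql+pymysql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{vip}/{database}?{query}")


# Hypothetical values: 172.16.2.5 is the VIP, 172.16.2.11 the local NIC.
uri = db_uri("nova", "secret", "172.16.2.5", "nova", "172.16.2.11")
print(uri)
```

Note that the resulting URI differs from one controller to the next, because each host passes its own address.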

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/417467

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/421181

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/mitaka)

Reviewed: https://review.openstack.org/421181
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=631c375673971363561aaac727d2cb581c14a52f
Submitter: Jenkins
Branch: stable/mitaka

commit 631c375673971363561aaac727d2cb581c14a52f
Author: Damien Ciabrini <email address hidden>
Date: Fri Dec 23 17:57:48 2016 +0100

    DB connection: prevent src address from binding to a VIP

    When a service connects to the database VIP from the node hosting this
    VIP, the resulting TCP socket has a src address which is by default
    bound to the VIP as well. If the VIP is failed over to another node
    while the socket's Send-Q is not empty, TCP keepalive won't engage and
    the service will become unavailable for a very long time (by default
    more than 10m).

    To prevent failover issues, DB connections should have the src address
    of their TCP socket bound to the IP of the network interface used for
    MySQL traffic. This is achieved by passing a new option to the
    database connection URIs. This option is available starting from
    PyMySQL 0.7.9-2.

    We use a new intermediate variable in hiera to hold the IP to be used
    as a source address for all DB connections. All services adapt their
    database URI accordingly.

    Moreover, a new YAML validation check is added to guarantee that new
    services will construct their database URI appropriately.

    This is a rework of commit 56ebc7e58d117743363c4a251395d710bf511a2c
    to backport the "src binding selection" feature in Mitaka. It reuses
    the variable conventions from the original commit, but is organized
    differently to match the way services are defined in Mitaka.

    Change-Id: Ic69de63acbfb992314ea30a3a9b17c0b5341c035
    Closes-Bug: #1643487

tags: added: in-stable-mitaka
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/newton)

Reviewed: https://review.openstack.org/417467
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=0fbac20fd6b54f51625c952e3d87956367123c0a
Submitter: Jenkins
Branch: stable/newton

commit 0fbac20fd6b54f51625c952e3d87956367123c0a
Author: Damien Ciabrini <email address hidden>
Date: Fri Dec 23 17:57:48 2016 +0100

    DB connection: prevent src address from binding to a VIP

    When a service connects to the database VIP from the node hosting this
    VIP, the resulting TCP socket has a src address which is by default
    bound to the VIP as well. If the VIP is failed over to another node
    while the socket's Send-Q is not empty, TCP keepalive won't engage and
    the service will become unavailable for a very long time (by default
    more than 10m).

    To prevent failover issues, DB connections should have the src address
    of their TCP socket bound to the IP of the network interface used for
    MySQL traffic. This is achieved by passing a new option to the
    database connection URIs. This option is available starting from
    PyMySQL 0.7.9-2.

    We use a new intermediate variable in hiera to hold the IP to be used
    as a source address for all DB connections. All services adapt their
    database URI accordingly.

    Moreover, a new YAML validation check is added to guarantee that new
    services will construct their database URI appropriately.

    Change-Id: Ic69de63acbfb992314ea30a3a9b17c0b5341c035
    Closes-Bug: #1643487
    (cherry picked from commit 56ebc7e58d117743363c4a251395d710bf511a2c)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/430183

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/430183
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=6d27319b7c23dccb3005e6cbbc21ff2b44929fd8
Submitter: Jenkins
Branch: master

commit 6d27319b7c23dccb3005e6cbbc21ff2b44929fd8
Author: Oliver Walsh <email address hidden>
Date: Tue Feb 7 10:18:36 2017 +0000

    Stop setting bind_address on nova db uri.

    This reverts the changes in https://review.openstack.org/414629 for nova as
    they are incompatible with cell_v2.

    This is a temporary fix for HA while a long-term solution is developed.

    Change-Id: I79d30a2d76a354999152c0c997ea77f104c51027
    Related-bug: #1643487
    Closes-bug: #1662344

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/433607

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 6.0.0.0rc1

This issue was fixed in the openstack/tripleo-heat-templates 6.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Steven Hardy (<email address hidden>) on branch: master
Review: https://review.openstack.org/433607
Reason: Abandoning in favor of https://review.openstack.org/#/c/431425/

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/431425
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=90431683b5927abb066d7964d513828b5488001c
Submitter: Jenkins
Branch: master

commit 90431683b5927abb066d7964d513828b5488001c
Author: Michele Baldessari <email address hidden>
Date: Thu Feb 9 11:14:03 2017 +0100

    Make the DB URIs host-independent for all services

    When fixing LP#1643487 we added ?bind_address to all DB URIs.
    Since this clashes with Cellsv2 due to the URIs becoming host
    dependent, we need a new approach to pass bind_address to pymysql
    that leaves the DB URIs host-independent.

    In change Iff8bd2d9ee85f7bb1445aa2e1b3cfbff1f397b18 we first create a
    /etc/my.cnf.d/tripleo.cnf file with a [tripleo] section with the correct
    bind-address option.

    In this change we make sure that the DB URIs will point to the added
    file and to the specific section containing the necessary bind-address
    option. We do introduce a new MySQLClient profile which will hold all
    this more client-specific configuration so that this change can fit
    better in the composable roles work. Also, in the future it might
    contain the necessary configuration for SSL for example.

    Note that in case the /etc/my.cnf.d/tripleo.cnf file does not exist
    (because it is created via the mysqlclient profile), things keep on
    working as usual and the bind-address option simply won't be set, which
    has no impact on hosts where there are no VIPs.

    Co-Authored-By: Damien Ciabrini <email address hidden>

    Change-Id: Ieac33efe38f32e949fd89545eb1cd8e0fe114a12
    Related-Bug: #1643487
    Closes-Bug: #1663181
    Closes-Bug: #1664524
    Depends-On: Iff8bd2d9ee85f7bb1445aa2e1b3cfbff1f397b18
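
A minimal sketch of the host-independent approach follows. The file content, credentials and IP addresses are invented for illustration, and the query parameter names `read_default_file` and `read_default_group` are an assumption about how the URI points to the file and section; the commit message above only names the file, the `[tripleo]` section and the `bind-address` option. The helper `bind_address_from` is hypothetical and merely mimics what a client would read.

```python
import configparser

# Hypothetical per-host file content; only the address differs between hosts.
TRIPLEO_CNF = """\
[tripleo]
bind-address = 172.16.2.11
"""

# The same URI on every host: it names the file and section, not the address.
DB_URI = ("mysql+pymysql://nova:secret@172.16.2.5/nova"
          "?read_default_file=/etc/my.cnf.d/tripleo.cnf"
          "&read_default_group=tripleo")


def bind_address_from(cnf_text, group="tripleo"):
    """Return the bind-address in the given section, or None if absent."""
    parser = configparser.RawConfigParser()
    parser.read_string(cnf_text)
    return parser.get(group, "bind-address", fallback=None)


print(bind_address_from(TRIPLEO_CNF))  # 172.16.2.11
print(bind_address_from(""))           # None: no section, nothing is bound
```

The second call mirrors the case described above where the file or section is missing: no source address is set and connections behave as before.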

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/436192

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/ocata)

Reviewed: https://review.openstack.org/436192
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=3a7ed4403fc5f53e88210577e5e912a1057574db
Submitter: Jenkins
Branch: stable/ocata

commit 3a7ed4403fc5f53e88210577e5e912a1057574db
Author: Michele Baldessari <email address hidden>
Date: Thu Feb 9 11:14:03 2017 +0100

    Make the DB URIs host-independent for all services

    When fixing LP#1643487 we added ?bind_address to all DB URIs.
    Since this clashes with Cellsv2 due to the URIs becoming host
    dependent, we need a new approach to pass bind_address to pymysql
    that leaves the DB URIs host-independent.

    In change Iff8bd2d9ee85f7bb1445aa2e1b3cfbff1f397b18 we first create a
    /etc/my.cnf.d/tripleo.cnf file with a [tripleo] section with the correct
    bind-address option.

    In this change we make sure that the DB URIs will point to the added
    file and to the specific section containing the necessary bind-address
    option. We do introduce a new MySQLClient profile which will hold all
    this more client-specific configuration so that this change can fit
    better in the composable roles work. Also, in the future it might
    contain the necessary configuration for SSL for example.

    Note that in case the /etc/my.cnf.d/tripleo.cnf file does not exist
    (because it is created via the mysqlclient profile), things keep on
    working as usual and the bind-address option simply won't be set, which
    has no impact on hosts where there are no VIPs.

    Co-Authored-By: Damien Ciabrini <email address hidden>

    Change-Id: Ieac33efe38f32e949fd89545eb1cd8e0fe114a12
    Related-Bug: #1643487
    Closes-Bug: #1663181
    Closes-Bug: #1664524
    Depends-On: Iff8bd2d9ee85f7bb1445aa2e1b3cfbff1f397b18
    (cherry picked from commit 90431683b5927abb066d7964d513828b5488001c)

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 2.2.0

This issue was fixed in the openstack/tripleo-heat-templates 2.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 5.3.0

This issue was fixed in the openstack/tripleo-heat-templates 5.3.0 release.
