Showing stack outputs is slow in Pike

Bug #1719333 reported by Zane Bitter
This bug affects 1 person
Affects         Status         Importance  Assigned to  Milestone
OpenStack Heat  Fix Released   High        Zane Bitter
Pike            Fix Committed  High        Zane Bitter
Queens          Fix Released   High        Zane Bitter

Bug Description

There appears to have been a performance regression for showing stack outputs in Pike. This is causing timeouts in TripleO:

https://bugzilla.redhat.com/show_bug.cgi?id=1493263

OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/507206

Changed in heat:
assignee: nobody → Zane Bitter (zaneb)
status: Triaged → In Progress
Zane Bitter (zaneb) wrote :

One issue, addressed by the patch above, is that we're loading more attribute values than we need to in the legacy path. "show stack" completes in ~half the time with that patch.

However, that doesn't seem to be the whole issue. Even when doing "show output" (so we're only getting a single output value, which ought to be very quick depending on the value chosen), it's still taking a considerable amount of time (~1 minute in TripleO) just to load all of the attribute values.

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/507248

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to heat (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/507249

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/507250

OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/507206
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=49d833f9aca351102bb1c9140231b809d8b4c519
Submitter: Jenkins
Branch: master

commit 49d833f9aca351102bb1c9140231b809d8b4c519
Author: Zane Bitter <email address hidden>
Date: Mon Sep 25 10:53:38 2017 -0400

    Speed up show-stack with outputs in legacy path

    When we show a stack including the outputs, we calculate all of the
    resource attributes that are referenced anywhere in the stack. In
    convergence, these are either already cached (and therefore fast) or need
    to be cached (and therefore the initial slowness will pay off in future).
    This isn't the case in the legacy path though, since we are not doing
    caching of attributes in the database in that path. So this is
    unnecessarily calculating all of the referenced attribute values, which are
    potentially very slow to get.

    For legacy stacks, only calculate the attribute values needed to show the
    outputs.

    Change-Id: I35800c7f87b58daf05cbabd05bcbcd75d0c0fadb
    Partial-Bug: #1719333
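
The idea in this commit can be illustrated with a toy model. The sketch below is not Heat code: `slow_lookup`, `referenced_attributes`, `show_outputs`, and the dict-based template layout are all hypothetical names chosen for illustration. It assumes every output and resource declares which (resource, attribute) pairs it references, and shows how restricting the calculation to the references made by outputs avoids expensive lookups when nothing is cached.

```python
# Toy model, not Heat's implementation: all names here are hypothetical.
calls = []


def slow_lookup(resource, attribute):
    """Stand-in for an expensive attribute fetch (e.g. an API round trip)."""
    calls.append((resource, attribute))
    return f"{resource}.{attribute}"


# Each entry lists the (resource, attribute) pairs it references.
template = {
    "resources": {
        "server": [("port", "fixed_ips"), ("volume", "id")],
        "port": [("net", "cidr")],
    },
    "outputs": {
        "server_ip": [("server", "first_address")],
    },
}


def referenced_attributes(template, outputs_only):
    """Collect attribute references, optionally restricted to outputs."""
    refs = set()
    for deps in template["outputs"].values():
        refs.update(deps)
    if not outputs_only:
        for deps in template["resources"].values():
            refs.update(deps)
    return refs


def show_outputs(template, convergence):
    # With convergence, values are cached, so computing everything pays off.
    # The legacy path has no cache, so compute only what the outputs need.
    needed = referenced_attributes(template, outputs_only=not convergence)
    values = {ref: slow_lookup(*ref) for ref in sorted(needed)}
    return {
        name: [values[ref] for ref in deps]
        for name, deps in template["outputs"].items()
    }
```

In this model the legacy call performs one lookup, while the convergence-style call performs four; the returned outputs are identical.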

Changed in heat:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/507248
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=f1961c734e81f1e39c16b04558e17e4048c78c06
Submitter: Jenkins
Branch: master

commit f1961c734e81f1e39c16b04558e17e4048c78c06
Author: Zane Bitter <email address hidden>
Date: Mon Sep 25 13:55:41 2017 -0400

    Use show_output in TemplateResource.get_reference_id()

    TemplateResource is unique in that it can return a custom reference ID
    defined as an output in the nested template. This was being fetched by the
    StackResource.get_output() method, which also has the effect of retrieving
    *all* of the outputs of the nested stack and caching them in memory in case
    something else were to reference any of them.

    Unfortunately when calculating the resource data for a stack (which we
    must always do when e.g. showing the outputs), we always include the
    reference IDs of all resources, regardless of whether they are referenced
    by get_resource in the data we are looking to populate. (In fact, we have
    no way from the Template API to distinguish where get_resource is used on
    a particular resource, only where there are dependencies on it.) This is no
    problem under the assumption that getting the reference ID is quick, but
    that assumption does not hold for TemplateResource.

    The show_output RPC call only retrieves a single output (as opposed to
    show_stack, used in StackResource.get_output(), which calculates all of
    them). Fall back to that call in TemplateResource.get_reference_id() if the
    outputs are not already cached to avoid unnecessary overhead in the common
    case.

    Attribute values are now always fetched before the reference ID, so that we
    won't end up making two RPC calls in the case where we also need to read
    other outputs.

    Change-Id: I66da13c0bb024749de4ae3f0c4b06ebb485cee37
    Closes-Bug: #1719333
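
A minimal sketch of the fallback described in this commit, under stated assumptions: `FakeNestedStackRPC` and `TemplateResourceSketch` are hypothetical classes written for this example and do not reproduce Heat's real signatures. The fake client only records which call was made, so the difference between fetching all outputs and fetching one is visible.

```python
# Hypothetical sketch; class and method names are illustrative, not Heat's API.
class FakeNestedStackRPC:
    """Stand-in for the RPC client of a nested stack."""

    def __init__(self, outputs):
        self._outputs = outputs
        self.log = []

    def show_stack(self):
        # Expensive: computes every output of the nested stack.
        self.log.append("show_stack")
        return dict(self._outputs)

    def show_output(self, key):
        # Cheaper: computes a single output. Raises KeyError if absent.
        self.log.append(f"show_output:{key}")
        return self._outputs[key]


class TemplateResourceSketch:
    STACK_ID_OUTPUT = "OS::stack_id"

    def __init__(self, rpc, default_id):
        self.rpc = rpc
        self.default_id = default_id
        self._outputs = None  # in-memory cache filled by get_output()

    def get_output(self, key):
        if self._outputs is None:
            self._outputs = self.rpc.show_stack()
        return self._outputs[key]

    def get_reference_id(self):
        if self._outputs is not None:
            # Outputs already cached: no extra RPC call is needed.
            return self._outputs.get(self.STACK_ID_OUTPUT, self.default_id)
        try:
            # Not cached: ask for the single output only.
            return self.rpc.show_output(self.STACK_ID_OUTPUT)
        except KeyError:
            return self.default_id
```

Reading another output before the reference ID fills the cache, so the sketch then makes only the one `show_stack` call, which mirrors the ordering change mentioned in the last paragraph of the commit message.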

OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/507930

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/507931

OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/pike)

Reviewed: https://review.openstack.org/507930
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=ab46fae9cf386b9578df4dd8ff35d68b281c7c0c
Submitter: Jenkins
Branch: stable/pike

commit ab46fae9cf386b9578df4dd8ff35d68b281c7c0c
Author: Zane Bitter <email address hidden>
Date: Mon Sep 25 10:53:38 2017 -0400

    Speed up show-stack with outputs in legacy path

    When we show a stack including the outputs, we calculate all of the
    resource attributes that are referenced anywhere in the stack. In
    convergence, these are either already cached (and therefore fast) or need
    to be cached (and therefore the initial slowness will pay off in future).
    This isn't the case in the legacy path though, since we are not doing
    caching of attributes in the database in that path. So this is
    unnecessarily calculating all of the referenced attribute values, which are
    potentially very slow to get.

    For legacy stacks, only calculate the attribute values needed to show the
    outputs.

    Change-Id: I35800c7f87b58daf05cbabd05bcbcd75d0c0fadb
    Partial-Bug: #1719333
    (cherry picked from commit 49d833f9aca351102bb1c9140231b809d8b4c519)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to heat (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/508312

OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/pike)

Reviewed: https://review.openstack.org/507931
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=d00f1a40f82ba6ffec9d2fbc71822df977f7cc01
Submitter: Jenkins
Branch: stable/pike

commit d00f1a40f82ba6ffec9d2fbc71822df977f7cc01
Author: Zane Bitter <email address hidden>
Date: Mon Sep 25 13:55:41 2017 -0400

    Use show_output in TemplateResource.get_reference_id()

    TemplateResource is unique in that it can return a custom reference ID
    defined as an output in the nested template. This was being fetched by the
    StackResource.get_output() method, which also has the effect of retrieving
    *all* of the outputs of the nested stack and caching them in memory in case
    something else were to reference any of them.

    Unfortunately when calculating the resource data for a stack (which we
    must always do when e.g. showing the outputs), we always include the
    reference IDs of all resources, regardless of whether they are referenced
    by get_resource in the data we are looking to populate. (In fact, we have
    no way from the Template API to distinguish where get_resource is used on
    a particular resource, only where there are dependencies on it.) This is no
    problem under the assumption that getting the reference ID is quick, but
    that assumption does not hold for TemplateResource.

    The show_output RPC call only retrieves a single output (as opposed to
    show_stack, used in StackResource.get_output(), which calculates all of
    them). Fall back to that call in TemplateResource.get_reference_id() if the
    outputs are not already cached to avoid unnecessary overhead in the common
    case.

    Attribute values are now always fetched before the reference ID, so that we
    won't end up making two RPC calls in the case where we also need to read
    other outputs.

    Change-Id: I66da13c0bb024749de4ae3f0c4b06ebb485cee37
    Closes-Bug: #1719333
    (cherry picked from commit f1961c734e81f1e39c16b04558e17e4048c78c06)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/heat 10.0.0.0b1

This issue was fixed in the openstack/heat 10.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/heat 9.0.1

This issue was fixed in the openstack/heat 9.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to heat (master)

Reviewed: https://review.openstack.org/507249
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=af0feeb18a4f5fb2c20fffb6d85617d1775e5844
Submitter: Zuul
Branch: master

commit af0feeb18a4f5fb2c20fffb6d85617d1775e5844
Author: Zane Bitter <email address hidden>
Date: Mon Sep 25 14:32:13 2017 -0400

    Ignore errors in OS::stack_id output

    If a provider stack contained an OS::stack_id output and there was an error
    in the output, we would raise TemplateOutputError when trying to calculate
    the reference ID of the facade resource. Since we do that in many API
    calls, such an error could render the stack effectively unusable.

    If we encounter such an error, log it and fall back to the default
    reference ID.

    Change-Id: I1bc921fe74c54eb0999541ef36afc42b9c19e9bc
    Partial-Bug: #1712280
    Related-Bug: #1719333
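
The log-and-fall-back behaviour might look roughly like the following. This is a sketch only: `OutputError` is a locally defined stand-in rather than Heat's `TemplateOutputError`, and `reference_id` and its parameters are hypothetical.

```python
import logging

LOG = logging.getLogger(__name__)


class OutputError(Exception):
    """Hypothetical stand-in for an error raised while evaluating an output."""


def reference_id(fetch_stack_id_output, default_id):
    """Return the custom reference ID, or the default if the output is broken."""
    try:
        return fetch_stack_id_output()
    except OutputError as exc:
        # Log the problem instead of letting it break every API call.
        LOG.info("Error in OS::stack_id output, using default ID: %s", exc)
        return default_id
```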

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to heat (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/523137

OpenStack Infra (hudson-openstack) wrote : Related fix merged to heat (master)

Reviewed: https://review.openstack.org/508312
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=6857b7f3686e93805bcddae91f307c6b62aa72d6
Submitter: Zuul
Branch: master

commit 6857b7f3686e93805bcddae91f307c6b62aa72d6
Author: Zane Bitter <email address hidden>
Date: Thu Sep 28 17:14:25 2017 -0400

    Avoid RPC call in TemplateResource.get_reference_id()

    Most TemplateResources probably don't have an OS::stack_id output defined
    in the template, so it's unfortunate that we have to make an RPC call to
    check that every time we retrieve the resource's reference_id, which we
    have to do for most stack operations.

    To cut down on unnecessary RPC calls, check the template first locally to
    see if there is an OS::stack_id output present.

    Change-Id: Ia32ed6bca453b391371f544ad0a07d49dc0616e3
    Related-Bug: #1719333
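
As a rough illustration of checking locally before going remote, assuming the template is available as a plain dict with an `outputs` section (the function names and the fake RPC helper below are hypothetical):

```python
# Hypothetical sketch: names and the dict-based template are illustrative only.
STACK_ID_OUTPUT = "OS::stack_id"


def get_reference_id(template, rpc_show_output, default_id):
    """Return the custom reference ID only if the template defines one."""
    if STACK_ID_OUTPUT not in template.get("outputs", {}):
        # Common case: no such output, so skip the RPC round trip.
        return default_id
    return rpc_show_output(STACK_ID_OUTPUT)


rpc_calls = []


def fake_rpc_show_output(key):
    """Stand-in for the remote call; records each invocation."""
    rpc_calls.append(key)
    return "custom-id"
```

With a template that lacks the output, `rpc_calls` stays empty; the remote call is made only when the output is actually declared.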

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/507250
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=2e4a6e237cf226eb61cf905a72cfb5eb3d5177f0
Submitter: Zuul
Branch: master

commit 2e4a6e237cf226eb61cf905a72cfb5eb3d5177f0
Author: Zane Bitter <email address hidden>
Date: Tue Dec 19 16:36:43 2017 -0500

    Use appropriate exception in StackResource.get_output()

    Don't raise InvalidTemplateAttribute in StackResource.get_output() when an
    output does not exist - it's not the case that get_output() is only used
    for fetching attributes. Instead, raise NotFound from get_output(), and
    translate that to InvalidTemplateAttribute in the caller when we are
    actually fetching an attribute.

    Change-Id: I4f883e4b583a965785d0a595c8c33b47dc94118c
    Related-Bug: #1719333
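
The exception layering described here can be sketched as follows. Both exception classes are defined locally for the example and only borrow the names used in the commit message; `StackResourceSketch` and `get_attribute` are hypothetical.

```python
# Local stand-ins named after the exceptions in the commit message.
class NotFound(Exception):
    pass


class InvalidTemplateAttribute(Exception):
    pass


class StackResourceSketch:
    def __init__(self, outputs):
        self._outputs = outputs

    def get_output(self, key):
        # General-purpose accessor: a missing output is simply "not found".
        try:
            return self._outputs[key]
        except KeyError:
            raise NotFound(f"no output named {key!r}") from None

    def get_attribute(self, key):
        # Only the attribute-fetching caller translates the error.
        try:
            return self.get_output(key)
        except NotFound as exc:
            raise InvalidTemplateAttribute(key) from exc
```

Callers that use `get_output()` for other purposes, such as looking up a reference ID, then see `NotFound` and can handle it without pretending an attribute was requested.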

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to heat (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/533038

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/533047

OpenStack Infra (hudson-openstack) wrote : Related fix merged to heat (master)

Reviewed: https://review.openstack.org/533038
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=f965ff615e3d4df0a2b38477d5441604cd0b85e5
Submitter: Zuul
Branch: master

commit f965ff615e3d4df0a2b38477d5441604cd0b85e5
Author: Zane Bitter <email address hidden>
Date: Thu Jan 11 21:44:27 2018 -0500

    Eliminate errors getting TemplateResource OS::stack_id

    We want to avoid making an RPC call to get the OS::stack_id output in order
    to calculate the reference ID of a TemplateResource whenever possible -
    which should be most of the time given the limited use of this feature in
    the wild - since this happens virtually every time we load the stack.
    Previously, we were doing this by trying to load the template from the
    parent stack's environment, and combine it with the stored parameters
    (using live parameter values caused failures). However, this resulted in
    failures in the case that the template in the parent stack's environment
    differed from the one that was last used to update the resource if the
    parameter schema had been modified in an incompatible way.

    It transpires that we already determine what outputs are available at
    object instantiation time in order to populate the attributes_schema. So we
    can just use the attributes schema to infer the existence or otherwise of
    the output without doing any other calculation in get_reference_id() that
    could cause errors.

    This method doesn't fail-safe, in the sense that if there are any errors
    then we end up with an empty schema and won't fall back to making the RPC
    call. However, if there are any errors then probably all bets are off for
    the reference ID anyway. In common with the old method, it also has the
    drawback of using the latest template rather than the existing one
    (although again, once you do an update removing the OS::stack_id output
    then all bets are off in terms of continuing to see that value as the
    reference ID).

    Unlike the previous method, this doesn't take into account disabled outputs
    (since calculating conditions requires access to the parameters, which were
    the cause of the problem). Also, the code that calculates the attributes
    schema doesn't use a proper API but hackily dives into the raw template
    data, and fixing it would raise all of the same issues that we've had with
    trying to load the template properly here. So we are kicking the can down
    the road a bit or, alternatively, consolidating our issues into a single
    place, depending on your point of view.

    Change-Id: Ic63fe290249dba5bc00abbde1f4df608181d6a9c
    Closes-Bug: #1742646
    Related-Bug: #1719333
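
A hedged sketch of the schema-based check: `FacadeResourceSketch` is a hypothetical class, and building the schema from a raw dict is an assumption made to keep the example small. It also reproduces the non-fail-safe behaviour the commit message points out, where an error while building the schema leaves it empty and no RPC call is attempted.

```python
# Hypothetical sketch, not Heat's implementation.
STACK_ID_OUTPUT = "OS::stack_id"


class FacadeResourceSketch:
    def __init__(self, raw_template, rpc_show_output, default_id):
        self._rpc_show_output = rpc_show_output
        self._default_id = default_id
        try:
            # Built once at instantiation from the raw template data.
            self.attributes_schema = {
                name: {} for name in raw_template["outputs"]
            }
        except (KeyError, TypeError):
            # Any error leaves an empty schema, so no RPC fallback happens.
            self.attributes_schema = {}

    def get_reference_id(self):
        # No template loading or parameter handling here, only a lookup.
        if STACK_ID_OUTPUT in self.attributes_schema:
            return self._rpc_show_output(STACK_ID_OUTPUT)
        return self._default_id
```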

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/533047
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=3cd84afcfc71ad41dceb15619a3c89ce21db605c
Submitter: Zuul
Branch: master

commit 3cd84afcfc71ad41dceb15619a3c89ce21db605c
Author: Zane Bitter <email address hidden>
Date: Thu Jan 11 20:11:04 2018 -0500

    Cache the TemplateResource reference ID like an attribute

    TemplateResources can have their reference ID defined by an OS::stack_id
    output in the template. However getting it - or just checking that it even
    exists - is very expensive, and it happens virtually every time we load the
    stack in memory for any reason.

    Since when the output is present the reference ID is also available as an
    attribute, simply cache it in the database as if it were an attribute
    regardless of whether the output exists. Once it is in the cache,
    subsequent accesses will be cheap just like they are for attributes.

    Keep previous performance optimisations in place as well, since attributes
    are not stored in the database in the legacy path (so legacy stacks don't
    benefit from this change), and since the cache does get invalidated e.g. on
    every stack update.

    Change-Id: Ib9cd0aa40d377ec227754e386e02f185fd871909
    Closes-Bug: #1742847
    Related-Bug: #1719333
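
The caching approach can be pictured with a small model. Everything below is hypothetical: the dict stands in for attribute values stored in the database, and `CachedReferenceId` and `on_stack_update` are names invented for this sketch.

```python
# Hypothetical model of caching the reference ID alongside attributes.
class CachedReferenceId:
    CACHE_KEY = "OS::stack_id"

    def __init__(self, compute_reference_id):
        self._compute = compute_reference_id
        self._attr_cache = {}  # stand-in for attribute values kept in the DB

    def get_reference_id(self):
        if self.CACHE_KEY not in self._attr_cache:
            # Expensive path, taken once per cache lifetime. The result is
            # stored whether or not the template defines the output.
            self._attr_cache[self.CACHE_KEY] = self._compute()
        return self._attr_cache[self.CACHE_KEY]

    def on_stack_update(self):
        # The cache is invalidated on every stack update.
        self._attr_cache.clear()
```

Because the cache is cleared on update, the cheaper checks from the earlier patches still matter for the first access after each update.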

OpenStack Infra (hudson-openstack) wrote : Related fix merged to heat (stable/pike)

Reviewed: https://review.openstack.org/523137
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=6a27419521bae9cf6b0ca033d04ef268abebe1bd
Submitter: Zuul
Branch: stable/pike

commit 6a27419521bae9cf6b0ca033d04ef268abebe1bd
Author: Zane Bitter <email address hidden>
Date: Mon Sep 25 14:32:13 2017 -0400

    Ignore errors in OS::stack_id output

    If a provider stack contained an OS::stack_id output and there was an error
    in the output, we would raise TemplateOutputError when trying to calculate
    the reference ID of the facade resource. Since we do that in many API
    calls, such an error could render the stack effectively unusable.

    If we encounter such an error, log it and fall back to the default
    reference ID.

    Change-Id: I1bc921fe74c54eb0999541ef36afc42b9c19e9bc
    Partial-Bug: #1712280
    Related-Bug: #1719333
    (cherry picked from commit af0feeb18a4f5fb2c20fffb6d85617d1775e5844)
