Monasca thresh running in local mode

Bug #1808805 reported by Doug Szumski
This bug affects 1 person
Affects        Status        Importance  Assigned to   Milestone
kolla-ansible  Fix Released  Medium      Doug Szumski

Bug Description

The Monasca thresholder topology is running in Storm local mode instead of being submitted to the Storm cluster.

Doug Szumski (dszumski)
Changed in kolla-ansible:
assignee: nobody → Doug Szumski (dszumski)
Scott Shambarger (sshambar) wrote :

Hit this bug today. The latest kolla/monasca-thresh image actually crashes on startup because of it: the code now requires ZOOKEEPER_PORT when running in local mode, and throws an exception when it is not present.

Changed in kolla-ansible:
status: New → Confirmed
Scott Shambarger (sshambar) wrote :

Hit this on v11, and it doesn't appear to be fixed even in master.

OK, I've managed to fix the issue with a couple of changes:

First, submit the topology to the Storm cluster instead of running it locally:

--- ansible/roles/monasca/templates/monasca-thresh/monasca-thresh.json.j2.orig
+++ ansible/roles/monasca/templates/monasca-thresh/monasca-thresh.json.j2
@@ -1,3 +1,3 @@
 {
-    "command": "/opt/storm/bin/storm jar /monasca-thresh-source/monasca-thresh-*/thresh/target/monasca-thresh-*-SNAPSHOT-shaded.jar -Djava.io.tmpdir=/var/lib/monasca-thresh/data monasca.thresh.ThresholdingEngine /etc/monasca/thresh-config.yml monasca-thresh local",
+    "command": "/opt/storm/bin/storm jar /monasca-thresh-source/monasca-thresh-*/thresh/target/monasca-thresh-*-SNAPSHOT-shaded.jar -Djava.io.tmpdir=/var/lib/monasca-thresh/data monasca.thresh.ThresholdingEngine /etc/monasca/thresh-config.yml monasca-thresh",
     "config_files": [

The command now submits the topology and then exits, so the second fix is to stop the registration container from being restarted:

--- ansible/roles/monasca/handlers/main.yml.orig
+++ ansible/roles/monasca/handlers/main.yml
@@ -73,2 +73,3 @@
     dimensions: "{{ service.dimensions }}"
+    restart_policy: no
   when:

With both changes applied, the topology is correctly registered and runs in the Storm cluster.

NOTE: monasca_thresh will fail on subsequent runs because the topology is already present. The start program should probably borrow the logic from the default container's docker/start.sh, which waits for mariadb and kafka, then checks whether the topology is already present and, if so, simply exits normally. That would probably require dropping in a monasca_extended_start file too.

If you like, I can submit a pull request...
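
The "check whether the topology is already present" step from the note above could look roughly like the sketch below. It is only an illustration, not the logic of docker/start.sh or of any eventual patch. The helper names `topology_is_listed` and `submit_if_missing` are hypothetical, and the code assumes that `storm list` prints one topology per line with the topology name as the first whitespace-separated field.

```python
import subprocess


def topology_is_listed(storm_list_output, name):
    """Return True if a line of the listing has `name` as its first field.

    Assumption: `storm list` prints one topology per line, name first.
    """
    for line in storm_list_output.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def submit_if_missing(name, submit_cmd,
                      list_cmd=("/opt/storm/bin/storm", "list"),
                      run=subprocess.run):
    """Submit the topology only when it is not already registered.

    `run` is injectable so the flow can be exercised without a Storm cluster.
    Returns "skipped" or "submitted"; a failing command raises, so the
    process would exit non-zero and could be retried by the container runtime.
    """
    listing = run(list(list_cmd), capture_output=True, text=True, check=True).stdout
    if topology_is_listed(listing, name):
        return "skipped"
    run(list(submit_cmd), check=True)
    return "submitted"
```

Exiting normally in the "skipped" case is what keeps repeated deploys from failing when the topology is already registered.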

Mark Goddard (mgoddard) wrote :

Hi Scott, feel free to propose a fix to gerrit.

Changed in kolla-ansible:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)
Changed in kolla-ansible:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/791337
Committed: https://opendev.org/openstack/kolla-ansible/commit/aea9bf355058a15e7ce7bb2649de3872e7041c89
Submitter: "Zuul (22348)"
Branch: master

commit aea9bf355058a15e7ce7bb2649de3872e7041c89
Author: Scott Shambarger <email address hidden>
Date: Thu May 13 17:42:03 2021 -0700

    monasca-thresh: Fix topology submission to storm

    monasca-thresh currently runs a local copy of the storm
    to handle the threshold topology. However, it doesn't setup
    the environment correctly, and the executable fails, causing
    the container to continually restart.

    This patch updates the container command to correctly
    submit the topology to the running Apache storm. The
    container will exit after it finishes the submission,
    so the restart_policy is updated to on-failure, this way
    if the storm is temporarily unavailable, the submission
    will be retried. (NOTE: further deploys will see the
    container as "changed" as it won't be running)

    Patch uses KOLLA_BOOTSTRAP to trigger the container to
    check if the topology is already submitted, and if so skips
    the submission command so the container doesn't fail.

    The config task now triggers a new reconfigure handler that
    spawns a one-shot container to replace any existing topology
    if the configuration has changed.

    Also, all the storm.* variables in storm.yml.j2 are
    removed as they were only needed for local mode and
    make submitted topologies fail to load when the storm
    is restarted (the referenced directories not mounted
    on nimbus).

    Depends-On: https://review.opendev.org/c/openstack/kolla/+/792751
    Closes-Bug: #1808805
    Change-Id: Ib225d76076782d695c9387e1c2693bae9a4521d7
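
Why an on-failure restart policy suits a container that submits the topology and then exits can be illustrated with a simplified model of Docker's restart policies. The function below is a hypothetical sketch, not kolla-ansible or Docker code. It only models a container that exits on its own and ignores retry limits, manual stops, and daemon restarts.

```python
def should_restart(policy, exit_code):
    """Simplified model: is a container restarted after exiting by itself?

    Covers the Docker policy names "no", "on-failure", "always" and
    "unless-stopped"; retry limits and manual stops are not modelled.
    """
    if policy == "no":
        return False
    if policy == "on-failure":
        # Restart only when the process reported an error.
        return exit_code != 0
    if policy in ("always", "unless-stopped"):
        return True
    raise ValueError("unknown restart policy: %r" % (policy,))


# A successful submission (exit code 0) leaves the container stopped,
# while a failed one (e.g. Storm temporarily unavailable) is retried.
for code in (0, 1):
    print(code, should_restart("on-failure", code))
```

Under this model, a policy of "always" would resubmit the topology in a loop even after a successful submission, which is the situation the commit message describes avoiding.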

Changed in kolla-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/804021

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/804022

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/804023

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/804022
Committed: https://opendev.org/openstack/kolla-ansible/commit/0d7f0828efb106aeea1dd56e61716a6bb91be321
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 0d7f0828efb106aeea1dd56e61716a6bb91be321
Author: Scott Shambarger <email address hidden>
Date: Thu May 13 17:42:03 2021 -0700

    monasca-thresh: Fix topology submission to storm

    Depends-On: https://review.opendev.org/c/openstack/kolla/+/804019
    Closes-Bug: #1808805
    Change-Id: Ib225d76076782d695c9387e1c2693bae9a4521d7
    (cherry picked from commit aea9bf355058a15e7ce7bb2649de3872e7041c89)

tags: added: in-stable-victoria
tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/804023
Committed: https://opendev.org/openstack/kolla-ansible/commit/d79d0d3b3ba9a0a13e5b416c8c458aa4b7e8cc5b
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit d79d0d3b3ba9a0a13e5b416c8c458aa4b7e8cc5b
Author: Scott Shambarger <email address hidden>
Date: Thu May 13 17:42:03 2021 -0700

    monasca-thresh: Fix topology submission to storm

    Depends-On: https://review.opendev.org/c/openstack/kolla/+/804020
    Closes-Bug: #1808805
    Change-Id: Ib225d76076782d695c9387e1c2693bae9a4521d7
    (cherry picked from commit aea9bf355058a15e7ce7bb2649de3872e7041c89)

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/804021
Committed: https://opendev.org/openstack/kolla-ansible/commit/899ab44ab14038be686f8fd79e0dba86b96a1bd1
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 899ab44ab14038be686f8fd79e0dba86b96a1bd1
Author: Scott Shambarger <email address hidden>
Date: Thu May 13 17:42:03 2021 -0700

    monasca-thresh: Fix topology submission to storm

    Depends-On: https://review.opendev.org/c/openstack/kolla/+/804018
    Closes-Bug: #1808805
    Change-Id: Ib225d76076782d695c9387e1c2693bae9a4521d7
    (cherry picked from commit aea9bf355058a15e7ce7bb2649de3872e7041c89)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 12.2.0

This issue was fixed in the openstack/kolla-ansible 12.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 13.0.0.0rc1

This issue was fixed in the openstack/kolla-ansible 13.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 10.4.0

This issue was fixed in the openstack/kolla-ansible 10.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 11.2.0

This issue was fixed in the openstack/kolla-ansible 11.2.0 release.
