TripleO standalone upgrade job fails when Ceph enabled

Bug #1867144 reported by Carlos Goncalves
This bug affects 1 person
Affects: tripleo
Status: Incomplete
Importance: Critical
Assigned to: Jose Luis Franco
Milestone: none

Bug Description

This issue was found in https://review.opendev.org/#/c/710473/2

The CI job deploys TripleO standalone with Octavia and Ceph enabled just fine. The failure occurs when the job triggers the upgrade: ceph-ansible hangs waiting for the monitor(s) to form the quorum.

2020-02-29 16:57:00,514 p=110492 u=root | TASK [ceph-mon : include_tasks ceph_keys.yml] **********************************
2020-02-29 16:57:00,515 p=110492 u=root | Saturday 29 February 2020 16:57:00 +0000 (0:00:00.683) 0:11:51.286 *****
2020-02-29 16:57:00,682 p=110492 u=root | included: /usr/share/ceph-ansible/roles/ceph-mon/tasks/ceph_keys.yml for standalone
2020-02-29 16:57:00,813 p=110492 u=root | TASK [ceph-mon : waiting for the monitor(s) to form the quorum...] *************
2020-02-29 16:57:00,813 p=110492 u=root | Saturday 29 February 2020 16:57:00 +0000 (0:00:00.298) 0:11:51.585 *****
2020-02-29 17:47:01,885 p=110492 u=root | FAILED - RETRYING: waiting for the monitor(s) to form the quorum... (10 retries left).

https://zuul.opendev.org/t/openstack/build/125e42ce57724b67b368a3f8f15f4c21/log/logs/undercloud/home/zuul/standalone-ansible-T9r6F_/ceph-ansible/ceph_ansible_command.log#5597

More information: http://eavesdrop.openstack.org/irclogs/%23tripleo/%23tripleo.2020-03-12.log.html#t2020-03-12T12:35:01

John Fulton (jfulton-org) wrote :

The bug may be that the wrong playbook was triggered: site-container.yml instead of rolling_update.yml.

Either way, it's not possible to use rolling_update.yml to upgrade a 1-node ceph cluster, so the job itself needs to be redefined.

The only thing we can do to fix the bug is ensure the correct playbook is run when external-upgrade is triggered.

Jose Luis Franco (jfrancoa) wrote :

So, the standalone_upgrade by itself does not trigger the external_upgrade_tasks with --tags ceph_systemd (which is the piece of code needed to change the ceph_playbook in ceph-ansible: https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/ceph-ansible/ceph-base.yaml#L675-L681)

If you look at the python-tripleoclient code, when calling "openstack tripleo upgrade" we call all these operations: https://github.com/openstack/python-tripleoclient/blob/735acf94e3791c468ee543b50c94f359b97814f5/tripleoclient/v1/tripleo_deploy.py#L1292-L1308 which translates into this structure https://github.com/openstack/python-tripleoclient/blob/735acf94e3791c468ee543b50c94f359b97814f5/tripleoclient/constants.py#L110-L132:

1st: upgrade_steps_playbook.yaml
2nd: deploy_steps_playbook.yaml
3rd: post_upgrade_steps_playbook.yaml
4th: external_upgrade_steps_playbook.yaml --tags online_upgrade

So, there is no sign of external_upgrade_steps_playbook being passed --tags ceph_systemd or --tags ceph, which is the part we need in order to execute the ceph upgrade bits: https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/ceph-ansible/ceph-base.yaml#L694
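
For illustration only, the four steps listed above can be modelled as data and checked for the ceph tags. This is a minimal sketch: UPGRADE_STEPS, CEPH_TAGS and steps_running_ceph_tags are hypothetical names made up for this example and are not copied from python-tripleoclient, whose real structure is in the constants.py link above.

```python
# Hypothetical model of the four steps listed in the comment above.
# The names and layout are illustrative, not the real tripleoclient constants.
UPGRADE_STEPS = [
    ("upgrade_steps_playbook.yaml", []),
    ("deploy_steps_playbook.yaml", []),
    ("post_upgrade_steps_playbook.yaml", []),
    ("external_upgrade_steps_playbook.yaml", ["online_upgrade"]),
]

# Tags that the comment says are needed to run the ceph upgrade bits.
CEPH_TAGS = {"ceph", "ceph_systemd"}


def steps_running_ceph_tags(steps, wanted=CEPH_TAGS):
    """Return the playbooks whose --tags include any of the wanted tags."""
    return [playbook for playbook, tags in steps if wanted.intersection(tags)]


if __name__ == "__main__":
    # With the step list above, no playbook is invoked with a ceph tag.
    print(steps_running_ceph_tags(UPGRADE_STEPS))
```

With the list as described in this comment the result is empty, which is the point being made: nothing in the standalone upgrade sequence runs the external upgrade playbook with a ceph tag.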

However, as John mentioned, I'm really wondering about the purpose of this scenario with a single ceph-mon. We disabled ceph in our upgrade jobs a long time ago, as the upgrade required having at least three ceph-mons, so maybe you could get past the upgrade part, but it would probably fail when running "external_upgrade_steps_playbook --tags ceph". If the team considers that this could work and that it's a valid scenario, I'm happy to help implement this feature.
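
As a rough illustration of why three ceph-mons is the minimum for a rolling upgrade: monitors only form a quorum when a strict majority of them is up, and a rolling update takes them down one at a time. The sketch below assumes exactly one monitor is restarted at a time; has_quorum and survives_one_mon_restart are hypothetical helper names and are not part of ceph-ansible.

```python
def has_quorum(total_mons: int, mons_up: int) -> bool:
    """A quorum needs a strict majority of the monitors to be up."""
    return mons_up > total_mons // 2


def survives_one_mon_restart(total_mons: int) -> bool:
    """True if quorum holds while one monitor is down (assumed rolling step)."""
    return total_mons >= 1 and has_quorum(total_mons, total_mons - 1)


if __name__ == "__main__":
    for mons in (1, 2, 3):
        print(mons, survives_one_mon_restart(mons))
```

Under that assumption, a single-mon standalone deployment loses quorum as soon as its only monitor is restarted, and two monitors are not enough either.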

wes hayutin (weshayutin)
Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → Jose Luis Franco (jfrancoa)
milestone: none → ussuri-3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3
Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Changed in tripleo:
milestone: wallaby-3 → wallaby-rc1
Changed in tripleo:
milestone: wallaby-rc1 → xena-1
Marios Andreou (marios-b) wrote :

This is an automated action. Bug status has been set to 'Incomplete' and target milestone has been removed due to inactivity. If you disagree please re-set these values and reach out to us on freenode #tripleo

Changed in tripleo:
milestone: xena-1 → none
status: Triaged → Incomplete