ci ocata upgrade jobs seem to be failing on pacemaker stonith property

Bug #1688322 reported by Michele Baldessari
Affects | Status | Importance | Assigned to | Milestone
tripleo | Fix Released | Critical | Michele Baldessari |

Bug Description

Seen at least twice on https://review.openstack.org/#/c/462480/

For example:
http://logs.openstack.org/80/462480/1/check/gate-tripleo-ci-centos-7-multinode-upgrades/b78ee4c/

We see on the controller node:
May 4 13:46:44 centos-7-2-node-osic-cloud1-s3500-8719947-570244 os-collect-config: Notice: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns: executed successfully
May 4 13:46:44 centos-7-2-node-osic-cloud1-s3500-8719947-570244 os-collect-config: Notice: /Firewall[998 log all]: Dependency Pcmk_property[property--stonith-enabled] has failures: true
May 4 13:46:44 centos-7-2-node-osic-cloud1-s3500-8719947-570244 os-collect-config: Notice: /Firewall[999 drop all]: Dependency Pcmk_property[property--stonith-enabled] has failures: true
May 4 13:46:44 centos-7-2-node-osic-cloud1-s3500-8719947-570244 os-collect-config: Notice: Finished catalog run in 67.86 seconds
May 4 13:46:44 centos-7-2-node-osic-cloud1-s3500-8719947-570244 os-collect-config: [2017-05-04 13:46:44,430] (heat-config) [INFO] exception: connect failed
May 4 13:46:44 centos-7-2-node-osic-cloud1-s3500-8719947-570244 os-collect-config: Warning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.
May 4 13:46:44 centos-7-2-node-osic-cloud1-s3500-8719947-570244 os-collect-config: Error: unable to get cib
May 4 13:46:44 centos-7-2-node-osic-cloud1-s3500-8719947-570244 os-collect-config: Error: /Stage[main]/Pacemaker::Stonith/Pacemaker::Property[Disable STONITH]/Pcmk_property[property--stonith-enabled]: Could not evaluate: backup_cib: Running: /usr/sbin/pcs cluster cib /var/lib/pacemaker/cib/puppet-cib-backup20170504-29225-fetp7k failed with code: 1 ->
May 4 13:46:44 centos-7-2-node-osic-cloud1-s3500-8719947-570244 os-collect-config: Warning: /Firewall[998 log all]: Skipping because of failed dependencies
May 4 13:46:44 centos-7-2-node-osic-cloud1-s3500-8719947-570244 os-collect-config: Warning: /Firewall[999 drop all]: Skipping because of failed dependencies

I wonder if some recent firewall changes altered the dependency chain so that pcs can no longer talk to pcsd.
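One way to tell a transient failure from a persistent one would be to rerun the failing command a few times by hand on the controller. The following is only a minimal Python sketch of that idea: the command is the one shown in the log above, while the function name `dump_cib`, the retry count, the delay and the injectable `run`/`sleep` parameters are assumptions made for illustration. It is a diagnostic aid, not the fix that went into puppet-pacemaker.

```python
import subprocess
import time


def dump_cib(path, retries=5, delay=2.0, run=subprocess.run, sleep=time.sleep):
    """Run `pcs cluster cib <path>` until it succeeds or retries run out.

    Returns the 1-based number of the attempt that succeeded,
    or None if every attempt exited non-zero.
    `run` and `sleep` are injectable so the loop can be exercised
    without a real cluster (hypothetical helper, not part of pcs).
    """
    cmd = ["/usr/sbin/pcs", "cluster", "cib", path]
    for attempt in range(1, retries + 1):
        result = run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        if result.returncode == 0:
            return attempt
        if attempt < retries:
            # Give corosync/pacemaker time to settle before trying again.
            sleep(delay)
    return None
```

If the first attempt fails but a later one succeeds, that points at a timing problem rather than at a firewall rule permanently blocking pcs.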

Tags: ci
Michele Baldessari (michele) wrote :

Or maybe the updated pin for newton/ocata is triggering this problem: https://review.rdoproject.org/r/#/c/6519/

Looking at the changelog, I can't really see why that would be, though.

Michele Baldessari (michele) wrote :

Logs for a successful ocata upgrade job: http://logs.openstack.org/76/461076/1/check/gate-tripleo-ci-centos-7-multinode-upgrades/066f572/ (version of puppet-pacemaker: puppet-pacemaker-0.5.1-0.20170406192929.0763607.el7.centos.noarch)

Michele Baldessari (michele) wrote :

The only two changes in puppet-pacemaker between the successful job and the broken one are:
908138862507 - (2017-05-03 12:31:56 +0200) (HEAD -> master, origin/master, origin/HEAD, fix-ipv6-label-typo) Fix a typo in ipv6 addrlabel <Michele Baldessari>
0c2e23375cd2 - (2017-04-28 08:21:18 +0200) (gerrit/master, review/michele_baldessari/ipv6_addrlabel) Add support for ipv6_addrlabel with IPaddr2 RA <Michele Baldessari>

I'd say it is unlikely that this is the issue.

Michele Baldessari (michele) wrote :

I think the problem is clear: it was introduced in puppet-pacemaker in January. I have no idea why we seem to be hitting it frequently only now. My logstash search found the first occurrence of this (in the last 30 days) here:
http://logs.openstack.org/16/454816/13/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/13c623c/console.html
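The logstash query itself is not recorded in this report. As a rough stand-in for checking a downloaded job log locally, the two error strings from the controller log can be matched together. This is a hedged sketch only: the function name `matches_bug` and the choice of these two substrings as the failure signature are assumptions, not the query that was actually used.

```python
# Substrings taken from the controller log quoted in the bug description.
SIGNATURES = (
    "Error: unable to get cib",
    "Pcmk_property[property--stonith-enabled]: Could not evaluate: backup_cib",
)


def matches_bug(log_text):
    """Return True only if every signature string occurs in the log text."""
    return all(sig in log_text for sig in SIGNATURES)
```

Requiring both strings avoids flagging jobs where the CIB fetch failed for a resource other than the stonith-enabled property.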

A fix is here:
https://review.openstack.org/#/c/462704/

Changed in tripleo:
assignee: nobody → Michele Baldessari (michele)
status: New → Incomplete
status: Incomplete → Triaged
Alfredo Moralejo (amoralej) wrote :
Michele Baldessari (michele) wrote :

Fix for puppet-pacemaker got merged to master. For newton/ocata I started:
https://review.rdoproject.org/r/6560

Michele Baldessari (michele) wrote :

RDO review got merged (thanks Emilien!) so I think we should be good on this front, closing this.

Changed in tripleo:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-pacemaker 0.6.0

This issue was fixed in the openstack/puppet-pacemaker 0.6.0 release.
