ocata/stable jobs are broken (httpd restart failures)

Bug #1673030 reported by Michele Baldessari
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Michele Baldessari

Bug Description

Seen multiple times in a row when backporting a change unrelated to upgrades to stable/ocata (https://review.openstack.org/#/c/444928/). The upgrades job (gate-tripleo-ci-centos-7-multinode-upgrades) fails because httpd fails to restart:
http://logs.openstack.org/28/444928/1/check/gate-tripleo-ci-centos-7-multinode-upgrades/55721a2/

For example in http://logs.openstack.org/28/444928/1/check/gate-tripleo-ci-centos-7-multinode-upgrades/55721a2/logs/postci.txt.gz#_2017-03-15_09_31_27_000 we get the following:
2017-03-15 09:31:27.000 | Error: /Stage[main]/Apache::Service/Service[httpd]: Systemd restart for httpd failed!
2017-03-15 09:31:27.000 | journalctl log for httpd:

The problem is that we are moving heat under httpd in the upgrade process, but this should never happen in ocata. The reason is that the overcloud seems to be upgraded to pike packages rather than ocata ones:
From http://logs.openstack.org/28/444928/2/check/gate-tripleo-ci-centos-7-multinode-upgrades/5fc0394/logs/subnode-2/var/log/yum.txt.gz

Mar 15 13:23:28 Updated: puppet-tripleo-7.0.0-0.20170314050155.c9acf8a.el7.centos.noarch
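
A minimal sketch for pulling the package version and snapshot commit out of yum.log lines like the one above, so they can be compared against the branch under test. The names `Package`, `parse_yum_line` and `snapshot_commit` are hypothetical helpers, and the assumption that the release field has the form `0.<timestamp>.<short commit>.<dist>` is inferred only from this single log line.

```python
import re
from typing import NamedTuple, Optional


class Package(NamedTuple):
    name: str
    version: str
    release: str
    arch: str


# yum.log entries look like "Mar 15 13:23:28 Updated: <name>-<version>-<release>.<arch>",
# optionally with an "<epoch>:" prefix before the package name.
_YUM_LINE = re.compile(
    r"^\w{3}\s+\d+ \d\d:\d\d:\d\d (?:Updated|Installed): (?:\d+:)?(\S+)$"
)


def parse_yum_line(line: str) -> Optional[Package]:
    """Split one yum.log line into name, version, release and arch."""
    m = _YUM_LINE.match(line.strip())
    if m is None:
        return None
    nvr, _, arch = m.group(1).rpartition(".")
    parts = nvr.rsplit("-", 2)
    if len(parts) != 3:
        return None
    return Package(parts[0], parts[1], parts[2], arch)


def snapshot_commit(pkg: Package) -> Optional[str]:
    """Return the short commit hash embedded in the release field, if any.

    Assumes a "<timestamp>.<hash>" pair inside the release string.
    """
    m = re.search(r"\.\d{14}\.([0-9a-f]{7,40})(?:\.|$)", pkg.release)
    return m.group(1) if m else None


line = ("Mar 15 13:23:28 Updated: "
        "puppet-tripleo-7.0.0-0.20170314050155.c9acf8a.el7.centos.noarch")
pkg = parse_yum_line(line)
print(pkg.name, pkg.version, snapshot_commit(pkg))
```

The extracted hash can then be looked up in the puppet-tripleo repository to see which branch it belongs to.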

Commit c9acf8a in puppet-tripleo is from master, which contains the WSGI migration for heat, whereas the stable/ocata branch does not. Compare:
https://github.com/openstack/puppet-tripleo/blob/stable/ocata/manifests/profile/base/heat/api_cfn.pp

and

https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/base/heat/api_cfn.pp

for example

Michele Baldessari (michele) wrote :

Mar 15 13:46:52.550010 centos-7-2-node-rax-ord-7889550-474007 python[206653]: Found 'compress' tags in:
Mar 15 13:46:52.550370 centos-7-2-node-rax-ord-7889550-474007 python[206653]: /usr/share/openstack-dashboard/openstack_dashboard/templates/horizon/_conf.html
Mar 15 13:46:52.550629 centos-7-2-node-rax-ord-7889550-474007 python[206653]: /usr/share/openstack-dashboard/openstack_dashboard/templates/_stylesheets.html
Mar 15 13:46:52.551032 centos-7-2-node-rax-ord-7889550-474007 python[206653]: /usr/share/openstack-dashboard/openstack_dashboard/templates/horizon/_scripts.html
Mar 15 13:46:52.551287 centos-7-2-node-rax-ord-7889550-474007 python[206653]: Compressing... done
Mar 15 13:46:52.551603 centos-7-2-node-rax-ord-7889550-474007 python[206653]: Compressed 6 block(s) from 3 template(s) for 2 context(s).
Mar 15 13:46:52.669540 centos-7-2-node-rax-ord-7889550-474007 httpd[206837]: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:8000
Mar 15 13:46:52.670055 centos-7-2-node-rax-ord-7889550-474007 httpd[206837]: (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:8000
Mar 15 13:46:52.670275 centos-7-2-node-rax-ord-7889550-474007 httpd[206837]: no listening sockets available, shutting down
Mar 15 13:46:52.670498 centos-7-2-node-rax-ord-7889550-474007 httpd[206837]: AH00015: Unable to open logs
Mar 15 13:46:52.686015 centos-7-2-node-rax-ord-7889550-474007 systemd[1]: httpd.service: main process exited, code=exited, status=1/FAILURE
Mar 15 13:46:52.715206 centos-7-2-node-rax-ord-7889550-474007 kill[206840]: kill: cannot find process ""
Mar 15 13:46:52.720167 centos-7-2-node-rax-ord-7889550-474007 systemd[1]: httpd.service: control process exited, code=exited status=1
Mar 15 13:46:52.720977 centos-7-2-node-rax-ord-7889550-474007 systemd[1]: Failed to start The Apache HTTP Server.
Mar 15 13:46:52.721219 centos-7-2-node-rax-ord-7889550-474007 systemd[1]: Unit httpd.service entered failed state.
Mar 15 13:46:52.721292 centos-7-2-node-rax-ord-7889550-474007 systemd[1]: httpd.service failed.
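
The journal shows httpd failing with error 98 (EADDRINUSE) when binding the wildcard addresses on port 8000. The following is a minimal sketch of that class of failure, not a reproduction of the CI node: it assumes some other listener already holds the port on a specific address (the log does not say which process that is) and uses an ephemeral loopback port instead of 8000. The function name `wildcard_bind_conflicts` is hypothetical.

```python
import errno
import socket


def wildcard_bind_conflicts(host: str = "127.0.0.1") -> bool:
    """Return True if binding 0.0.0.0:<port> fails with EADDRINUSE while
    another socket is already listening on <host>:<port>."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # First listener takes a free port on one specific address.
        first.bind((host, 0))
        first.listen(1)
        port = first.getsockname()[1]
        try:
            # Second socket asks for the same port on every address.
            second.bind(("0.0.0.0", port))
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        first.close()
        second.close()


print(wildcard_bind_conflicts())
```

A wildcard bind overlaps every local address, so it collides with any existing listener on the same port, whichever address that listener uses.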

Michele Baldessari (michele) wrote :

OK, so the problem is that heat is binding to the wildcard (*) address instead of the internal IP:
etc/httpd/conf.d/15-default.conf:<VirtualHost 192.168.24.6:80>
etc/httpd/conf.d/10-placement_wsgi.conf:<VirtualHost 192.168.24.6:8778>
etc/httpd/conf.d/10-keystone_wsgi_main.conf:<VirtualHost 192.168.24.6:5000>
etc/httpd/conf.d/10-keystone_wsgi_admin.conf:<VirtualHost 192.168.24.6:35357>
etc/httpd/conf.d/10-heat_api_wsgi.conf:<VirtualHost *:8004>
etc/httpd/conf.d/10-heat_api_cfn_wsgi.conf:<VirtualHost *:8000>
etc/httpd/conf.d/10-heat_api_cloudwatch_wsgi.conf:<VirtualHost *:8003>
etc/httpd/conf.d/10-cinder_wsgi.conf:<VirtualHost 192.168.24.6:8776>
etc/httpd/conf.d/10-horizon_vhost.conf:<VirtualHost 192.168.24.6:80>
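
A small sketch for flagging wildcard vhosts in grep output of the form shown above (`path:<VirtualHost addr>`). The function name `wildcard_vhosts` is hypothetical, and the set of hosts treated as wildcards is an assumption made for this sketch.

```python
import re
from typing import List, Tuple

# One grep hit per line: "<path>:<VirtualHost addr [addr ...]>"
_GREP_LINE = re.compile(r"^(?P<path>[^:]+):\s*<VirtualHost\s+(?P<addrs>[^>]+)>")
# Split "host:port" while leaving a bracketed IPv6 host such as "[::]" intact.
_ADDR = re.compile(r"^(?P<host>.*?)(?::(?P<port>\d+|\*))?$")
# Hosts treated as "all addresses" in this sketch.
_WILDCARD_HOSTS = {"*", "_default_", "0.0.0.0", "[::]"}


def wildcard_vhosts(grep_output: str) -> List[Tuple[str, str]]:
    """Return (config path, address) pairs for vhosts bound to a wildcard."""
    hits = []
    for line in grep_output.splitlines():
        m = _GREP_LINE.match(line.strip())
        if m is None:
            continue
        for addr in m.group("addrs").split():
            host = _ADDR.match(addr).group("host")
            if host in _WILDCARD_HOSTS:
                hits.append((m.group("path"), addr))
    return hits


sample = """\
etc/httpd/conf.d/10-keystone_wsgi_main.conf:<VirtualHost 192.168.24.6:5000>
etc/httpd/conf.d/10-heat_api_wsgi.conf:<VirtualHost *:8004>
etc/httpd/conf.d/10-heat_api_cfn_wsgi.conf:<VirtualHost *:8000>
"""
for path, addr in wildcard_vhosts(sample):
    print(path, addr)
```

Run against the listing above, only the three heat vhosts would be reported, since every other vhost is bound to 192.168.24.6.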

Michele Baldessari (michele) wrote :

Spoke to Juan and this seems more of a CI issue because we do not deploy heat under httpd in ocata:
 jaosorior ╡ bandini: then yeah, seems something is busted on CI
 jaosorior ╡ bandini: no upgrade step in stable/ocata even references heat over httpd

tags: added: ci
summary: - N-O upgrades get httpd restart failures
+ ocata/stable upgrade jobs are broken (httpd restart failures)
description: updated
Changed in tripleo:
importance: High → Critical
description: updated
Michele Baldessari (michele) wrote : Re: ocata/stable upgrade jobs are broken (httpd restart failures)
Changed in tripleo:
assignee: nobody → Michele Baldessari (michele)
status: New → Fix Committed
Alex Schultz (alex-schultz) wrote :

Still seeing this as of 1 hour ago

Alex Schultz (alex-schultz) wrote :
Changed in tripleo:
status: Fix Committed → Confirmed
summary: - ocata/stable upgrade jobs are broken (httpd restart failures)
+ ocata/stable jobs are broken (httpd restart failures)
tags: added: alert
Alex Schultz (alex-schultz) wrote :

It's now passing, so maybe it was a timing issue.

tags: removed: alert
Changed in tripleo:
status: Confirmed → Fix Released