Identity services (keystone) in High Availability Guide

Bug #1516341 reported by Larry Smith Jr.
Affects: openstack-manuals
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned

Bug Description

Going through this Keystone setup with Pacemaker on Kilo, the documented checks fail. In a Kilo install, the Keystone process is no longer run by the standalone Keystone service; it is served by Apache using WSGI. The guide should be updated to either drop Keystone from Pacemaker monitoring, describe an updated method for monitoring Keystone, or set up Pacemaker monitoring of Apache for the Keystone services.
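
As a rough sketch of the suggested direction, assuming pcs and a systemd-managed httpd (the resource names here are hypothetical, not taken from the guide):

# stop monitoring the standalone keystone service
pcs resource delete openstack-keystone
# instead monitor Apache, which now serves keystone via WSGI, on every controller
pcs resource create openstack-apache systemd:httpd --clone
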
-----------------------------------
Release: 0.0.1 on 2015-11-14 13:19
SHA: e3fbcff38a4bad97ff25d490dc393aab793de5f4
Source: http://git.openstack.org/cgit/openstack/ha-guide/tree/doc/ha-guide/source/controller-ha-keystone.rst
URL: http://docs.openstack.org/ha-guide/controller-ha-keystone.html

Tags: ha-guide
Changed in openstack-manuals:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Ben Silverman (tersian) wrote :

http://docs.openstack.org/developer/keystone/apache-httpd.html

Basically, create two WSGI configurations for Apache: admin and public.

Admin example for CentOS/RHEL:

Listen 35357

<VirtualHost *:35357>
  DocumentRoot "/var/www/cgi-bin/keystone"

  <Directory "/var/www/cgi-bin/keystone">
    Options Indexes FollowSymLinks MultiViews
    AllowOverride None
    Require all granted
  </Directory>

  ErrorLog "/var/log/httpd/keystone_wsgi_admin_error.log"
  ServerSignature Off
  CustomLog "/var/log/httpd/keystone_wsgi_admin_access.log" combined

  WSGIApplicationGroup %{GLOBAL}
  WSGIDaemonProcess keystone_admin display-name=keystone-admin group=keystone processes=1 threads=12 user=keystone
  WSGIProcessGroup keystone_admin
  WSGIScriptAlias / "/var/www/cgi-bin/keystone/keystone-admin"
  WSGIPassAuthorization On
</VirtualHost>

Public Example for CentOS/RHEL:

Listen 5000

<VirtualHost *:5000>
  DocumentRoot "/var/www/cgi-bin/keystone"

  <Directory "/var/www/cgi-bin/keystone">
    Options Indexes FollowSymLinks MultiViews
    AllowOverride None
    Require all granted
  </Directory>

  ErrorLog "/var/log/httpd/keystone_wsgi_public_error.log"
  ServerSignature Off
  CustomLog "/var/log/httpd/keystone_wsgi_public_access.log" combined

  WSGIApplicationGroup %{GLOBAL}
  WSGIDaemonProcess keystone_public display-name=keystone-public group=keystone processes=1 threads=12 user=keystone
  WSGIProcessGroup keystone_public
  WSGIScriptAlias / "/var/www/cgi-bin/keystone/keystone-public"
  WSGIPassAuthorization On
</VirtualHost>
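
A quick sanity check after putting both vhosts in place might look like this (a sketch, assuming the default ports above and running on the node itself):

# restart Apache and confirm both endpoints answer
systemctl restart httpd
curl http://127.0.0.1:35357/
curl http://127.0.0.1:5000/
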

Then, if Pacemaker is not already monitoring Apache/httpd, add it to Pacemaker.

Something similar to this (or it may belong elsewhere in the install guide, not sure):

https://www.server-world.info/en/note?os=CentOS_7&p=pacemaker&f=2
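
For example, a minimal pcs sketch, assuming httpd is systemd-managed (the resource name is hypothetical):

# clone the resource so httpd is monitored on every controller node
pcs resource create keystone-httpd systemd:httpd op monitor interval=30s --clone
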

Revision history for this message
Shannon Mitchell (shannon-mitchell) wrote :

I was reading over the docs and came across the keystone setup with pacemaker. The idea of using pacemaker for keystone seems flawed to me. If I understand this correctly, pacemaker will have a single floating IP tied to the service and fail over to another controller node if it goes down. This doesn't seem like it will scale well, is overly complicated, and will eventually lead to disaster.

Say traffic gets to the point where a single keystone server can't handle the requests. The service will fail, and pacemaker will bring it up on a separate controller node. A few seconds/minutes later, keystone fails and pacemaker brings it up again... repeat. Sounds like a bad day for ops and the customer.

Why can this not just be set up behind a load balancer? It seems like it would be simpler, easier to scale, and have better overall performance.
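
For illustration, a minimal HAProxy sketch for the Keystone public endpoint (the VIP and backend addresses here are hypothetical):

listen keystone_public
  bind 10.0.0.11:5000
  balance source
  option httpchk
  server controller1 10.0.0.12:5000 check inter 2000 rise 2 fall 5
  server controller2 10.0.0.13:5000 check inter 2000 rise 2 fall 5
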

Revision history for this message
Adam Spiers (adam.spiers) wrote :

@Shannon the HA guide already suggests using a load balancer for keystone (and other API services) in an active-active configuration:

https://docs.openstack.org/ha-guide/controller-ha-haproxy.html

so I'm not sure why you got the impression it was suggesting active/passive. If you can point to the source of the confusion we can try to fix it.

You should also be aware that the HA guide is about to undergo some significant restructuring and improvements which have just been planned here at the PTG:

https://blueprints.launchpad.net/openstack-manuals/+spec/implement-ha-guide-todos
http://specs.openstack.org/openstack/docs-specs/specs/ocata/improve-ha-guide.html
https://etherpad.openstack.org/p/HA_Guide

Revision history for this message
Adam Spiers (adam.spiers) wrote :

I should have also reiterated, as this bug originally reported, that keystone no longer runs as a separate service and is now typically fronted by Apache. That will also be documented in the HA guide in order to fix this bug.

Revision history for this message
Shannon Mitchell (shannon-mitchell) wrote :

I apologize if I missed something, as I'm new to the documents. I pulled from git, built the docs, and I see the following:

path: openstack-manuals/doc/ha-guide/build/html/controller-ha-identity.html

It looks like it's setting up a single VIP (10.0.0.11) configured in Pacemaker, with the endpoints and all services pointing to that single VIP. I'm not seeing any mention of load balancing on this page. Where does the VIP reside?

...

nm, I finally see it at https://docs.openstack.org/ha-guide/controller-ha-vip.html . So HAProxy is using Pacemaker as well? I was reading some HAProxy docs earlier, and it looks like the creator recommends keepalived for HAProxy use for the following reasons:

http://www.formilux.org/archives/haproxy/1003/3259.html
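
For comparison, a minimal keepalived sketch that floats a VIP and tracks the haproxy process (the interface, VIP, and priority here are hypothetical):

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    virtual_ipaddress {
        10.0.0.11
    }
    track_script {
        chk_haproxy
    }
}
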

I'm not opposed to pacemaker, but it is a bit of overkill for some of this. We have used it in the public cloud, and it always seems to turn out badly. The original builders may know how to use it, but your average tech has a hard time with it. They usually end up breaking pacemaker in the process of bringing up the services manually, and the cluster then stays in a broken state because no one wants to take it down to fix it while trying to keep the SLA.

Revision history for this message
Shannon Mitchell (shannon-mitchell) wrote :

If all we are doing is restarting the service when it dies, much easier solutions exist. Here are just a few (see the systemd sketch after this list). Upstart and systemd can also respawn a service, but keystone/apache, rabbitmq, memcached, and galera still use the older sysvinit scripts, which may make this difficult on ubuntu/upstart OSs.

  * monit http://mmonit.com/monit/
  * supervisord http://supervisord.org/
  * daemonize http://bmc.github.com/daemonize/
  * runit http://smarden.sunsite.dk/runit/
  * perp http://b0llix.net/perp/
  * launchd http://launchd.macosforge.org/
  * DJB's daemontools http://cr.yp.to/daemontools.html
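
As a sketch of the systemd respawn approach mentioned above (the drop-in path and values are illustrative, assuming a systemd host):

# /etc/systemd/system/httpd.service.d/restart.conf
[Service]
Restart=on-failure
RestartSec=5

# then reload systemd so the override takes effect
systemctl daemon-reload
systemctl restart httpd
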

Revision history for this message
Shannon Mitchell (shannon-mitchell) wrote :

This brings up other questions that we might need to think about. Do all of the services need to run directly on those controller nodes? Most of the deployment tools out there are starting to use lxc or docker to house each of the services. Does each of them need to be a member of the same controller cluster? If not, can the openstack pcs resources handle services sitting in docker or lxc containers?
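
For what it's worth, the resource-agents package does ship an ocf:heartbeat:docker agent, so something like the following sketch is possible (the resource and image names are hypothetical, and availability depends on the resource-agents version):

pcs resource create keystone-container ocf:heartbeat:docker \
    image=example/keystone op monitor interval=30s
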

Revision history for this message
Adam Spiers (adam.spiers) wrote :
Download full text (3.8 KiB)

> So haproxy is using pacemaker as well?

No, Pacemaker is managing HAProxy. But as I said, the HA Guide is out of date and currently undergoing a huge revamp, so please don't rely on it for accurate information right now. If you need answers now, please use one of the other resources available which I list further down in this comment.

> I was reading some haproxy docs earlier and it looks like the
> creator recommends keepalived for haproxy use for the following
> reasons:
>
> http://www.formilux.org/archives/haproxy/1003/3259.html

That post, whilst very old (so old, in fact, that it refers to Heartbeat, Pacemaker's predecessor), is mostly spot-on. However, I disagree with the sentence "A cluster-based product may very well end up with none of the nodes offering the service, to ensure that the shared resource is never corrupted by concurrent accesses." In a correctly configured Pacemaker cluster managing HAProxy, this should not happen, because at least one node should always have quorum and be able to run the service.

And whilst Keepalived is a perfectly good solution for network-based resources which don't require fencing, if Pacemaker is already required for other reasons, it doesn't make much sense to add Keepalived when Pacemaker can already achieve the same thing.

> I'm not opposed to pacemaker, but it is a bit of overkill for some of this.

With respect, that's a slightly vague generalization :-) There are contexts within OpenStack where (say) fencing is required, and then Pacemaker is the obvious choice. Yes it might look a bit like a sledgehammer, but if you need to crack not only nuts but some large stones, sometimes it makes sense to reuse the sledgehammer for the nuts instead of spending extra effort getting a smaller hammer just for the nuts.

> We have used it in the public cloud and it always seems to turn out
> badly. The original builders may know how to use it, but your
> average tech has a hard time with it. It usually ends up with them
> getting in and breaking pacemaker in the process of bringing up the
> services manually. It usually ends up staying in a broken status as
> no-one wants to take it down while in the process of fixing the
> cluster to keep SLA.

I appreciate what you're saying. Yes Pacemaker is a complex beast, but in some cases that complexity is necessary. That's why we are revamping this HA guide, to mitigate these kinds of problems.

> If all we are doing is restarting the service if it dies,

No, in general that's not all we are doing. It may make sense to move some of the active/active services to be managed by systemd (RH are already switching to this in fact), but that does not eliminate the need for a cluster manager altogether.

But this bug is not the correct place for a Pacemaker vs. keepalived debate, so please let's not continue this here.

> This brings up other questions that we might need to think about. Do
> all of the services need to be ran directly on those controller
> nodes? Most of the deployment tools are there are starting to use
> lxc or docker to house each of the services. Are each of them going
> to need to be a member of the same controller cluster? If not...

Read more...

Changed in openstack-manuals:
assignee: nobody → foundjem (foundjem-devops)
Chason Chan (chen-xing)
Changed in openstack-manuals:
assignee: foundjem (foundjem-devops) → nobody
Revision history for this message
Frank Kloeker (f-kloeker) wrote :

The ha-guide isn't in openstack-manuals anymore. Please refer to
https://storyboard.openstack.org/#!/project/openstack/ha-guide or reach out to the Keystone team to address this topic in the current release.

Changed in openstack-manuals:
status: Confirmed → Won't Fix