Comment 8 for bug 1516341

Revision history for this message
Adam Spiers (adam.spiers) wrote :

> So haproxy is using pacemaker as well?

No, Pacemaker is managing HAProxy. But as I said, the HA Guide is out of date and currently undergoing a huge revamp, so please don't rely on it for accurate information right now. If you need answers now, please use one of the other resources I list further down in this comment.
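For concreteness, "Pacemaker managing HAProxy" typically looks something like the following. This is a hedged sketch using the pcs shell, not taken from the HA guide; the VIP address and resource names are purely illustrative:

```shell
# Illustrative sketch of Pacemaker managing HAProxy via pcs.
# Resource names and the VIP address are hypothetical.

# A virtual IP that floats between the controller nodes
pcs resource create vip ocf:heartbeat:IPaddr2 \
    ip=192.0.2.10 cidr_netmask=24 op monitor interval=30s

# HAProxy itself, managed through its systemd unit
pcs resource create haproxy systemd:haproxy \
    op monitor interval=10s

# Keep HAProxy on whichever node currently holds the VIP,
# and start the VIP before HAProxy
pcs constraint colocation add haproxy with vip
pcs constraint order vip then haproxy
```

So HAProxy does not "use" Pacemaker; Pacemaker simply starts, monitors, and relocates it like any other cluster resource.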

> I was reading some haproxy docs earlier and it looks like the
> creator recommends keepalived for haproxy use for the following
> reasons:
>
> http://www.formilux.org/archives/haproxy/1003/3259.html

That post, whilst very old (so old, in fact, that it refers to Heartbeat, Pacemaker's predecessor), is mostly spot-on. However, I disagree with the sentence "A cluster-based product may very well end up with none of the nodes offering the service, to ensure that the shared resource is never corrupted by concurrent accesses." In a correctly configured Pacemaker cluster managing HAProxy, this should not happen, because at least one node should always have quorum and be able to run the service.
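The behaviour that sentence worries about is governed by Pacemaker's no-quorum-policy cluster property: only a partition holding quorum keeps running resources, so in a healthy cluster the service stays up somewhere. A minimal sketch in pcs syntax (values illustrative):

```shell
# With no-quorum-policy=stop (the default), a partition that loses
# quorum stops its resources, while the majority partition retains
# quorum and keeps HAProxy running.
pcs property set no-quorum-policy=stop

# Inspect the current quorum state of the cluster
pcs quorum status
```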

And whilst Keepalived is a perfectly good solution for network-based resources which don't require fencing, if Pacemaker is already required for other reasons, it doesn't make much sense to add Keepalived when Pacemaker can already achieve the same thing.
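For comparison, the Keepalived equivalent of a floating VIP is a short VRRP stanza. This is an illustrative keepalived.conf fragment, not taken from the HA guide; interface, router ID, and address are hypothetical:

```conf
# Hypothetical keepalived.conf fragment: float a VIP via VRRP.
vrrp_instance VI_1 {
    state MASTER          # the peer node would use state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100          # lower priority on the backup node
    advert_int 1
    virtual_ipaddress {
        192.0.2.10/24     # same role as Pacemaker's IPaddr2 resource above
    }
}
```

Which is exactly why it is fine on its own, but redundant once Pacemaker is in the picture.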

> I'm not opposed to pacemaker, but it is a bit of overkill for some of this.

With respect, that's a slightly vague generalization :-) There are contexts within OpenStack where (say) fencing is required, and then Pacemaker is the obvious choice. Yes, it might look a bit like a sledgehammer, but if you need to crack not only nuts but also some large stones, it can make sense to reuse the sledgehammer for the nuts rather than spending extra effort fetching a smaller hammer just for them.

> We have used it in the public cloud and it always seems to turn out
> badly. The original builders may know how to use it, but your
> average tech has a hard time with it. It usually ends up with them
> getting in and breaking pacemaker in the process of bringing up the
> services manually. It usually ends up staying in a broken status as
> no-one wants to take it down while in the process of fixing the
> cluster to keep SLA.

I appreciate what you're saying. Yes, Pacemaker is a complex beast, but in some cases that complexity is necessary. That's exactly why we are revamping the HA guide: to mitigate these kinds of problems.

> If all we are doing is restarting the service if it dies,

No, in general that's not all we are doing. It may make sense to move some of the active/active services to be managed by systemd (Red Hat are in fact already switching to this), but that does not eliminate the need for a cluster manager altogether.
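For the simple restart-if-it-dies case, systemd alone does suffice, e.g. via a drop-in like this (the unit name and path are hypothetical, purely for illustration):

```conf
# Hypothetical drop-in, e.g. /etc/systemd/system/openstack-api.service.d/restart.conf
# Restart the service automatically on failure -- no cluster manager
# is needed for this part, which is the point being made above.
[Service]
Restart=on-failure
RestartSec=5
```

But restart-on-failure covers neither fencing nor VIP failover, which is where the cluster manager still earns its keep.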

But this bug is not the correct place for a Pacemaker vs. keepalived debate, so please let's not continue this here.

> This brings up other questions that we might need to think about. Do
> all of the services need to be ran directly on those controller
> nodes? Most of the deployment tools are there are starting to use
> lxc or docker to house each of the services. Are each of them going
> to need to be a member of the same controller cluster? If not, can
> the openstack pcs resources support the handling of services setting
> in docker or lxc containers?

I really do welcome your thoughts and input on these kinds of topics, but please provide them in the right place, e.g. any of:

- the openstack-dev mailing list (put "[HA]" as a prefix in the Subject header)
- the #openstack-ha IRC channel on Freenode
- the weekly OpenStack HA meetings on IRC
- any of the OpenStack events (e.g. if you're going to Boston then do grab me for a chat)

In contrast this bug is *specifically* about the keystone section of the HA guide, so please don't pollute it with other topics. Thanks for your understanding!