HA failure when no IP address is bound to the VIP interface

Bug #1391784 reported by James Page
This bug affects 3 people
Affects                                          Status        Importance  Assigned to   Milestone
OpenStack Swift Proxy Charm                      Fix Released  High        James Page
cinder (Juju Charms Collection)                  Fix Released  High        James Page
glance (Juju Charms Collection)                  Fix Released  High        James Page
keystone (Juju Charms Collection)                Fix Released  High        James Page
neutron-api (Juju Charms Collection)             Fix Released  High        James Page
nova-cloud-controller (Juju Charms Collection)   Fix Released  High        James Page
openstack-dashboard (Juju Charms Collection)     Fix Released  High        James Page
percona-cluster (Juju Charms Collection)         Invalid       High        Unassigned
swift-proxy (Juju Charms Collection)             Invalid       High        James Page

Bug Description

Proxied from the juju mailing list:

We've been working on setting up an OpenStack cluster on Trusty for a few months now using Juju and MAAS, although we've yet to go into production. I had everything working fine, including HA deployments of Keystone, Glance, Percona, etc.

The older versions of the charms supported HA using the config settings vip, vip_cidr and vip_iface. Without me making any modifications to these charms, I successfully deployed all of the above charms with the bog-standard hacluster charm.

Over the weekend I've been updating to Juno, and I naturally updated to the latest stable charms from the Charm store. Breaking changes have been introduced to these charms such that they no longer support my deployment. My OpenStack cluster promptly broke in a nasty way. I'm *really* glad this isn't a production environment, but these kinds of non-backward-compatible breakages do give me cause for concern going forward.

To explain how this broke, I'll first need to explain how our network was deployed:

    In order to not burn through many public IPs, we assign RFC1918 IPs to *every server* by DHCP.
    We run at least two instances of critical services
    Public IPs are assigned primarily by Pacemaker
    Public and Private subnets coexist on a single Layer-2 network.
    Nodes that do not directly participate in the Public subnet still have direct access (not via a router) to the Public IPs courtesy of the DHCP option (rfc3442-classless-static-routes). It turns out that Linux hosts in different subnets can communicate directly with one another on the same layer-2 network without the need for a router.

This set-up was highly efficient in terms of consumption of valuable public IP addresses, without forcing inter-subnet communication via an unnecessary hop. The only trick we had to pull off was getting the DHCP server to hand out the rfc3442-classless-static-routes option, which was simple.
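
For illustration only (the original post does not include the server side, and all addresses here are placeholders), handing out such routes from ISC dhcpd can look roughly like this:

    # Illustrative sketch, not the original deployment's configuration.
    # Define the RFC 3442 option (code 121) and publish an on-link route to
    # the public subnet so hosts reach it without an extra router hop.
    option rfc3442-classless-static-routes code 121 = array of unsigned integer 8;

    subnet 10.0.0.0 netmask 255.255.255.0 {
        range 10.0.0.50 10.0.0.200;
        option routers 10.0.0.1;
        # Encoding: <prefix-len> <significant destination octets> <gateway>.
        # 203.0.113.0/24 on-link (gateway 0.0.0.0), plus the default route,
        # since clients honouring option 121 ignore 'option routers'.
        option rfc3442-classless-static-routes 24, 203, 0, 113, 0, 0, 0, 0,
                                               0, 10, 0, 0, 1;
    }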

The old OpenStack charms with their simple vip, vip_cidr and vip_iface options worked perfectly with this set-up. The new charms cannot support this at all, as they have become, in my view, "too clever". They now insist that the vip can only be bound to an interface that already has an IP in the same subnet.
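
Purely as an illustration (the values are placeholders rather than anything from this report), the old-style configuration amounted to no more than:

    # juju 1.x syntax: the VIP lives in the public subnet even though the
    # unit itself only holds an RFC 1918 address
    juju set keystone vip=203.0.113.10 vip_cidr=24 vip_iface=eth0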

If I have to bind public IPs to every server (IPs that they will never use) just in order to have Pacemaker assign the vip, I'll burn through a lot of IPs in the most pointless way imaginable.

I've modified the keystone and openstack-dashboard charms to re-introduce the old functionality in a way that doesn't break the new multiple-IP functionality. I'll paste my keystone patch below to give you an idea what I think is needed. This hasn't been thoroughly tested, but it seems to work. Pacemaker can at least set the public IP address again.
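
(John's actual patch is not reproduced in this copy of the report. Purely as a rough sketch of the fallback he describes, using the charm-helpers functions get_iface_for_address/get_netmask_for_address and a hypothetical helper name, the idea is along these lines:)

    # Sketch only, not the submitted patch: prefer auto-detection, but fall
    # back to the operator-supplied vip_iface / vip_cidr when no local
    # interface carries an address in the VIP's subnet.
    from charmhelpers.core.hookenv import config
    from charmhelpers.contrib.network.ip import (
        get_iface_for_address,
        get_netmask_for_address,
    )

    def vip_binding(vip):  # hypothetical helper name
        iface = get_iface_for_address(vip)
        netmask = get_netmask_for_address(vip)
        if iface is None:
            # Auto-detection failed (e.g. only an RFC 1918 address is bound),
            # so trust the statically configured options instead.
            iface = config('vip_iface')
            netmask = config('vip_cidr')
        return iface, netmask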

If there is some other (better) way to achieve the same level of IP address allocation efficiency and performance without patching the OpenStack charms, please point me in the right direction.

Thanks,
John

Tags: openstack

tags: added: openstack
Revision history for this message
James Page (james-page) wrote :

After some discussion on IRC, we're going to re-use the previous configuration options vip_iface and vip_cidr so that any upgraders also using John's configuration should just DTRT.

Revision history for this message
James Page (james-page) wrote :

Marking invalid for percona-cluster as this charm never switched to auto-interface detection.

Changed in cinder (Juju Charms Collection):
status: New → In Progress
Changed in glance (Juju Charms Collection):
status: New → In Progress
Changed in keystone (Juju Charms Collection):
status: New → In Progress
Changed in neutron-api (Juju Charms Collection):
status: New → In Progress
Changed in nova-cloud-controller (Juju Charms Collection):
status: New → In Progress
Changed in openstack-dashboard (Juju Charms Collection):
status: New → In Progress
Changed in percona-cluster (Juju Charms Collection):
status: New → In Progress
Changed in cinder (Juju Charms Collection):
importance: Undecided → High
Changed in glance (Juju Charms Collection):
importance: Undecided → High
Changed in keystone (Juju Charms Collection):
importance: Undecided → High
Changed in neutron-api (Juju Charms Collection):
importance: Undecided → High
Changed in nova-cloud-controller (Juju Charms Collection):
importance: Undecided → High
Changed in openstack-dashboard (Juju Charms Collection):
importance: Undecided → High
Changed in percona-cluster (Juju Charms Collection):
importance: Undecided → High
status: In Progress → Invalid
Revision history for this message
James Page (james-page) wrote :

Although the proposed branches fix the first problem (a fallback for auto-detection failures), that auto-detection approach is littered throughout the codebase to support the network-splits functionality which landed this cycle.

This specifically breaks the haproxy configuration template, which now uses inbound ACLs based on network to load-balance traffic over matching backends (required to support HTTPS certificates).

One approach might be to have a fallback backend, which would be used in the event that no ACL matches, restoring in part how the haproxy configurations used to work prior to this release.
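
A simplified illustration of that idea (not the charm's actual template; names, networks and ports are placeholders):

    frontend keystone_public
        bind *:5000
        # network-splits style: route by the network the request arrived on
        acl net_internal dst 10.0.0.0/24
        use_backend internal_api if net_internal
        # proposed fallback, used when no ACL matches (e.g. a VIP in a subnet
        # that no local unit address belongs to)
        default_backend public_api

    backend internal_api
        server keystone-0 10.0.0.11:4990 check

    backend public_api
        server keystone-0 10.0.0.11:4990 check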

Revision history for this message
James Page (james-page) wrote :

@John

I've pushed an update to most of the branches which adds a 'default_backend' that I think will do the trick; however, right now I've not had time to test these - I hope to get to that by the end of this week.

Revision history for this message
James Page (james-page) wrote :

The first part (enabling default_backend in the haproxy configurations) has landed in next; looking at the VIP configuration next.

The next charm release is due at the end of January (15.01).

Revision history for this message
James Page (james-page) wrote :

Fallback configurations have been re-introduced for most charms; swift-proxy still needs this fix.

Changed in openstack-dashboard (Juju Charms Collection):
status: In Progress → Fix Committed
Changed in nova-cloud-controller (Juju Charms Collection):
status: In Progress → Fix Committed
Changed in neutron-api (Juju Charms Collection):
status: In Progress → Fix Committed
Changed in keystone (Juju Charms Collection):
status: In Progress → Fix Committed
Changed in cinder (Juju Charms Collection):
status: In Progress → Fix Committed
Changed in glance (Juju Charms Collection):
status: In Progress → Fix Committed
James Page (james-page)
Changed in nova-cloud-controller (Juju Charms Collection):
milestone: none → 15.01
Changed in cinder (Juju Charms Collection):
milestone: none → 15.01
Changed in glance (Juju Charms Collection):
milestone: none → 15.01
Changed in keystone (Juju Charms Collection):
milestone: none → 15.01
Changed in openstack-dashboard (Juju Charms Collection):
milestone: none → 15.01
Changed in neutron-api (Juju Charms Collection):
milestone: none → 15.01
James Page (james-page)
Changed in cinder (Juju Charms Collection):
assignee: nobody → James Page (james-page)
Changed in swift (Ubuntu):
assignee: nobody → James Page (james-page)
Changed in glance (Juju Charms Collection):
assignee: nobody → James Page (james-page)
Changed in keystone (Juju Charms Collection):
assignee: nobody → James Page (james-page)
Changed in neutron-api (Juju Charms Collection):
assignee: nobody → James Page (james-page)
Changed in nova-cloud-controller (Juju Charms Collection):
assignee: nobody → James Page (james-page)
Changed in openstack-dashboard (Juju Charms Collection):
assignee: nobody → James Page (james-page)
affects: swift (Ubuntu) → swift-proxy (Juju Charms Collection)
Changed in swift-proxy (Juju Charms Collection):
importance: Undecided → High
status: New → Triaged
James Page (james-page)
Changed in nova-cloud-controller (Juju Charms Collection):
status: Fix Committed → Fix Released
Changed in cinder (Juju Charms Collection):
status: Fix Committed → Fix Released
Changed in glance (Juju Charms Collection):
status: Fix Committed → Fix Released
Changed in keystone (Juju Charms Collection):
status: Fix Committed → Fix Released
Changed in openstack-dashboard (Juju Charms Collection):
status: Fix Committed → Fix Released
Changed in neutron-api (Juju Charms Collection):
status: Fix Committed → Fix Released
James Page (james-page)
Changed in charm-swift-proxy:
assignee: nobody → James Page (james-page)
importance: Undecided → High
status: New → Triaged
Changed in swift-proxy (Juju Charms Collection):
status: Triaged → Invalid
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

I'm tentatively marking this Fix Released for the swift-proxy charm, given the length of time since the last update and the fact that the fix in the other charms is a charmhelpers change that should already be merged into charm-swift-proxy.

Changed in charm-swift-proxy:
status: Triaged → Fix Released