Pool Manager creates/deletes can go into an infinite loop

Bug #1430976 reported by Tim Simmons
This bug affects 1 person
Affects: Designate
Status: Fix Released
Importance: High
Assigned to: Tim Simmons
Milestone: 2015.1.0

Bug Description

The Pool Manager does not handle the following situation:

1. Pool Manager tries to create/delete a zone. In BIND9, this results in a rndc addzone/delzone. In PowerDNS, this results in a database query.
2. Nameserver receives this communication and starts working on the operation.
3. The connection between the Pool Manager and the nameserver drops or times out, or network issues cause the reply to be lost. Many things can cause this.

Result: The requested operation succeeded, but Designate doesn't know about it.

Most of the time, the Pool Manager puts the zone into ERROR status with a CREATE/DELETE action. On the next sync cycle, the Pool Manager retries the create/delete, which almost always fails because the action has already been taken. Each failure puts the zone back into ERROR, so the pattern repeats ad infinitum.
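To make the loop concrete, here is a toy model in Python. All class and function names are hypothetical, not Designate code. A fake nameserver applies the CREATE but "loses" the first reply; every later retry then fails with "already exists", so the zone never leaves ERROR. The loop is capped at a fixed number of sync cycles so the demo terminates.

```python
class AlreadyExists(Exception):
    """Raised by the fake nameserver when a zone is created twice."""


class FakeNameserver:
    def __init__(self) -> None:
        self.zones: set[str] = set()
        self.drop_next_reply = True  # simulate a lost reply on the first call

    def create_zone(self, name: str) -> None:
        if name in self.zones:
            raise AlreadyExists(name)
        self.zones.add(name)  # the operation really happens...
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise TimeoutError("reply lost")  # ...but the caller never hears about it


def naive_sync(ns: FakeNameserver, zone: str, max_cycles: int = 5) -> list[str]:
    """Retry CREATE on every sync cycle, treating any exception as ERROR."""
    history = []
    status = "PENDING"
    for _ in range(max_cycles):
        if status == "ACTIVE":
            break
        try:
            ns.create_zone(zone)
            status = "ACTIVE"
        except Exception:
            status = "ERROR"
        history.append(status)
    return history


print(naive_sync(FakeNameserver(), "example.com."))
# ['ERROR', 'ERROR', 'ERROR', 'ERROR', 'ERROR']
```

The zone is present on the nameserver after the very first cycle, yet every recorded status is ERROR.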

There are two ways to fix this, as I see it.

1. When a backend receives a CREATE/DELETE, it can check whether that action has already been completed. Some backends make this easier than others. If the check shows the action has already happened, the backend returns success to the Pool Manager, and life can go on.
2. As part of its syncing process, the Pool Manager can inspect the nameserver with an SOA query through MiniDNS before deciding to retry the CREATE/DELETE. This adds overhead to syncing, and the issue doesn't seem to be extremely common, but the same check might be useful for something more general.
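Both proposals come down to checking the nameserver's current state before (re)sending the operation. A minimal sketch, assuming a hypothetical `zone_present` callable that stands in for the SOA query (through MiniDNS for option 2, or a backend-specific check for option 1); the status strings are illustrative:

```python
from typing import Callable


def reconcile(action: str, zone: str,
              zone_present: Callable[[str], bool],
              do_action: Callable[[str], None]) -> str:
    """Return the new zone status after one sync attempt (hypothetical helper)."""
    if action not in ("CREATE", "DELETE"):
        raise ValueError(f"unsupported action: {action}")
    want_present = action == "CREATE"
    done = "ACTIVE" if want_present else "DELETED"
    # Check first: if the nameserver already reflects the desired state,
    # report success instead of re-sending an operation that would fail.
    if zone_present(zone) == want_present:
        return done
    try:
        do_action(zone)
    except Exception:
        return "ERROR"
    return done


served = {"example.com."}
print(reconcile("CREATE", "example.com.", served.__contains__, served.add))  # ACTIVE
```

The trade-off matches the one noted above: every retry now costs an extra query, but a lost reply no longer traps the zone in ERROR.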

Thoughts?

OpenStack Infra (hudson-openstack) wrote : Fix proposed to designate (master)

Fix proposed to branch: master
Review: https://review.openstack.org/163901

Changed in designate:
assignee: nobody → Tim Simmons (tim-simmons-t)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to designate (master)

Reviewed: https://review.openstack.org/163901
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=b597b2c280ed354bcc47139c0ff34519f3b2844d
Submitter: Jenkins
Branch: master

commit b597b2c280ed354bcc47139c0ff34519f3b2844d
Author: Tim Simmons <email address hidden>
Date: Thu Mar 12 15:52:56 2015 +0000

    Smarter Create/Delete in BIND9/Agent

    * In BIND9, an rndc addzone fails if the zone already exists;
      similarly, a delzone fails if the zone doesn't exist. This
      patch detects that in the output of the RNDC call, and lets
      the Pool Manager register success in the above situation.

    * In the Agent, we must take action (writing a zone file, for
      example) before executing the addzone/delzone, so we check with
      an SOA query before the backend is called.

    Partial-Bug: 1430976
    Change-Id: I0430c3a402ae30d5705bbfef1a4394772cbc12f9
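A minimal sketch of the BIND9 half of this change, not the actual patch. It assumes rndc exits non-zero and prints "already exists" for a duplicate addzone and "not found" for a delzone of a missing zone; the exact wording may vary between BIND versions. The `runner` parameter and `rndc_zone_op` name are hypothetical, added so the logic can be exercised without a live rndc.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical hook: runs an rndc command, returns (exit_code, combined_output).
Runner = Callable[[Sequence[str]], tuple[int, str]]


def run_rndc(args: Sequence[str]) -> tuple[int, str]:
    proc = subprocess.run(["rndc", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


# Failures that mean the requested end state already holds (assumed wording).
BENIGN = {"addzone": "already exists", "delzone": "not found"}


def rndc_zone_op(op: str, args: Sequence[str], runner: Runner = run_rndc) -> None:
    code, output = runner([op, *args])
    if code == 0:
        return
    if op in BENIGN and BENIGN[op] in output:
        return  # zone is already in the desired state: report success
    raise RuntimeError(f"rndc {op} failed: {output.strip()}")
```

Matching on output text is brittle, but it needs no extra query; the Agent path cannot rely on it because it writes a zone file before rndc runs, which is why it checks with an SOA query first.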

Tim Simmons (timsim)
Changed in designate:
status: In Progress → Fix Committed
Kiall Mac Innes (kiall)
Changed in designate:
milestone: none → kilo-rc1
Tim Simmons (timsim)
Changed in designate:
importance: Undecided → High
Thierry Carrez (ttx)
Changed in designate:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in designate:
milestone: kilo-rc1 → 2015.1.0