Comment 7 for bug 1039400

Salvatore Orlando (salvatore-orlando) wrote :

After some "hands on" experience, it looks like the potential race condition is much less of an issue than previously thought.
The dhcp agent indeed runs enable_dhcp_helper and refresh_dhcp_helper every time a network or a subnet is created, updated, or deleted. These two methods will call the 'enable' method on the dhcp driver.
The enable method, if the dhcp server is not yet active, will invoke get_dhcp_port through the RPC interface. The latter method is responsible for querying or retrieving the port used by the dhcp server.
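
To make the flow above concrete, here is a minimal sketch of it. It is not the agent's actual code: FakePluginRpc and DhcpDriverSketch are hypothetical names, the shape of the returned port is assumed, and returning early when the server is already active is also an assumption, since only the "not yet active" path is described above.

```python
class FakePluginRpc:
    """Hypothetical stand-in for the agent's RPC proxy to the plugin."""

    def __init__(self):
        self.calls = 0

    def get_dhcp_port(self, network_id, device_id):
        # Assumed behaviour: hand back the port used by the dhcp server.
        self.calls += 1
        return {
            "network_id": network_id,
            "device_id": device_id,
            "device_owner": "network:dhcp",
        }


class DhcpDriverSketch:
    """Hypothetical driver showing only the enable path discussed above."""

    def __init__(self, rpc, network_id, device_id):
        self.rpc = rpc
        self.network_id = network_id
        self.device_id = device_id
        self.active = False
        self.port = None

    def enable(self):
        # Assumption: nothing to do if the dhcp server is already running.
        if self.active:
            return self.port
        # Not yet active: fetch the dhcp port over RPC, then mark as active.
        self.port = self.rpc.get_dhcp_port(self.network_id, self.device_id)
        self.active = True
        return self.port


rpc = FakePluginRpc()
driver = DhcpDriverSketch(rpc, "net-1", "agent-a")
driver.enable()  # first network/subnet event: triggers the RPC call
driver.enable()  # later events: server already active, no new RPC call
```

In this sketch the dhcp port only comes into play once the agent has processed the first event for the network, which is the window in which an instance spawned right after subnet creation could miss it.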

The race condition is therefore limited to the cases in which a script creates a subnet (and possibly a network) and then immediately spawns an instance.
To address it, we can either:
1) add a sleep/retry mechanism in allocate_for_instance when retrieving the dhcp port (trivial but not extremely efficient)
2) create an 'unbound' dhcp port when a subnet is created. This port will have the owner field set to 'network:dhcp' but no device_id, so nova queries will work in any case. get_dhcp_port can be tweaked to look for such an unbound port and then set the device_id on the port to that of a specific agent. This solution is not trivial, though not difficult either, and it avoids having sleeps and loops in the nova/quantum integration code.
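
The two options can be sketched roughly as follows. This is only an illustration under stated assumptions: wait_for_dhcp_port and claim_dhcp_port are hypothetical helpers, list_ports stands for whatever callable nova would use to query ports, ports are plain dicts, and the retry count and delay are arbitrary.

```python
import time

DHCP_OWNER = "network:dhcp"


def wait_for_dhcp_port(list_ports, network_id, attempts=5, delay=0.5,
                       sleep=time.sleep):
    """Option 1: poll until the dhcp port shows up, or give up."""
    for attempt in range(attempts):
        ports = list_ports(network_id=network_id, device_owner=DHCP_OWNER)
        if ports:
            return ports[0]
        # Sleep only between attempts, not after the last one.
        if attempt < attempts - 1:
            sleep(delay)
    return None


def claim_dhcp_port(ports, network_id, device_id):
    """Option 2: bind an unbound dhcp port to the calling agent."""
    candidates = [
        p for p in ports
        if p["network_id"] == network_id and p["device_owner"] == DHCP_OWNER
    ]
    # Prefer a port this agent already owns.
    for port in candidates:
        if port.get("device_id") == device_id:
            return port
    # Otherwise take the port created with the subnet, which has no device_id.
    for port in candidates:
        if not port.get("device_id"):
            port["device_id"] = device_id
            return port
    return None


# Usage: the subnet was created together with an unbound dhcp port.
ports = [{"network_id": "net-1", "device_owner": DHCP_OWNER, "device_id": ""}]
claimed = claim_dhcp_port(ports, "net-1", "agent-a")
```

With option 2 the port exists as soon as the subnet does, so the polling loop of option 1 is not needed on the nova side. A real implementation would also have to make the claim atomic in the database, which this in-memory sketch does not attempt.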