Comment 7 for bug 1543094

Revision history for this message
Carl Baldwin (carl-baldwin) wrote :

I looked in to this a bit today. This availability range table has been a point of lock contention for a long time. I'm being serious when I suggest that we get rid of it. Let me explain.

The contents of the availability range table are completely derived from the "allocation pools" and the "allocations". It is essentially a set difference between the two. On this basis, some would argue that it never should've been stored as a table in the first place. But, I understand that someone thought that the queries required to compute availability on the fly were expensive and that caching availability in a table was an attractive alternative.

One problem is that availability had to be designed to be compact because it had to work for ipv6 and even some large ipv4 subnets where recording individual IP availability is impractical. So, availability was compressed in to ranges. The problem with this is that each allocation needs to adjust the ranges. So, all allocations (and previously all deallocations) on a subnet need to serialize around the ranges recorded for the subnet. This severely limits the number of allocations that can be performed at any given time. It is a bottleneck.

Hence, I suggest that we remove the table and compute availability on the fly. We read the subnet pools in to an ipset, read all of the allocations for the subnet, and do the set difference. Then, pick an IP and record it in the IP allocations table.

You might be thinking that we're merely moving the contention to the allocations table. But, this is easily mitigated. Given the algorithm that I described, we'll normally have a whole set of IPs from which to choose. So, instead of everyone choosing the first IP available, we select one randomly from a window of the next available IP addresses. For example, my algorithm might look at the next 16 IP addresses and pick one randomly to try to avoid colliding with someone else looking at the same (or almost the same) window. That should cut down on the number of collision that need to be retried.

The more contention I see around this table, the more I'm convinced that computing availability on the fly really won't be that bad, even in large subnets. What do you think?