Comment 5 for bug 1250168

Salvatore Orlando (salvatore-orlando) wrote :

The root cause of the issue is that neutron is unable to provide network allocation for 150 instances (the number large_ops spawns) within the timeout of 196 seconds.
NOTE: these operations do not include actual provisioning in the backend. The time appears to be lost while the concurrent requests executed by _allocate_for_network_async in nova compete for DB resources.

This is the symptom of a performance regression that needs to be addressed in neutron.
- Enabling multiple workers on the neutron API does not improve the situation (as the requests are waiting for a resource to become available).
- The same issue is encountered with both the OVS and the ML2 plugins, which might indicate the problem lies either in the IPAM logic or in some other shared code in db_base_plugin_v2 (it is actually marginally better with OVS due to fewer DB operations).
- Changing the build_interval parameter to reduce the frequency of calls to GET /v2/servers/<server_id> (which in turn calls neutron to get instance port info) does not have any impact on the overall time required by the test, which further supports the hypothesis of resource contention in IP allocation.
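
To illustrate why adding API workers would not help under this hypothesis, here is a minimal sketch of a toy model in which every request does some parallelizable work and then holds a single shared resource exclusively. It is not neutron code: the function name simulated_makespan_ms and the 200 ms / 1000 ms costs are made-up assumptions chosen only to show the shape of the effect, not measurements.

```python
import heapq


def simulated_makespan_ms(num_requests, workers, unlocked_ms, locked_ms):
    """Toy model: total time to serve num_requests with a pool of workers.

    Each request first does unlocked_ms of work that can run in parallel,
    then needs exclusive use of one shared resource for locked_ms.
    All costs are hypothetical; this is not how neutron is implemented.
    """
    worker_free_at = [0] * workers  # min-heap of times when each worker is idle
    heapq.heapify(worker_free_at)
    resource_free_at = 0            # time when the shared resource is released
    makespan = 0

    for _ in range(num_requests):
        start = heapq.heappop(worker_free_at)      # earliest idle worker
        ready = start + unlocked_ms                # parallel part is done
        acquired = max(ready, resource_free_at)    # wait for the resource
        finish = acquired + locked_ms
        resource_free_at = finish
        heapq.heappush(worker_free_at, finish)
        makespan = max(makespan, finish)

    return makespan


if __name__ == "__main__":
    # 150 requests, as spawned by large_ops; per-request costs are assumptions.
    for w in (1, 2, 4, 16):
        print(w, simulated_makespan_ms(150, w, unlocked_ms=200, locked_ms=1000))
```

In this model, going from 1 to 2 workers only hides the parallel part of each request; beyond that the total time is bounded by num_requests * locked_ms regardless of the worker count, which matches the behaviour described in the first bullet.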

The error can easily be reproduced locally by running the tests inside a VM with 2GB RAM.

More details will be provided when available.