The L3 and DHCP agents both define internal (qg-, qr-, tap-) ports via OVS. In both cases, the agents call plug() to configure the device and bring it up if it does not exist. If the device does exist, however, the agents neither call plug() nor ensure that the link is up (OVS ensures that the devices survive a reboot but does not ensure that they are brought up on boot).
The responsibility for bringing devices up should probably remain in quantum/agent/linux/interface.py, so one suggested implementation is to delegate the device existence check to the driver's plug() method, which could then ensure that the device is brought up if necessary.
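To make the suggestion concrete, here is a minimal sketch of a plug() that owns the existence check and always ensures the link is up. It is not the code in quantum/agent/linux/interface.py: the class name SketchOVSDriver, the injected execute callable, and the method signatures are all hypothetical, and the real driver takes more arguments (network ID, port ID, MAC address and so on). Only the ip and ovs-vsctl invocations are standard commands.

```python
import subprocess


def run(cmd):
    """Run a command and return (returncode, stdout)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout


class SketchOVSDriver:
    """Hypothetical stand-in for an interface driver (not the Quantum class)."""

    def __init__(self, execute=run):
        # The command runner is injectable so the logic can be tested
        # without touching real devices.
        self.execute = execute

    def device_exists(self, name):
        # 'ip link show <dev>' exits non-zero when the device is absent.
        returncode, _ = self.execute(["ip", "link", "show", name])
        return returncode == 0

    def plug(self, bridge, name):
        if not self.device_exists(name):
            # Create the internal port only when it is missing.
            self.execute(["ovs-vsctl", "--", "--may-exist", "add-port",
                          bridge, name, "--",
                          "set", "Interface", name, "type=internal"])
        # Bring the link up unconditionally, so a device that survived a
        # reboot in the DOWN state is also covered.
        self.execute(["ip", "link", "set", name, "up"])
```

With this shape, the agents could call plug() every time instead of guarding it with their own existence check, and the post-reboot case would fall out of the same code path.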
This bug reveals a hole in our current testing strategy. Most developers presumably work on devstack rather than installed code. Since devstack agents don't survive a reboot, most developers would never have the chance to validate whether a quantum agent node still works after a reboot. Documenting use-cases that need to be tested (e.g. quantum agent nodes need to work properly after a reboot) is a good first step - is this currently captured somewhere or can we find a place to do so?
This is an interesting problem.
Ideally, OVS would not keep the persistent devices across a reboot. I am not sure whether it can be configured so that the devices are not recreated after a reboot.
One alternative is to provide a script that purges OVS of the devices created by quantum (it could be run before starting the agents that make use of OVS).
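A rough sketch of what such a purge script might look like follows. The function names, the injected execute callable, and the prefix tuple are assumptions made for illustration; in particular, matching on a bare "tap" prefix may also catch ports that were not created by these agents (for example, instance interfaces on a node that also runs compute), so the prefixes would need checking against the actual deployment.

```python
import subprocess

# Assumed name prefixes, based on the port names mentioned in this report.
QUANTUM_PREFIXES = ("qg-", "qr-", "tap")


def _run(cmd):
    """Run a command and return its stdout, raising on failure."""
    return subprocess.run(cmd, check=True, capture_output=True,
                          text=True).stdout


def select_quantum_ports(port_names, prefixes=QUANTUM_PREFIXES):
    """Keep only the port names that start with one of the prefixes."""
    return [name for name in port_names if name.startswith(prefixes)]


def purge_bridge(bridge, execute=_run, prefixes=QUANTUM_PREFIXES):
    """Delete matching ports from one OVS bridge and return their names."""
    ports = execute(["ovs-vsctl", "list-ports", bridge]).split()
    doomed = select_quantum_ports(ports, prefixes)
    for port in doomed:
        # --if-exists avoids an error if the port has already gone away.
        execute(["ovs-vsctl", "--if-exists", "del-port", bridge, port])
    return doomed
```

It would be run once per bridge before the agents start, for example purge_bridge("br-int"), where the bridge name is again an assumption.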
Thoughts?