[RFE] IPAM migration from non-pluggable to pluggable

Bug #1516156 reported by John Belamaric
This bug affects 2 people
Affects: neutron
Status: Fix Released
Importance: Wishlist
Assigned to: Carl Baldwin

Bug Description

Currently, there is no upgrade path from the non-pluggable IPAM implementation to the pluggable implementation. This limits pluggable use to greenfield installations.

This proposal is to develop a migration from the non-pluggable version to the reference driver pluggable implementation. This migration would be manually run when changing the driver.

Revision history for this message
Carl Baldwin (carl-baldwin) wrote :

We now have the pluggable IPAM reference implementation. It was our plan during the Liberty timeframe to develop the migration during Mitaka to this new implementation and deprecate the old one. So, this new rfe is just the formalization of that plan. I think we need to execute it.

Thank you for filing this, John.

Changed in neutron:
status: New → Confirmed
Changed in neutron:
importance: Undecided → High
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :
Changed in neutron:
importance: High → Wishlist
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Seems a noble initiative. Can someone explore the feasibility before we commit to it?

Changed in neutron:
status: Confirmed → Triaged
Revision history for this message
John Belamaric (jbelamaric) wrote :

@Armando - I expect it should be quite feasible, but I'll ask Pavel to look closer and verify.

@Pavel - can you take a look at the effort here?

Thanks,
John

Revision history for this message
Pavel Bondar (pasha117) wrote :

@John, sure, I'll investigate it more deeply and post the results here.
FYI, I am on vacation Dec 25 - Jan 5.
So if I am not able to provide investigation results by COB Dec 24,
I will post them only after my vacation.

Revision history for this message
Pavel Bondar (pasha117) wrote :

I investigated the topic, and it looks like a feasible task.

From a high-level point of view, the ipam part of subnet creation and ip allocation should be called once again with the new driver for each existing subnet and ip allocation.

The workflow for subnet re-creation in this case is as follows:
- Get all subnets using get_subnets() from db_base_plugin_v2.
- Generate SubnetAddressRequest using SubnetRequestFactory for each db subnet.
  All requests would be generated as SpecificSubnetRequest, since cidr is already known for created subnets.
- Call ipam_driver.allocate_subnet for each subnet_request
At this point we are done with subnets.

Workflow for ip allocation re-creation:
- Get all ports using get_ports from db_base_plugin_v2.
- Generate AddressRequest using AddressRequestFactory for ips from port['fixed_ips'].
Note: IPv6 stateless addresses would be generated as StaticAddressRequest (like all other allocations),
since the ip address is already known at this point.
That should be ok for the ipam backend.
If it is not, we could add some extra processing to generate AutomaticAddressRequest.
Since the subnet cidr and port mac address are the same, it would result in generating the same ip address as the old one (listed in port['fixed_ips']).
- get ipam_subnet for ip address
- call ipam_subnet.allocate(ip_request)
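
The two workflows above can be sketched roughly as follows. This is a minimal, self-contained illustration and not neutron code: InMemoryIpamDriver, InMemoryIpamSubnet, KnownSubnetRequest, KnownAddressRequest and replay_into_driver are hypothetical stand-ins for the driver, the SpecificSubnetRequest and the StaticAddressRequest mentioned above, and the sample dicts only assume the 'id', 'cidr' and 'fixed_ips' keys.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownSubnetRequest:
    """Hypothetical stand-in for a subnet request whose CIDR is known."""
    subnet_id: str
    cidr: str


@dataclass(frozen=True)
class KnownAddressRequest:
    """Hypothetical stand-in for an address request whose IP is known."""
    address: str


class InMemoryIpamSubnet:
    """Hypothetical IPAM subnet that only records allocated addresses."""

    def __init__(self, subnet_request):
        self.network = ipaddress.ip_network(subnet_request.cidr)
        self.allocated = set()

    def allocate(self, address_request):
        address = ipaddress.ip_address(address_request.address)
        if address not in self.network:
            raise ValueError(f"{address} is outside {self.network}")
        if address in self.allocated:
            raise ValueError(f"{address} is already allocated")
        self.allocated.add(address)
        return str(address)


class InMemoryIpamDriver:
    """Hypothetical driver; a real one would persist to its own backend."""

    def __init__(self):
        self._subnets = {}

    def allocate_subnet(self, subnet_request):
        ipam_subnet = InMemoryIpamSubnet(subnet_request)
        self._subnets[subnet_request.subnet_id] = ipam_subnet
        return ipam_subnet

    def get_subnet(self, subnet_id):
        return self._subnets[subnet_id]


def replay_into_driver(ipam_driver, subnets, ports):
    """Re-create subnets first, then every fixed IP, in the new driver."""
    for subnet in subnets:
        ipam_driver.allocate_subnet(
            KnownSubnetRequest(subnet["id"], subnet["cidr"]))
    for port in ports:
        for fixed_ip in port["fixed_ips"]:
            ipam_subnet = ipam_driver.get_subnet(fixed_ip["subnet_id"])
            ipam_subnet.allocate(KnownAddressRequest(fixed_ip["ip_address"]))


# Sample data shaped like the dicts the plugin returns (assumed keys only).
subnets = [{"id": "subnet-a", "cidr": "10.0.0.0/24"}]
ports = [
    {"id": "port-1", "fixed_ips": [
        {"subnet_id": "subnet-a", "ip_address": "10.0.0.5"}]},
    {"id": "port-2", "fixed_ips": [
        {"subnet_id": "subnet-a", "ip_address": "10.0.0.6"}]},
]

driver = InMemoryIpamDriver()
replay_into_driver(driver, subnets, ports)
print(sorted(str(ip) for ip in driver.get_subnet("subnet-a").allocated))
```

Subnets are replayed before ports so that every fixed ip finds its ipam subnet already in place.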

Revision history for this message
Carl Baldwin (carl-baldwin) wrote :

@Pavel and @John, could you move this to In Progress? I assume that @Pavel will be doing the work? Please let me know when reviews are needed.

tags: added: rfe-approved
removed: rfe
Changed in neutron:
milestone: none → mitaka-3
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Hi Pavel, thanks for the initial investigation. Though I believe a few folks are generally positive about the initiative (see [1] for more details), I am unclear on a number of areas:

- Is the migration workflow to be performed on a live system and driven by API calls? Or the admin is intended to invoke a script that takes care of the data/schema migration only? In either case, is there a data plane migration that needs to occur first? Or existing workload remains unaffected?
- Are we suggesting that this migration is merely targeted to migrating the reference IPAM implementation, ie. from reference non-pluggable one to reference pluggable one?
- Idle curiosity, is it possible to switch from one pluggable IPAM solution to another (pluggable one)?

We should be clear about the objectives for this migration exercise, ie. that this migration effort is solely aimed at deprecating and removing the non-pluggable reference implementation.

[1] http://eavesdrop.openstack.org/meetings/neutron_drivers/2016/neutron_drivers.2016-01-12-15.00.log.html

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

I tend to believe that this may need a full spec process. It's fair to say that at this point this will end up slipping to N, but we should definitely start working on this sooner rather than later.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

@Pavel: do you intend to actively work on this?

Changed in neutron:
assignee: nobody → Pavel Bondar (pasha117)
milestone: mitaka-3 → mitaka-2
Revision history for this message
Pavel Bondar (pasha117) wrote :

@Armando: Yes I plan to actively work on this.
Right now I have some high priority networking-infoblox work to do.
So for now we could figure out all needed details and I'll switch to active development
once I am done with networking-infoblox.

Changed in neutron:
status: Triaged → In Progress
Revision history for this message
Pavel Bondar (pasha117) wrote :

@Armando

> - Is the migration workflow to be performed on a live system and driven by API calls? Or the admin is intended to invoke a script that takes care of the data/schema migration only? In either case, is there a data plane migration that needs to occur first? Or existing workload remains unaffected?

The workflow described in [1] only populates data (no schema changes are needed) and has to be performed while neutron is off (by a migration script) or on neutron service startup (if an ipam driver change is detected), i.e. not on a live system.

> - Are we suggesting that this migration is merely targeted to migrating the reference IPAM implementation, ie. from reference non-pluggable one to reference pluggable one?

[1] describes a universal way of migrating from any ipam driver (even the built-in implementation) to any other ipam driver, since it uses the public ipam interface to populate data. It emulates subnet creation and ip allocation for the IPAM layer only, i.e. it skips all other actions that are performed during normal subnet creation/ip allocation.
If we narrow the task down to migrating only from the built-in ipam implementation to the reference ipam driver, then there is no need to use the ipam interface to populate data.
Data from the old tables can be copied directly into the new tables, bypassing the ipam interface layer.
This is simpler, since there is an almost one-to-one mapping between the old tables and the new tables.

[1] https://bugs.launchpad.net/neutron/+bug/1516156/comments/6

Revision history for this message
Pavel Bondar (pasha117) wrote :

Also, how data population would be triggered is still an open question.
I see two ways of triggering data population:

1) Manually by admin.
- turn off neutron-server
- update neutron.conf with 'ipam_driver = internal'
- run data migrate script to populate data to reference ipam driver tables
- start neutron-server

2) Automatically on neutron restart.
- update neutron.conf with 'ipam_driver = internal'
- restart neutron-server
- during neutron-server startup it autodetects that the current driver has changed and no longer matches
the old driver, so the data migration script is called

Which way is preferred?

Revision history for this message
Pavel Bondar (pasha117) wrote :

If the scope of the task is to migrate only from the built-in implementation to the reference ipam driver,
then copying data from the old tables to the new tables is enough.
There is no need to emulate subnet creation using the ipam interface as described in [1].

List of tables that have to be populated with data (from -> to):

Subnet -> IpamSubnet (only subnet_id is needed)
IPAllocationPool -> IpamAllocationPool
IPAllocation -> IpamAllocation
IPAvailabilityRange -> IpamAvailabilityRange

[1] https://bugs.launchpad.net/neutron/+bug/1516156/comments/6
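
As a rough illustration of the table-to-table copy, here is a minimal sketch using Python's standard sqlite3 module. The table and column names, the 'ALLOCATED' status value and the copy_ipam_data helper are assumptions made for this sketch (a simplified schema, not the real neutron one), and the availability-range table is left out to keep it short.

```python
import sqlite3
import uuid

# Simplified, assumed schema: only the columns this sketch needs.
SCHEMA = """
CREATE TABLE subnets (id TEXT PRIMARY KEY, cidr TEXT);
CREATE TABLE ipallocationpools (
    id TEXT PRIMARY KEY, subnet_id TEXT, first_ip TEXT, last_ip TEXT);
CREATE TABLE ipallocations (ip_address TEXT, subnet_id TEXT, port_id TEXT);
CREATE TABLE ipamsubnets (id TEXT PRIMARY KEY, neutron_subnet_id TEXT);
CREATE TABLE ipamallocationpools (
    id TEXT PRIMARY KEY, ipam_subnet_id TEXT, first_ip TEXT, last_ip TEXT);
CREATE TABLE ipamallocations (
    ip_address TEXT, status TEXT, ipam_subnet_id TEXT);
"""


def copy_ipam_data(conn):
    """Copy built-in IPAM rows into the pluggable tables, remapping ids."""
    # Each neutron subnet gets a fresh IPAM subnet id; only the
    # neutron subnet id is carried over.
    ipam_ids = {}
    for (subnet_id,) in conn.execute("SELECT id FROM subnets").fetchall():
        ipam_ids[subnet_id] = str(uuid.uuid4())
        conn.execute(
            "INSERT INTO ipamsubnets (id, neutron_subnet_id) VALUES (?, ?)",
            (ipam_ids[subnet_id], subnet_id))

    # Pools and allocations must point at the new IPAM subnet id.
    pools = conn.execute(
        "SELECT subnet_id, first_ip, last_ip FROM ipallocationpools"
    ).fetchall()
    for subnet_id, first_ip, last_ip in pools:
        conn.execute(
            "INSERT INTO ipamallocationpools VALUES (?, ?, ?, ?)",
            (str(uuid.uuid4()), ipam_ids[subnet_id], first_ip, last_ip))

    allocations = conn.execute(
        "SELECT ip_address, subnet_id FROM ipallocations").fetchall()
    for ip_address, subnet_id in allocations:
        conn.execute(
            "INSERT INTO ipamallocations VALUES (?, ?, ?)",
            (ip_address, "ALLOCATED", ipam_ids[subnet_id]))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO subnets VALUES ('subnet-a', '10.0.0.0/24')")
conn.execute("INSERT INTO ipallocationpools VALUES "
             "('pool-1', 'subnet-a', '10.0.0.2', '10.0.0.254')")
conn.execute(
    "INSERT INTO ipallocations VALUES ('10.0.0.5', 'subnet-a', 'port-1')")
copy_ipam_data(conn)
print(conn.execute(
    "SELECT ip_address, status FROM ipamallocations").fetchall())
```

The id remapping is the only non-trivial part: the new tables key on the IPAM subnet id rather than the neutron subnet id.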

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

I agree with your last statement Pavel, for migrating to the reference IPAM driver we just need to execute the SQL statements for populating the tables you listed in your comment (which have a structure very similar to the original, so the task will be fairly easy as well).

I suggest leaving it to the operator to decide when to execute this script.
Automating it at server startup won't provide operators with significant advantages, and might present some challenges, for instance:

- we'd need to keep track of the "old" value of the configuration variable
- we'd need to ensure that the migration is executed only once even when multiple server instances are running
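
To make these two challenges concrete, here is a small sketch of what a startup-time trigger would need: a persisted record of the previously used driver, and an atomic "claim" so that only one server instance runs the migration. The ipam_driver_state table, the 'builtin' value and the claim_migration helper are hypothetical names invented for this sketch; nothing like them is proposed in this report.

```python
import sqlite3


def claim_migration(conn, old_driver, new_driver):
    """Atomically move the recorded driver; True for exactly one caller."""
    if old_driver == new_driver:
        return False
    # Compare-and-swap: the UPDATE only matches while the stored value is
    # still the old driver, so a second caller updates zero rows.
    cursor = conn.execute(
        "UPDATE ipam_driver_state SET driver = ? WHERE driver = ?",
        (new_driver, old_driver))
    conn.commit()
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipam_driver_state (driver TEXT NOT NULL)")
conn.execute("INSERT INTO ipam_driver_state VALUES ('builtin')")

first = claim_migration(conn, "builtin", "internal")
second = claim_migration(conn, "builtin", "internal")
print(first, second)
```

Only the caller that receives True would go on to run the data migration; every other instance would skip it.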

Also, one thing that I probably missed is whether operators can be allowed to go back to the "old" IPAM after switching to the IPAM driver. In theory, that would be possible, even if in order to be functional a 2-way migration might be required.

Changed in neutron:
milestone: mitaka-2 → mitaka-3
Henry Gessau (gessau)
summary: - IPAM migration from non-pluggable to pluggable
+ [RFE] IPAM migration from non-pluggable to pluggable
Revision history for this message
Carl Baldwin (carl-baldwin) wrote :

I think it is pretty safe to say that the scope of this task is only to migrate from non-pluggable reference implementation to the pluggable reference implementation.

The goal is to deprecate the non-pluggable implementation in Newton, right? So, eventually everyone upgrading to Ocata will be forced to go through this. Given that, it seems like we should make this as easy and automatic as possible.

Let's think about it this way. At some point, we need to switch the pluggable implementation to be the default as in [1]. We will also need to ensure that grenade passes. Doesn't that imply some sort of automated migration? Can we do all of this with a script that is manually run by the operator? Or, should we be looking to do this as part of the migration script that they'll need to run to upgrade anyway?

On the other hand, I can see the desire to allow the operator to choose when to do the upgrade and give them the option to revert back. But, if we're deprecating and removing the non-pluggable implementation, does this really make sense? If they decide to revert, they're just delaying the inevitable.

[1] https://review.openstack.org/#/c/181023/

Revision history for this message
Pavel Bondar (pasha117) wrote :

Since scope of this task is only to migrate from non-pluggable reference implementation to the pluggable reference implementation, then I will implement copying data from old tables to new tables (as described in my previous comment).

And it looks like we need some automated migration to make grenade pass with the reference ipam driver as the default. Are there any other possible options here?

Do we still need a spec? As I see we are making progress in fleshing out details in ticket comments.

Revision history for this message
Pavel Bondar (pasha117) wrote :

I am actively working on that task now, and it is my high priority item for next days/weeks.

To validate migration that I am going to implement, I will make it part of patch [1] to run grenade with reference ipam driver as a default.

I see several options for updating this review to pass grenade. Data migration could be executed from an alembic migration, and that could be done in different ways:

- implement pure sql migration using op.execute() inside alembic migration. The complicated part here is that the ids in the old tables and new tables are different, so they need to be remapped, probably using two temporary tables. I expect that pure sql may not work with every possible backend (e.g. PostgreSQL).

- import/define model definitions for the old and new tables inside the alembic migration, then copy the data and implement id remapping using the orm. The issue here is that this code can help to pass tests but might be useless if we go with the separate script approach, so all of it would be thrown away.

- implement separate script for data migration, and just call it from alembic_migration. To me, calling an executable from an alembic migration looks ugly, but if we go with the script approach as the final solution, then this script could be reused.
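
For the first option, the id remapping in pure sql could look roughly like the sketch below. It runs against an in-memory SQLite database through Python's sqlite3 module rather than through alembic's op.execute(), uses a simplified, assumed schema, and gets by with a single temporary mapping table (subnet_id_map is a made-up name). The id generation relies on SQLite's randomblob(), which illustrates the portability concern: other backends would need a different expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, assumed schema plus two sample subnets.
CREATE TABLE subnets (id TEXT PRIMARY KEY);
CREATE TABLE ipallocationpools (
    id TEXT PRIMARY KEY, subnet_id TEXT, first_ip TEXT, last_ip TEXT);
CREATE TABLE ipamsubnets (id TEXT PRIMARY KEY, neutron_subnet_id TEXT);
CREATE TABLE ipamallocationpools (
    id TEXT PRIMARY KEY, ipam_subnet_id TEXT, first_ip TEXT, last_ip TEXT);

INSERT INTO subnets VALUES ('subnet-a'), ('subnet-b');
INSERT INTO ipallocationpools VALUES
    ('pool-1', 'subnet-a', '10.0.0.2', '10.0.0.254'),
    ('pool-2', 'subnet-b', '10.0.1.2', '10.0.1.254');
""")

MIGRATION_SQL = """
-- Temporary table that pairs each neutron subnet id with a new ipam id.
CREATE TEMP TABLE subnet_id_map AS
    SELECT id AS neutron_subnet_id,
           lower(hex(randomblob(16))) AS ipam_subnet_id
    FROM subnets;

INSERT INTO ipamsubnets (id, neutron_subnet_id)
    SELECT ipam_subnet_id, neutron_subnet_id FROM subnet_id_map;

-- Pools are re-pointed at the new ipam subnet id through the mapping.
INSERT INTO ipamallocationpools (id, ipam_subnet_id, first_ip, last_ip)
    SELECT lower(hex(randomblob(16))), m.ipam_subnet_id,
           p.first_ip, p.last_ip
    FROM ipallocationpools AS p
    JOIN subnet_id_map AS m ON m.neutron_subnet_id = p.subnet_id;

DROP TABLE subnet_id_map;
"""
conn.executescript(MIGRATION_SQL)

print(conn.execute(
    "SELECT s.neutron_subnet_id, p.first_ip, p.last_ip "
    "FROM ipamallocationpools AS p "
    "JOIN ipamsubnets AS s ON s.id = p.ipam_subnet_id "
    "ORDER BY s.neutron_subnet_id").fetchall())
```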

For now I am investigating the script approach more deeply and starting some coding in this area.
Any thoughts on which approach to use?

I don't feel confident in this area yet (preparing scripts external to neutron that use neutron models), and I ran into various env issues in the beginning,
so if you are aware of code that I could potentially use as an example, please share it with me.

[1] https://review.openstack.org/#/c/181023/

Revision history for this message
Carl Baldwin (carl-baldwin) wrote :

I think we should test the same thing in grenade that we expect operators will use for a real upgrade. I think that eliminates the second option.

I noticed that all of these options involve running from an alembic migration. Does this mean that it will be run unconditionally when the operator does the database migration from Liberty to Mitaka? I'm having second thoughts about doing that because it will force the switch on everyone at the time of upgrading from Liberty to Mitaka. We might consider the trade-offs of offering an external script that operators can choose to run on their time table. But, it still might be delaying the inevitable.

Someone at the Nova mid-cycle -- I don't remember who -- heard me talking about this and suggested that there is some way to get grenade to run an external upgrade script. He suggested that we ask Sean Dague about this but I never did. I had too many other things on my mind.

I have to say I don't understand the distinction between options 2 and 3. I'm not very confident in this area yet either so I'm not sure I can provide much more feedback. Maybe a ping to Sean Dague might be the best next step to decide what course of action to take.

Feel free to ping me on IRC when you need feedback. I miss updates to bugs all the time because there are just too many of them. We do need to stay on top of this task in order to finish it in time for Mitaka-3.

Revision history for this message
Pavel Bondar (pasha117) wrote :

The distinction between options 2 and 3 from my previous comment is as follows:
Option 3 includes a script that can be executed not only by alembic_migration, but by the operator as well.
Option 2 migrates as a regular alembic_migration does, so it is executed unconditionally and only during the upgrade process.

Another difference is in how the code would look.
For the second option I assume it would look similar to [1].
I.e. it can't use objects directly from model_v2, and needs to create similar db models just for the current migration.
And for the external script case I hope to be able to use objects from model_v2 directly.
Actually this part is just my assumption based on what I see in the code, so it would be good if someone could say whether it is true or false.

Please note that alembic migration serves as a trigger only for the initial development stage, and can be replaced with some other trigger depending on requirements.

As a final trigger solution we could use:
- Manual migration, where operator executes script;
- Automatically on neutron restart, tried to describe it as second option in [2];
- Automatically on neutron upgrade (alembic migration);

I'll try to ping Sean Dague to join discussion.
If we can easily get grenade to run an external upgrade script, that would simplify development and fixing grenade failures.

[1] https://github.com/openstack/neutron/blob/master/neutron/db/migration/alembic_migrations/versions/mitaka/contract/8a6d8bdae39_migrate_neutron_resources_table.py
[2] https://bugs.launchpad.net/neutron/+bug/1516156/comments/13

Revision history for this message
Pavel Bondar (pasha117) wrote :

Uploaded initial version [1] of code.
It might not work for now, but should give a picture of what I mean by option 3 "implement separate script for data migration, and just call it from alembic_migration".
It is useful mostly for debugging purpose and to make sure grenade passes.
Once failures are cleaned up we can use another trigger for data population:
- operator,
- detecting driver on neutron start up,
or anything else.

[1] https://review.openstack.org/#/c/181023/

Revision history for this message
Pavel Bondar (pasha117) wrote :

Testing various approaches for now in https://review.openstack.org/#/c/181023/
- PS53 uses an alembic migration that executes a script for data migration;
- PS54 is a pure alembic migration without calling external scripts;
I haven't got a clean pass yet, but some initial reviews would help a lot.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/277767

Changed in neutron:
milestone: mitaka-3 → mitaka-rc1
Changed in neutron:
milestone: mitaka-rc1 → newton-1
Changed in neutron:
milestone: newton-1 → newton-2
Changed in neutron:
assignee: Pavel Bondar (pasha117) → Carl Baldwin (carl-baldwin)
Changed in neutron:
milestone: newton-2 → newton-3
Changed in neutron:
assignee: Carl Baldwin (carl-baldwin) → Pavel Bondar (pasha117)
Changed in neutron:
assignee: Pavel Bondar (pasha117) → Carl Baldwin (carl-baldwin)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/348956

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/348956
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=67984850228f6f26a72504b9f464a5fbcaac59e6
Submitter: Jenkins
Branch: master

commit 67984850228f6f26a72504b9f464a5fbcaac59e6
Author: Carl Baldwin <email address hidden>
Date: Fri Jul 29 10:10:48 2016 -0600

    Avoid IPAM driver reusing a session that has been rolled back

    With the in-tree pluggable IPAM driver, IPAM rollback tries to use the
    DB session after it has been rolled back due to an exception. This
    driver doesn't need roll back, so fix this by adding a method to the
    driver signalling that rollback shouldn't be attempted.

    Change-Id: Ic254789e58a8a51cd1aa943cb71de12410f4c0a7
    Closes-Bug: #1603162
    Related-Bug: #1516156

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/181023
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=625de54de3936b0da8760c3da76d2d315d05f94e
Submitter: Jenkins
Branch: master

commit 625de54de3936b0da8760c3da76d2d315d05f94e
Author: Pavel Bondar <email address hidden>
Date: Thu May 7 17:40:55 2015 +0300

    Switch to pluggable IPAM implementation

    This patch does unconditional switch from non-pluggable IPAM to
    pluggable IPAM for all deployments during upgrade to Neutron.

    Pluggable IPAM is enabled by pointing ipam_driver default to reference
    driver. User who manually set ipam_driver in neutron.conf will continue
    to use ipam_driver of their choice.

    During upgrade data is migrated from non-pluggable IPAM tables to
    pluggable IPAM tables using alembic_migration. Availability ranges
    (IPAvailabilityRange) is no longer used to calculate next available ip
    address, so migration for this table is not included.

    Migration is covered with functional tests. Dataset with subnets,
    allocation pools and ip allocations is loaded prior to migration.
    Once migration is completed ipam related tables are checked
    if data is migrated properly.

    Built-in IPAM implementation becomes obsolete and is planned to be
    removed in upcoming commits.

    UpgradeImpact
    Closes-Bug: #1516156
    Change-Id: I1d633810bd16f1bec7bbca57522e9ad3f7745ea2

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/362288

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/362288
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=53503e7586ed3bf839759f3ebe0481e2cbf1e409
Submitter: Jenkins
Branch: master

commit 53503e7586ed3bf839759f3ebe0481e2cbf1e409
Author: Carl Baldwin <email address hidden>
Date: Mon Aug 29 11:14:39 2016 -0600

    Remove non-pluggable IPAM implementation

    Change-Id: I870106cd5e0872314e4c2f21d17b379a64427afc
    Related-Bug: #1516156

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 9.0.0.0b3

This issue was fixed in the openstack/neutron 9.0.0.0b3 development milestone.
