bulk delete of ports cost iptables-firewall too much time

Bug #1513765 reported by shihanzhang
This bug affects 5 people
Affects: neutron
Status: Fix Released
Importance: Critical
Assigned to: Kevin Benton
Milestone: mitaka-rc1

Bug Description

This problem was found on the master branch, but I think it also affects Liberty.
Steps to reproduce:
1. create 100 VMs in the default security group
2. bulk delete these VMs
I found that the ipset entries could not be cleared promptly, because there was a lot of ip_conntrack state to clean up and the ovs-agent was busy doing that work.
What I can think of is letting the ovs-agent use an eventlet.GreenPool to delete the ip_conntrack entries, as in the sketch below. Does anyone have a better idea?
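
A minimal sketch of the GreenPool idea, assuming a hypothetical delete_conntrack_entry() helper that shells out to the conntrack CLI; this is illustrative only, not the actual agent code:

    import eventlet
    from eventlet.green import subprocess  # green-aware subprocess so calls don't block the hub

    def delete_conntrack_entry(remote_ip):
        # Hypothetical helper: drop conntrack state for one remote IP.
        # "conntrack -D -d <ip>" deletes entries with that destination;
        # it exits non-zero when nothing matched, which we ignore here.
        subprocess.call(['conntrack', '-D', '-d', remote_ip])

    def delete_conntrack_entries(remote_ips, pool_size=10):
        # Fan the per-IP deletions out over a green thread pool so one
        # long serial run does not block the agent's main loop.
        pool = eventlet.GreenPool(size=pool_size)
        for ip in remote_ips:
            pool.spawn_n(delete_conntrack_entry, ip)
        pool.waitall()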

Changed in neutron:
assignee: nobody → shihanzhang (shihanzhang)
Revision history for this message
Kyle Mestery (mestery) wrote :

I suspect this does affect Liberty as well, so if we fix it, let's plan for a backport.

Changed in neutron:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
shihanzhang (shihanzhang) wrote :

Kyle, thanks for confirming; it affects Liberty as well.

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

I'm afraid that using an async mechanism for cleaning conntrack will create the possibility of race conditions on conntrack zones.
How long does it take to clean up conntrack for 100 VMs?

Revision history for this message
Kevin Benton (kevinbenton) wrote :

What is the issue that this bug is reporting? Taking time to clean up after a deleted VM is only a problem if it blocks the wiring of new or updated ports. Is that what is happening? If so, I would rather just have us de-prioritize the cleanup so it doesn't impact anything important.

Revision history for this message
shihanzhang (shihanzhang) wrote :

@Eugene Nikanorov, it depends on the number of VMs running on a compute node. For example, if a default security group has 100 VMs and one compute node hosts 15 of them, then when we delete the other 85 VMs, that compute node needs at least 5 minutes to clean up the ip_conntrack entries.

Revision history for this message
shihanzhang (shihanzhang) wrote :

@kevin, while the ovs-agent is cleaning the ip_conntrack entries, it blocks the wiring of new or updated ports, because the ovs-agent runs as a single process.
I have tested with eventlet.GreenPool(size=10); in the case above, the time was reduced to 2 minutes, but I think some users still won't accept that, so I want to add a config option whose default is to not clean the ip_conntrack entries at all (see the sketch below). What do you think?
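
A rough sketch of that config-option idea using oslo.config; the option name, group, and default here are hypothetical, not an existing Neutron option:

    from oslo_config import cfg

    conntrack_opts = [
        cfg.BoolOpt('clean_conntrack_on_port_delete',
                    default=False,
                    help='If False, skip removing ip_conntrack state when '
                         'ports are deleted, trading stale conntrack entries '
                         'for a faster agent loop.'),
    ]
    cfg.CONF.register_opts(conntrack_opts, 'SECURITYGROUP')

    def maybe_clean_conntrack(remote_ips):
        # Only spend time in the slow conntrack cleanup when the operator
        # has explicitly opted in.
        if not cfg.CONF.SECURITYGROUP.clean_conntrack_on_port_delete:
            return
        delete_conntrack_entries(remote_ips)  # GreenPool sketch from the bug description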

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/243484

Changed in neutron:
status: Triaged → In Progress
Revision history for this message
Ryan Tidwell (ryan-tidwell) wrote : Re: bulk delete ports cost ovs-agent much time

Can confirm that conntrack cleanup seems to be a blocking operation. Can also attest to the poor performance. I watched my system spend well over 10 minutes spinning on conntrack cleanup for 50 instances and a security group that allows IP traffic from 0.0.0.0, which seems like a fairly basic scale. I can only imagine what happens with a more complex security group. I haven't spent much time in the code, but it did appear that there were attempts to clean up conntrack state that had already been cleaned up. I've lost the log file, but I can reset and attempt to reproduce.

Revision history for this message
shihanzhang (shihanzhang) wrote :

@Kevin, @Ryan Tidwell, do you have a good idea for solving this problem?

Revision history for this message
shihanzhang (shihanzhang) wrote :

@Ryan Tidwell, I have tested the case of a security group that allows IP traffic from 0.0.0.0, but did not hit the problems you describe. Can you provide some details?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/243994

Revision history for this message
Miguel Angel Ajo (mangelajo) wrote : Re: bulk delete ports cost ovs-agent much time

I'm raising the importance to High, as this seems to be really widespread (as per comment #8) and is probably rendering some clouds non-functional at scale.

tags: added: liberty-backport-potential
tags: added: sg-fw
Changed in neutron:
importance: Medium → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by shihanzhang (<email address hidden>) on branch: master
Review: https://review.openstack.org/243484
Reason: please review this patch https://review.openstack.org/#/c/243994/

Revision history for this message
Miguel Angel Ajo (mangelajo) wrote : Re: bulk delete ports cost ovs-agent much time

Adding the linuxbridge tag and pinging the relevant people, since I believe this also affects the Linux bridge implementation, which shares the iptables firewall driver.

tags: added: linuxbridge
Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

@shihanzhang, another question: did we already have conntrack manipulation in Kilo, or was it only introduced in Liberty?

tags: added: bridge ovs
summary: - bulk delete ports cost ovs-agent much time
+ bulk delete of ports cost iptables-firewall too much time
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Miguel, I believe we did, so it would affect Kilo too.
However, I wonder how such a cleanup could take so much time.

Revision history for this message
shihanzhang (shihanzhang) wrote :

This feature was implemented in Liberty, so it does not affect Kilo.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Let's see if there's any chance to nuke this. Assessing...

Changed in neutron:
milestone: none → mitaka-rc1
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

After looking closer, this seems like a blocker to me.

Changed in neutron:
importance: High → Critical
Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

I am not sure how this is a release blocker if it has been the case ever since the feature was introduced.

The OVS agent is blocked on the conntrack call because it is not using a thread pool. Optimizing security group updates will only get us so far; we could speed up some specific use cases, but it won't tackle the global issue of the agent being completely blocked by external process calls.

Changed in neutron:
milestone: mitaka-rc1 → newton-1
tags: added: mitaka-rc-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/293239

Changed in neutron:
assignee: shihanzhang (shihanzhang) → Kevin Benton (kevinbenton)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/293286

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/293239
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d7aeb8dd4b1d122e17eef8687192cd122b79fd6e
Submitter: Jenkins
Branch: master

commit d7aeb8dd4b1d122e17eef8687192cd122b79fd6e
Author: Kevin Benton <email address hidden>
Date: Mon Mar 14 13:19:54 2016 -0700

    De-dup conntrack deletions before running them

    During a lot of port deletions, the OVS agent will
    build up a lot of remote security group member updates
    for a single device. Once the call to delete all
    of the removed remote IP conntrack state gets issued,
    there will be many duplicated entries for the same
    device in the devices_with_updated_sg_members dictionary
    of lists.

    This results in many duplicated calls to remove conntrack
    entries that are just a waste of time. The longer it takes
    to remove conntrack entries, the more of these duplicates
    build up for other pending changes, to the point where there
    can be hundreds of duplicate calls for a single device.

    This just adjusts the conntrack manager clearing logic to
    make sure it de-duplicates all of its delete commands before
    it issues them.

    In a local test on a single host I have 11 threads create
    11 ports each, plug them into OVS, and then delete them.
    Here are the number of conntrack delete calls issued:

    Before this patch - ~232000
    With this patch - ~5200

    While the remaining number still seems high, the agent is now
    fast enough to keep up with all of the deletes.

    Closes-Bug: #1513765
    Change-Id: Icba88ab47ee17bf5d6ccdfc0f78bec911987ca90
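
For illustration, a minimal sketch of the de-duplication described in the commit message above: collapse the accumulated per-device updates into a set before issuing the conntrack delete calls. The data shapes and names here are assumptions, not the actual Neutron code:

    def deduplicated_delete_calls(devices_with_updated_sg_members):
        # Assumed shape: a dict mapping a security group id to a list of
        # device dicts, containing many duplicates after a burst of
        # port deletions.
        seen = set()
        calls = []
        for sg_id, devices in devices_with_updated_sg_members.items():
            for device in devices:
                key = (sg_id, device['device'])
                if key in seen:
                    continue  # drop the duplicate update for this device
                seen.add(key)
                calls.append((sg_id, device))
        return calls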

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 8.0.0.0rc1

This issue was fixed in the openstack/neutron 8.0.0.0rc1 release candidate.

Changed in neutron:
milestone: newton-1 → mitaka-rc1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/293286
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e4f590dc45424ddaaec0a23841156ea9d1a8bbb9
Submitter: Jenkins
Branch: stable/liberty

commit e4f590dc45424ddaaec0a23841156ea9d1a8bbb9
Author: Kevin Benton <email address hidden>
Date: Mon Mar 14 13:19:54 2016 -0700

    De-dup conntrack deletions before running them

    During a lot of port deletions, the OVS agent will
    build up a lot of remote security group member updates
    for a single device. Once the call to delete all
    of the removed remote IP conntrack state gets issued,
    there will be many duplicated entries for the same
    device in the devices_with_updated_sg_members dictionary
    of lists.

    This results in many duplicated calls to remove conntrack
    entries that are just a waste of time. The longer it takes
    to remove conntrack entries, the more of these duplicates
    build up for other pending changes, to the point where there
    can be hundreds of duplicate calls for a single device.

    This just adjusts the conntrack manager clearing logic to
    make sure it de-duplicates all of its delete commands before
    it issues them.

    In a local test on a single host I have 11 threads create
    11 ports each, plug them into OVS, and then delete them.
    Here are the number of conntrack delete calls issued:

    Before this patch - ~232000
    With this patch - ~5200

    While the remaining number still seems high, the agent is now
    fast enough to keep up with all of the deletes.

    Closes-Bug: #1513765
    Depends-On: I4041478ca09bd124827782774b8520908ef07be0
    Change-Id: Icba88ab47ee17bf5d6ccdfc0f78bec911987ca90
    (cherry picked from commit d7aeb8dd4b1d122e17eef8687192cd122b79fd6e)

tags: added: in-stable-liberty
Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 7.0.4

This issue was fixed in the openstack/neutron 7.0.4 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/243994
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

tags: added: linuxbridgeovs
removed: liberty-backport-potential linuxbridge mitaka-rc-potential ovs
tags: added: linuxbridge ovs
removed: linuxbridgeovs
tags: removed: bridge
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/388490

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by shihanzhang (<email address hidden>) on branch: master
Review: https://review.openstack.org/243994

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/388490
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
