l3_agent not disabling namespace use

Bug #1060559 reported by Koaps
This bug affects 1 person
Affects           Status        Importance  Assigned to  Milestone
neutron           Fix Released  High        Gary Kotton
Folsom            Fix Released  High        Gary Kotton
quantum (Ubuntu)  Fix Released  Undecided   Unassigned
Precise           Won't Fix     Undecided   Unassigned
Quantal           Fix Released  Undecided   Unassigned

Bug Description

Seems related to: https://bugs.launchpad.net/quantum/+bug/1042104

The iproute package on CentOS 6.3 doesn't support netns.

From the docs I have:

In quantum.conf

allow_overlapping_ips=False

In both dhcp_agent.ini and l3_agent.ini

use_namespaces=False
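
As a quick sanity check, the flag can be read back from the ini text with a few lines of Python. This is only a sketch: it uses the standard-library configparser (Python 3) rather than Quantum's own config loader, the inline SAMPLE_INI text stands in for the real l3_agent.ini, and both the [DEFAULT] section placement and the default of True are assumptions.

```python
import configparser

# Stand-in for the contents of l3_agent.ini; the section placement is assumed.
SAMPLE_INI = """
[DEFAULT]
use_namespaces = False
"""


def namespaces_enabled(ini_text, default=True):
    """Return the use_namespaces flag from ini text as a boolean."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # "DEFAULT" is configparser's built-in defaults section.
    return parser["DEFAULT"].getboolean("use_namespaces", fallback=default)


if __name__ == "__main__":
    print(namespaces_enabled(SAMPLE_INI))
```

If this prints False for the real file, the setting is at least being parsed as intended, and the problem is more likely in how the agent acts on it.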

The l3_agent seems to be still attempting to use netns.

2012-10-02 18:02:33 DEBUG [quantum.agent.linux.utils] Running command: sudo ip netns list
2012-10-02 18:02:33 DEBUG [quantum.agent.linux.utils]
Command: ['sudo', 'ip', 'netns', 'list']
Exit code: 255
Stdout: ''
Stderr: 'Object "netns" is unknown, try "ip help".\n'
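
The failure above can be reproduced outside the agent by probing whether the local ip binary accepts the netns object at all. The helper below is a hypothetical sketch, not part of Quantum: supports_netns is an invented name, it runs the command without sudo, and it treats any non-zero exit code (such as the 255 in the log) as "not supported".

```python
import subprocess


def supports_netns(run=subprocess.run):
    """Return True if `ip netns list` exits successfully on this host."""
    try:
        result = run(["ip", "netns", "list"],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError:
        # The `ip` binary itself is missing or not executable.
        return False
    # Older iproute builds reject the "netns" object with a non-zero exit code.
    return result.returncode == 0


if __name__ == "__main__":
    print(supports_netns())
```

The run parameter exists only so the check can be exercised with a fake runner on machines that have no ip binary.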

Browsing the code:

/usr/lib/python2.6/site-packages/quantum-2013.1-py2.6.egg/quantum/agent/l3_agent.py

I seem to load the L3NATAgent class, which in its __init__ runs self._destroy_all_router_namespaces(). That in turn calls root_ip.get_namespaces, which is where the sudo ip netns list command is being run.

Nowhere in that chain do I see anything checking whether use_namespaces is set to True.
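
One way to picture the missing check is a guard around the cleanup call in the constructor. The classes below are illustrative stand-ins, not the real L3NATAgent and not necessarily what the merged patch does; FakeConf, SketchAgent, the injected list_namespaces callable and the "qrouter-" prefix are all assumptions made for the sketch.

```python
class FakeConf:
    """Hypothetical stand-in for the agent's parsed configuration."""

    def __init__(self, use_namespaces):
        self.use_namespaces = use_namespaces


class SketchAgent:
    """Illustrative only; not the real L3NATAgent."""

    def __init__(self, conf, list_namespaces):
        self.conf = conf
        self._list_namespaces = list_namespaces
        self.destroyed = []
        # Only touch namespaces when the operator has left them enabled,
        # so hosts without netns support never run `ip netns list`.
        if self.conf.use_namespaces:
            self._destroy_all_router_namespaces()

    def _destroy_all_router_namespaces(self):
        for ns in self._list_namespaces():
            # Assumed naming convention for router namespaces.
            if ns.startswith("qrouter-"):
                self.destroyed.append(ns)
```

With use_namespaces=False the injected list_namespaces callable is never invoked, which is the behaviour the report expects from the agent on CentOS 6.3.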

dan wendlandt (danwent) wrote :

How are you invoking quantum-l3-agent? You're not running from packages, right?

Koaps (koaps) wrote :

I have a startup script that I use for all the OpenStack services; it's a basic Red Hat/CentOS init script for chkconfig.

The start command is:

    daemon --user quantum --pidfile $pidfile "$exec --config-dir $config_dir --config-file $config_file --log-file $logfile &>/dev/null & echo \$! > $pidfile"

Where:

suffix=dhcp
prog=openstack-quantum-${suffix}-agent
exec="/usr/bin/quantum-${suffix}-agent"
config_dir="/etc/openstack/quantum"
config_file="/etc/openstack/quantum/${suffix}_agent.ini"
pidfile="/var/run/openstack/quantum-${suffix}-agent.pid"
logfile="/var/log/openstack/quantum-${suffix}-agent.log"

I end up with a python command running like:

/usr/bin/python /usr/bin/quantum-dhcp-agent --config-dir /etc/openstack/quantum --config-file /etc/openstack/quantum/dhcp_agent.ini --log-file /var/log/openstack/quantum-dhcp-agent.log

The only odd thing I see is it runs dnsmasq twice, not sure why:

dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap77a55569-aa --except-interface=lo --domain=openstacklocal --pid-file=/var/lib/openstack/quantum/data/dhcp/8a56b7a4-06ae-404f-acee-9715b8823f7f/pid --dhcp-hostsfile=/var/lib/openstack/quantum/data/dhcp/8a56b7a4-06ae-404f-acee-9715b8823f7f/host --dhcp-optsfile=/var/lib/openstack/quantum/data/dhcp/8a56b7a4-06ae-404f-acee-9715b8823f7f/opts --dhcp-script=/usr/bin/quantum-dhcp-agent-dnsmasq-lease-update --leasefile-ro --dhcp-range=set:tag0,10.0.0.0,static,120s

The only difference I see between the two processes is one is run as root and the other one is run as nobody.

Gary Kotton (garyk) wrote :

Hi,
Can you please add the trace so that we can understand where the call took place?
Thanks
Gary

Koaps (koaps) wrote :

Hi Gary,

Here you go:

sudo -u quantum sudo /usr/bin/quantum-l3-agent --config-dir /etc/openstack/quantum --config-file /etc/openstack/quantum/l3_agent.ini --log-file /var/log/openstack/quantum-l3-agent.log -v -d

Traceback (most recent call last):
  File "/usr/bin/quantum-l3-agent", line 9, in <module>
    load_entry_point('quantum==2013.1', 'console_scripts', 'quantum-l3-agent')()
  File "/usr/lib/python2.6/site-packages/quantum-2013.1-py2.6.egg/quantum/agent/l3_agent.py", line 530, in main
    mgr = L3NATAgent(conf)
  File "/usr/lib/python2.6/site-packages/quantum-2013.1-py2.6.egg/quantum/agent/l3_agent.py", line 129, in __init__
    self._destroy_all_router_namespaces()
  File "/usr/lib/python2.6/site-packages/quantum-2013.1-py2.6.egg/quantum/agent/l3_agent.py", line 136, in _destroy_all_router_namespaces
    for ns in root_ip.get_namespaces(self.conf.root_helper):
  File "/usr/lib/python2.6/site-packages/quantum-2013.1-py2.6.egg/quantum/agent/linux/ip_lib.py", line 124, in get_namespaces
    output = cls._execute('', 'netns', ('list',), root_helper=root_helper)
  File "/usr/lib/python2.6/site-packages/quantum-2013.1-py2.6.egg/quantum/agent/linux/ip_lib.py", line 56, in _execute
    root_helper=root_helper)
  File "/usr/lib/python2.6/site-packages/quantum-2013.1-py2.6.egg/quantum/agent/linux/utils.py", line 60, in execute
    raise RuntimeError(m)
RuntimeError:
Command: ['sudo', 'ip', 'netns', 'list']
Exit code: 255
Stdout: ''
Stderr: 'Object "netns" is unknown, try "ip help".\n'

Gary Kotton (garyk) wrote :

Thanks. I'll take care of a fix soon.

Changed in quantum:
status: New → Confirmed
assignee: nobody → Gary Kotton (garyk)
milestone: none → grizzly-1
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to quantum (master)

Fix proposed to branch: master
Review: https://review.openstack.org/14079

Changed in quantum:
status: Confirmed → In Progress
Gary Kotton (garyk)
tags: added: folsom-backport-potential
Koaps (koaps) wrote :

Hi Gary,

I was able to get l3_agent running by commenting out the self._destroy_all_router_namespaces(), of course your way is the proper way :)

I'm still having an issue trying to get NAT working.

I tried to follow the test case and workflow:

https://fedoraproject.org/wiki/QA:Testcase_Quantum_V2
http://docs.openstack.org/trunk/openstack-network/admin/content/l3_workflow.html

But I can't get NAT to work and the VM can't leave its private network, though it can ping the public interface of the controller/gateway node, so I'm pretty sure the GRE tunnel is working right.

I can file a new bug on this if that is a better way, but any help would be great because I'm stuck and this is the last mile for my stack.

Thanks

OpenStack Infra (hudson-openstack) wrote : Fix merged to quantum (master)

Reviewed: https://review.openstack.org/14079
Committed: http://github.com/openstack/quantum/commit/8eb7ca51c0bfda334eda8a25d599aa1d9cd21c22
Submitter: Jenkins
Branch: master

commit 8eb7ca51c0bfda334eda8a25d599aa1d9cd21c22
Author: Gary Kotton <email address hidden>
Date: Fri Oct 5 06:07:13 2012 -0400

    Treat invalid namespace call

    Fixes bug 1060559

    Change-Id: I29250100416b87f55781fb7e97339f6d3761513f

Changed in quantum:
status: In Progress → Fix Committed
Gary Kotton (garyk) wrote : Re: [Bug 1060559] Re: l3_agent not disabling namespace use

Hi,
Can you please provide some additional information about your
setup? From the bug I assume that you are working with namespaces
disabled, and I recall that you are using Open vSwitch. When I wrote the
above test cases, it was done with namespaces enabled.
I have a few questions:
1. Can you please print out the ifconfig?
2. Can you please send ovs-vsctl show
3. When you assign a floating IP are you able to ping the floating IP
(after it has been assigned to a VM)?
Thanks
Gary

Koaps (koaps) wrote :

Hi Gary,

Here's the network info:

br-ex Link encap:Ethernet
          inet addr:10.2.1.201 Bcast:10.2.1.207 Mask:255.255.255.248
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

br-int Link encap:Ethernet
          inet addr:10.0.0.1 Bcast:10.0.0.255 Mask:255.255.255.0
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

br-omg Link encap:Ethernet
          inet addr:10.0.1.1 Bcast:10.0.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

eth0 Link encap:Ethernet
          inet addr:10.2.1.175 Bcast:10.2.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

eth1 Link encap:Ethernet
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

eth2 Link encap:Ethernet
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

eth3 Link encap:Ethernet
          UP BROADCAST PROMISC MULTICAST MTU:1500 Metric:1

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          UP LOOPBACK RUNNING MTU:16436 Metric:1

    Bridge br-tun
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-omg
        Port br-omg
            Interface br-omg
                type: internal
        Port "eth2"
            Interface "eth2"
    Bridge br-int
        Port "eth1"
            Interface "eth1"
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "1.7.1"

No, I can't ping the floating IP assigned to the VM.
The VM can ping the public and private IPs assigned to the controller node.

I don't really see anything in iptables doing forwarding.

iptables -L -n -v

Chain INPUT (policy ACCEPT 10M packets, 2399M bytes)
 pkts bytes target prot opt in out source destination
4183K 977M nova-api-INPUT all -- * * 0.0.0.0/0 0.0.0.0/0

Chain FORWARD (policy ACCEPT 13 packets, 1092 bytes)
 pkts bytes target prot opt in out source destination
   11 924 nova-filter-top all -- * * 0.0.0.0/0 0.0.0.0/0
   11 924 nova-api-FORWARD all -- * * 0.0.0.0/0 0.0.0.0/0

Chain OUTPUT (policy ACCEPT 10M packets, 2420M bytes)
 pkts bytes target prot opt in out source destination
8482K 2035M nova-filter-top all -- * * 0.0.0.0/0 0.0.0.0/0
4094K 990M nova-api-OUTPUT all -- * * 0.0.0.0/0 0.0.0.0/0

Chain nova-api-FORWARD (1 references)
 pkts bytes target prot opt in out source destination

Chain nova-api-INPUT (1 references)
 pkts bytes target prot op...


Koaps (koaps) wrote :

I just noticed that br-ex was missing its port.

It should be:

    Bridge br-ex
        Port "eth3"
            Interface "eth3"
        Port br-ex
            Interface br-ex
                type: internal

Still didn't change anything, but I did fix that.

Gary Kotton (garyk) wrote :

Hi,
I am looking into this at the moment. Please note that when namespaces
are disabled then you need to pass the router_id to the layer 3 agent.
Can you please ensure that this is correctly configured in the
l3_agent.ini file.
Thanks
Gary
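
The dependency described here (no namespaces means the agent needs an explicit router_id) can be expressed as a small pre-flight check. This is a hedged sketch rather than anything shipped with Quantum: check_l3_settings is a hypothetical helper, and the values would come from whatever parsed l3_agent.ini.

```python
def check_l3_settings(use_namespaces, router_id=None):
    """Return a list of human-readable problems with the two settings."""
    problems = []
    # Without namespaces, the agent has to be told which router it serves.
    if not use_namespaces and not router_id:
        problems.append("use_namespaces is False but router_id is not set")
    return problems


if __name__ == "__main__":
    for problem in check_l3_settings(use_namespaces=False):
        print("l3_agent.ini:", problem)
```

An empty list means the combination is at least self-consistent; it says nothing about whether the router_id refers to an existing router.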

Koaps (koaps) wrote :

Hi Gary,

That was probably a key thing. I'm now in a slightly different situation, but I think I need to sort out the configs to get it right.

Once I added the router id to the ini and restarted the l3_agent, it created the qg- and qr- interfaces and I can now ping the floating IP from the controller node.

Unfortunately it also changed my routing table, adding a new gateway (10.2.1.201) which knocked my controller off the public network.

Luckily I can still access it via the internal bridges from the compute node, and I also have IPMI as a worst case.

qg-acda11d9-dd Link encap:Ethernet
          inet addr:10.2.1.202 Bcast:10.2.1.207 Mask:255.255.255.248
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

qr-792fef06-66 Link encap:Ethernet
          inet addr:10.0.0.1 Bcast:10.0.0.255 Mask:255.255.255.0
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

I have IPs on br-ex and br-int. Should I remove those?

br-int Link encap:Ethernet
          inet addr:10.0.0.1 Bcast:10.0.0.255 Mask:255.255.255.0

br-ex Link encap:Ethernet
          inet addr:10.2.1.201 Bcast:10.250.1.207 Mask:255.255.255.248

I also have a public interface, eth0, which is how I normally connect to the server remotely.

eth0 Link encap:Ethernet
          inet addr:10.2.1.175 Bcast:10.250.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Ideally I would like:
eth0 to be the default gateway interface for the system.
br-ex with eth3 port used just for 10.2.1.200/29 (instance VM NAT traffic)
br-int with eth1 port used for 10.0.0.0/24 (instance VM traffic)
br-omg with eth2 port used for 10.0.1.0/24 (openstack management traffic)

I can delete the added gateway and get traffic flowing through the server again:

route del -net 0.0.0.0 gw 10.2.1.201

I can ping the floating ip and gateway ( 10.2.1.201 ) from the VM, but still not able to get past that, can't ping the next hop gateway ( 10.2.1.2 ).

The system default gateway is 10.2.1.2
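
The addresses quoted in this comment can be checked with Python's ipaddress module. The sketch below only does arithmetic on values already in the thread (10.2.1.200/29, eth0's 10.2.1.175 with a /24 mask, and the 10.2.1.2 default gateway); whether the overlap it shows is the cause of the forwarding problem is not established here.

```python
import ipaddress

floating = ipaddress.ip_network("10.2.1.200/29")  # br-ex / qg- subnet
public = ipaddress.ip_network("10.2.1.0/24")      # eth0 subnet (10.2.1.175/24)
next_hop = ipaddress.ip_address("10.2.1.2")       # system default gateway


def describe(net):
    """Return (first host, last host, broadcast) of a network as strings."""
    hosts = list(net.hosts())
    return str(hosts[0]), str(hosts[-1]), str(net.broadcast_address)


if __name__ == "__main__":
    print("floating range:", describe(floating))
    print("overlaps eth0 subnet:", floating.overlaps(public))
    print("next hop inside /29:", next_hop in floating)
    print("next hop inside /24:", next_hop in public)
```

The /29 lies entirely inside eth0's /24, while the next-hop gateway is only reachable through the /24, so the host ends up with two interfaces that both claim to be on the same wider network.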

OpenStack Infra (hudson-openstack) wrote : Fix proposed to quantum (stable/folsom)

Fix proposed to branch: stable/folsom
Review: https://review.openstack.org/14743

OpenStack Infra (hudson-openstack) wrote : Fix merged to quantum (stable/folsom)

Reviewed: https://review.openstack.org/14743
Committed: http://github.com/openstack/quantum/commit/b4f9b1f5f629e4be21ca51c17cfd72cab4aefe39
Submitter: Jenkins
Branch: stable/folsom

commit b4f9b1f5f629e4be21ca51c17cfd72cab4aefe39
Author: Gary Kotton <email address hidden>
Date: Fri Oct 5 06:07:13 2012 -0400

    Treat invalid namespace call

    Fixes bug 1060559

    Change-Id: I29250100416b87f55781fb7e97339f6d3761513f

Gary Kotton (garyk)
tags: removed: folsom-backport-potential
Chuck Short (zulcss)
Changed in quantum (Ubuntu):
status: New → Fix Released
Changed in quantum (Ubuntu Precise):
status: New → Confirmed
Thierry Carrez (ttx)
Changed in quantum:
status: Fix Committed → Fix Released
Changed in quantum (Ubuntu Quantal):
status: New → Confirmed
Clint Byrum (clint-fewbar) wrote : Please test proposed package

Hello Koaps, or anyone else affected,

Accepted quantum into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/quantum/2012.2.1-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in quantum (Ubuntu Quantal):
status: Confirmed → Fix Committed
tags: added: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package quantum - 2012.2.1-0ubuntu1

---------------
quantum (2012.2.1-0ubuntu1) quantal-proposed; urgency=low

  * Resynchronize with stable/folsom (1e774867) (LP: #1085255):
    - [aeabb42] There are routing problems when the dnsmasq port does not come
      first in the routing table (LP: #1083238)
    - [04aab72] Quantum linux bridge not optimized with libvirt (LP: #1078210)
    - [ca7fc10] getting quotas from database has severe performance implications
      (LP: #1075369)
    - [66605e8] failed to update an external network into non external network
      (LP: #1083387)
    - [c60051a] Quantum test suite leaks memory like a sieve (LP: #1065276)
    - [3179dfc] clear_db() does incomplete db teardown (LP: #1080988)
    - [c1e19d7] Unauthorized command: cat /proc/None/cmdline (LP: #1077651)
    - [af9e076] At times a instance will not receive an IP address from the DHCP
      agent (LP: #1081664)
    - [e0d1a7d] allow multiple floating-ip on single port if they use different
      fixed ips and/or external nets (LP: #1057844)
    - [8471d79] Delete port fails to gateway ip (LP: #1079980)
    - [aca8b4a] fixed_ip allocation which is not included within
      allocation_pools makes error when delete port or re-create port
      (LP: #1077292)
    - [eacc9d3] Mapping same bridge to different phyiscal networks succeed
      (LP: #1067669)
    - [51b4c82] python-quantum: not region aware (LP: #1080793)
    - [6f0a486] delete floatingip should be in one transaction to delete port
      (LP: #1080516)
    - [db6cda7] Remove qpid configuration variables no longer supported
    - [a112840] Allow NVP plugin to use per-tenant quota extension
    - [82b1a55] Quantum service does not restart after reboot (LP: #1073999)
    - [c01a839] There are some cases that L3 API with an invalid parameter
      returns 500. (LP: #1064765)
    - [26b383f] external network can be plugged also as internal network for one
      router (LP: #1053633)
    - [49f649c] There is a lot of cases that API with an invalid parameter
      returns 500. (LP: #1062046)
    - [4546a18] When create subnet, you con set up the value as cidr (the value
      isn't cidr form). (LP: #1067959)
    - [9ba453a] killfilter should handle updated/deleted executables
      (LP: #1073768)
    - [7c8a55c] a port which is not able to delete is made when floatingip
      create fails. (LP: #1064748)
    - [c9b84cf] Linux bridge port update causes exception (LP: #1072713)
    - [cb57932] I can't add interface to router, if there is another port in
      non-shared network of other tenant (LP: #1057558)
    - [574e278] Ryu plugin does not support Security Groups (LP: #1059393)
    - [607f486] tap device added to integration bridge without tag
      (LP: #1064070)
    - [21a0fdf] L3 agent external network flag (LP: #1056720)
    - [5cbaff4] router create with external_gateway_info fails with 500 always.
      (LP: #1064235)
    - [63b81f6] l3 db operations failed in multiple transactions (LP: #1070335)
    - [bff17fb] Ensure that the SqlSoup import is still supported.
    - [e091a29] l3_nat_agent was renamed to l3_agent
    - [9030969] remove default value of 'local_ip' of 10...


Changed in quantum (Ubuntu Quantal):
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in quantum:
milestone: grizzly-1 → 2013.1
Steve Langasek (vorlon) wrote :

The Precise Pangolin has reached end of life, so this bug will not be fixed for that release.

Changed in quantum (Ubuntu Precise):
status: Confirmed → Won't Fix