Restarting Neutron containers which make use of network namespaces doesn't work

Bug #1748658 reported by Daniel Alvarez
This bug affects 1 person
Affects   Status         Importance   Assigned to    Milestone
neutron   Invalid        Undecided    Unassigned
tripleo   Fix Released   High         Brent Eagles

Bug Description

When the DHCP, L3, Metadata or OVN-Metadata containers are restarted, they can
no longer enter the network namespaces they created before the restart:

[heat-admin@overcloud-novacompute-0 neutron]$ sudo docker restart 8559f5a7fa45
8559f5a7fa45

[heat-admin@overcloud-novacompute-0 neutron]$ tail -f /var/log/containers/neutron/networking-ovn-metadata-agent.log
2018-02-09 08:34:41.059 5 CRITICAL neutron [-] Unhandled error: ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: RTNETLINK answers: Invalid argument
2018-02-09 08:34:41.059 5 ERROR neutron Traceback (most recent call last):
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/bin/networking-ovn-metadata-agent", line 10, in <module>
2018-02-09 08:34:41.059 5 ERROR neutron sys.exit(main())
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/networking_ovn/cmd/eventlet/agents/metadata.py", line 17, in main
2018-02-09 08:34:41.059 5 ERROR neutron metadata_agent.main()
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata_agent.py", line 38, in main
2018-02-09 08:34:41.059 5 ERROR neutron agt.start()
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata/agent.py", line 147, in start
2018-02-09 08:34:41.059 5 ERROR neutron self.sync()
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata/agent.py", line 56, in wrapped
2018-02-09 08:34:41.059 5 ERROR neutron return f(*args, **kwargs)
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata/agent.py", line 169, in sync
2018-02-09 08:34:41.059 5 ERROR neutron metadata_namespaces = self.ensure_all_networks_provisioned()
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata/agent.py", line 350, in ensure_all_networks_provisioned
2018-02-09 08:34:41.059 5 ERROR neutron netns = self.provision_datapath(datapath)
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata/agent.py", line 294, in provision_datapath
2018-02-09 08:34:41.059 5 ERROR neutron veth_name[0], veth_name[1], namespace)
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 182, in add_veth
2018-02-09 08:34:41.059 5 ERROR neutron self._as_root([], 'link', tuple(args))
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 94, in _as_root
2018-02-09 08:34:41.059 5 ERROR neutron namespace=namespace)
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 102, in _execute
2018-02-09 08:34:41.059 5 ERROR neutron log_fail_as_error=self.log_fail_as_error)
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 151, in execute
2018-02-09 08:34:41.059 5 ERROR neutron raise ProcessExecutionError(msg, returncode=returncode)
2018-02-09 08:34:41.059 5 ERROR neutron ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: RTNETLINK answers: Invalid argument
2018-02-09 08:34:41.059 5 ERROR neutron
2018-02-09 08:34:41.059 5 ERROR neutron
2018-02-09 08:34:41.177 21 INFO oslo_service.service [-] Parent process has died unexpectedly, exiting
2018-02-09 08:34:41.178 21 INFO eventlet.wsgi.server [-] (21) wsgi exited, is_accepting=True

An easy way to reproduce the bug:

[heat-admin@overcloud-novacompute-0 ~]$ sudo docker exec -u root -it 5c5f254a9321bd74b5911f46acb9513574c2cd9a3c59805a85cffd960bcc864d /bin/bash

[root@overcloud-novacompute-0 /]# ip netns a my_netns
[root@overcloud-novacompute-0 /]# exit

[heat-admin@overcloud-novacompute-0 ~]$ sudo ip netns
[heat-admin@overcloud-novacompute-0 ~]$ sudo docker restart 5c5f254a9321bd74b5911f46acb9513574c2cd9a3c59805a85cffd960bcc864d
5c5f254a9321bd74b5911f46acb9513574c2cd9a3c59805a85cffd960bcc864d

[heat-admin@overcloud-novacompute-0 ~]$ sudo docker exec -u root -it 5c5f254a9321bd74b5911f46acb9513574c2cd9a3c59805a85cffd960bcc864d /bin/bash
[root@overcloud-novacompute-0 /]# ip netns
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
my_netns

[root@overcloud-novacompute-0 /]# ip netns e my_netns ip a
RTNETLINK answers: Invalid argument
setting the network namespace "my_netns" failed: Invalid argument

One workaround is to delete everything under /run/netns/* from kolla_start, but
this would force a full sync of the agents, which is not desirable:

[root@overcloud-novacompute-0 /]# rm /run/netns/my_netns
rm: remove regular empty file '/run/netns/my_netns'? y
[root@overcloud-novacompute-0 /]# ip netns
[root@overcloud-novacompute-0 /]# ip netns a my_netns
[root@overcloud-novacompute-0 /]#
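One likely reading of the transcript above is that the /run/netns/my_netns file survived the restart while the namespace mount on top of it did not. In that case entering it fails with "Invalid argument". A minimal Python sketch for spotting such leftover entries follows. It assumes that /run/netns is the namespace directory and that a live entry created by `ip netns add` shows up as its own mount point in /proc/self/mountinfo. The helper names `parse_mount_points` and `find_stale_netns` are hypothetical and are not part of neutron or iproute2.

```python
import os


def parse_mount_points(mountinfo_text):
    """Collect the mount-point column (5th field) from mountinfo text.

    Mount points are kept exactly as written in mountinfo, where characters
    such as spaces are octal-escaped; namespace names are assumed not to
    contain such characters.
    """
    points = set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            points.add(fields[4])
    return points


def find_stale_netns(entries, mount_points, netns_dir="/run/netns"):
    """Return entry names under netns_dir that are not mount points.

    Assumption: a live namespace created by `ip netns add` is bind-mounted
    onto its file, so an entry with no mount on top is treated as stale.
    """
    return sorted(
        name for name in entries
        if os.path.join(netns_dir, name) not in mount_points
    )


if __name__ == "__main__":
    netns_dir = "/run/netns"
    # The mount table is per mount namespace: run this inside the container.
    if os.path.exists("/proc/self/mountinfo") and os.path.isdir(netns_dir):
        with open("/proc/self/mountinfo") as f:
            mounts = parse_mount_points(f.read())
        for name in find_stale_netns(os.listdir(netns_dir), mounts, netns_dir):
            print("stale namespace entry:", os.path.join(netns_dir, name))
```

Note that this only reports candidates. As discussed above, deleting them still means the agents have to resync.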

Bogdan Dobrelya (bogdando) wrote :

For containerized services deployed with tripleo, it's addressed in https://review.openstack.org/#/c/542858/

Changed in tripleo:
status: New → In Progress
milestone: none → queens-rc1
importance: Undecided → High
assignee: nobody → Brent Eagles (beagles)
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/542858
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=fb27465c6fb734d7106130765f918ed91bc329ad
Submitter: Zuul
Branch: master

commit fb27465c6fb734d7106130765f918ed91bc329ad
Author: Brent Eagles <email address hidden>
Date: Fri Feb 9 11:11:00 2018 -0330

    Mount netns as shared to persist namespaces

    Dataplane is breaking when containers are killed. Keeping the namespaces
    around allows the network objects (ports, bridges) etc. to remain
    intact without the containers running.

    Closes-Bug: #1748658
    Change-Id: I092500e9ec0820347ba0f865f3c24f828980af3a
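The commit title states that the fix mounts netns as shared, so that namespaces persist when containers are killed. The exact template change is not reproduced in this report.

As a hedged sketch, the following Python checks from inside a container whether /run/netns is a shared mount. It relies on the Linux /proc/<pid>/mountinfo format, where optional propagation fields such as `shared:N` sit between the mount options and the `-` separator. The helpers `propagation_flags` and `is_shared` are hypothetical illustrations.

```python
import os


def propagation_flags(mountinfo_text, mount_point):
    """Return the optional propagation fields for mount_point, e.g. ['shared:7'].

    Returns [] if the mount has no optional fields (private propagation) and
    None if mount_point is not listed. If the same path is mounted more than
    once, the last entry is used, assuming it is the topmost mount.
    """
    result = None
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[4] != mount_point:
            continue
        try:
            # Optional fields run from index 6 up to the "-" separator.
            sep = fields.index("-", 6)
        except ValueError:
            continue  # malformed line, skip it
        result = fields[6:sep]
    return result


def is_shared(mountinfo_text, mount_point):
    """True if mount_point belongs to a shared peer group."""
    flags = propagation_flags(mountinfo_text, mount_point) or []
    return any(flag.startswith("shared:") for flag in flags)


if __name__ == "__main__":
    if os.path.exists("/proc/self/mountinfo"):
        with open("/proc/self/mountinfo") as f:
            print("/run/netns shared:", is_shared(f.read(), "/run/netns"))
```

A mount with no optional fields (or only `master:N`) is reported as not shared. In that case namespace mounts made inside the container would not be expected to propagate out of it.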

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/548400

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 8.0.0.0rc1

This issue was fixed in the openstack/tripleo-heat-templates 8.0.0.0rc1 release candidate.

Ihar Hrachyshka (ihar-hrachyshka) wrote :

I believe the fix belongs in packaging / containers / tripleo, not in neutron itself.

Changed in neutron:
status: New → Invalid
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/pike)

Reviewed: https://review.openstack.org/548400
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=41316670fe7afae9031a678a4b0187d485a02fd1
Submitter: Zuul
Branch: stable/pike

commit 41316670fe7afae9031a678a4b0187d485a02fd1
Author: Brent Eagles <email address hidden>
Date: Fri Feb 9 11:11:00 2018 -0330

    Mount netns as shared to persist namespaces

    Dataplane is breaking when containers are killed. Keeping the namespaces
    around allows the network objects (ports, bridges) etc. to remain
    intact without the containers running.

    Closes-Bug: #1748658
    Change-Id: I092500e9ec0820347ba0f865f3c24f828980af3a
    (cherry picked from commit fb27465c6fb734d7106130765f918ed91bc329ad)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 7.0.12

This issue was fixed in the openstack/tripleo-heat-templates 7.0.12 release.
