Upgrade from Rocky to Stein, router namespace disappears

Bug #1863982 reported by Kevin Zhao
This bug affects 2 people
Affects        Status      Importance  Assigned to  Milestone
kolla-ansible  Incomplete  Undecided   Unassigned
neutron        New         Undecided   Unassigned

Bug Description

Upgraded an all-in-one deployment from Rocky to Stein.
The upgrade finished, but the router namespace disappeared.

============================================================
Before:
ip netns list
qrouter-79658dd5-e3b4-4b13-a361-16d696ed1d1c (id: 1)
qdhcp-4a183162-64f5-49f9-a615-7c0fd63cf2a8 (id: 0)

After:
ip netns list
============================================================
After about 1 minute, the dhcp namespace reappeared with no errors in the dhcp-agent log,
but the qrouter namespace was still missing until I manually restarted the l3-agent docker container.
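
The before/after comparison above can be automated. Below is a minimal sketch in Python, assuming the `ip netns list` output format shown in this report (one namespace per line, optionally followed by `(id: N)`). The function names `parse_netns` and `missing_namespaces` are illustrative only and are not part of neutron or kolla-ansible.

```python
def parse_netns(output):
    """Return the set of namespace names found in `ip netns list` output."""
    names = set()
    for line in output.splitlines():
        fields = line.split()
        if fields:
            # The first field is the namespace name; "(id: N)" may follow.
            names.add(fields[0])
    return names


def missing_namespaces(before, after):
    """Return names present in `before` but absent from `after`, sorted."""
    return sorted(parse_netns(before) - parse_netns(after))


if __name__ == "__main__":
    before = (
        "qrouter-79658dd5-e3b4-4b13-a361-16d696ed1d1c (id: 1)\n"
        "qdhcp-4a183162-64f5-49f9-a615-7c0fd63cf2a8 (id: 0)\n"
    )
    after = ""  # nothing was listed right after the upgrade
    for name in missing_namespaces(before, after):
        print("missing:", name)
```

Capturing the two snapshots (for example by saving the command output to files before and after the upgrade) is left out of the sketch on purpose, since it depends on how the upgrade is driven.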

l3-agent error after upgrade:
2020-02-20 02:57:07.306 12 INFO neutron.common.config [-] Logging enabled!
2020-02-20 02:57:07.308 12 INFO neutron.common.config [-] /var/lib/kolla/venv/bin/neutron-l3-agent version 14.0.4
2020-02-20 02:57:08.616 12 INFO neutron.agent.l3.agent [req-95654890-dab3-4106-b56d-c2685fb96f29 - - - - -] Agent HA routers count 0
2020-02-20 02:57:08.619 12 INFO neutron.agent.agent_extensions_manager [req-95654890-dab3-4106-b56d-c2685fb96f29 - - - - -] Loaded agent extensions: []
2020-02-20 02:57:08.657 12 INFO eventlet.wsgi.server [-] (12) wsgi starting up on http:/var/lib/neutron/keepalived-state-change
2020-02-20 02:57:08.710 12 INFO neutron.agent.l3.agent [-] L3 agent started
2020-02-20 02:57:10.716 12 INFO oslo.privsep.daemon [req-681aad3f-ae14-4315-b96d-5e95225cdf92 - - - - -] Running privsep helper: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'privsep-helper', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpg8Ihqa/privsep.sock']
2020-02-20 02:57:11.750 12 INFO oslo.privsep.daemon [req-681aad3f-ae14-4315-b96d-5e95225cdf92 - - - - -] Spawned new privsep daemon via rootwrap
2020-02-20 02:57:11.614 29 INFO oslo.privsep.daemon [-] privsep daemon starting
2020-02-20 02:57:11.622 29 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
2020-02-20 02:57:11.627 29 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/none
2020-02-20 02:57:11.628 29 INFO oslo.privsep.daemon [-] privsep daemon running as pid 29
2020-02-20 02:57:14.449 12 INFO neutron.agent.l3.agent [-] Starting router update for 79658dd5-e3b4-4b13-a361-16d696ed1d1c, action 3, priority 2, update_id 49908db7-8a8c-410f-84a7-9e95a3dede16. Wait time elapsed: 0.000
2020-02-20 02:57:24.160 12 ERROR neutron.agent.linux.utils [-] Exit code: 4; Stdin: # Generated by iptables_manager

2020-02-20 02:57:26.388 12 ERROR neutron.agent.l3.router_info self.process_floating_ip_address_scope_rules()
2020-02-20 02:57:26.388 12 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2020-02-20 02:57:26.388 12 ERROR neutron.agent.l3.router_info self.gen.next()
2020-02-20 02:57:26.388 12 ERROR neutron.agent.l3.router_info File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/linux/iptables_manager.py", line 438, in defer_apply
2020-02-20 02:57:26.388 12 ERROR neutron.agent.l3.router_info raise l3_exc.IpTablesApplyException(msg)
2020-02-20 02:57:26.388 12 ERROR neutron.agent.l3.router_info IpTablesApplyException: Failure applying iptables rules
2020-02-20 02:57:26.388 12 ERROR neutron.agent.l3.router_info
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent [-] Failed to process compatible router: 79658dd5-e3b4-4b13-a361-16d696ed1d1c: IpTablesApplyException: Failure applying iptables rules
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 723, in _process_routers_if_compatible
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent self._process_router_if_compatible(router)
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 567, in _process_router_if_compatible
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent self._process_added_router(router)
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 575, in _process_added_router
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent ri.process()
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/common/utils.py", line 161, in call
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent self.logger(e)
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent self.force_reraise()
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent six.reraise(self.type_, self.value, self.tb)
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/common/utils.py", line 158, in call
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent return func(*args, **kwargs)
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 1189, in process
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent self.process_address_scope()
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 1152, in process_address_scope
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent self.process_floating_ip_address_scope_rules()
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent self.gen.next()
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/linux/iptables_manager.py", line 438, in defer_apply
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent raise l3_exc.IpTablesApplyException(msg)
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent IpTablesApplyException: Failure applying iptables rules
2020-02-20 02:57:26.389 12 ERROR neutron.agent.l3.agent
2020-02-20 02:57:26.391 12 INFO neutron.agent.l3.agent [-] Starting router update for 79658dd5-e3b4-4b13-a361-16d696ed1d1c, action 3, priority 2, update_id 49908db7-8a8c-410f-84a7-9e95a3dede16. Wait time elapsed: 11.942
2020-02-20 02:57:27.878 12 INFO neutron.agent.linux.interface [-] Device qg-a456d6e2-1d already exists
2020-02-20 02:57:32.276 12 ERROR neutron.agent.linux.utils [-] Exit code: 4; Stdin: # Generated by iptables_manager

Kevin Zhao (kevin-zhao)
Changed in kolla-ansible:
status: New → Confirmed
Kevin Zhao (kevin-zhao) wrote :

New info:
I ran the upgrade twice and confirmed the issue.
Watching the netns list change shows when the qrouter-* namespace went missing:
qrouter-d7faa15d-6f88-44de-8fef-7de54e7fee88 (id: 1)
qdhcp-5de5301b-a3f4-482d-8a1b-23badaff80c4 (id: 0)
Thu Feb 20 07:37:21 GMT 2020
2
qrouter-d7faa15d-6f88-44de-8fef-7de54e7fee88 (id: 1)
qdhcp-5de5301b-a3f4-482d-8a1b-23badaff80c4 (id: 0)
Thu Feb 20 07:37:22 GMT 2020
1

07:37:21

By that time the l3-agent had already tried to apply the iptables rules many times.
2020-02-20 07:37:21.735 12 ERROR neutron.agent.linux.iptables_manager line 2: CHAIN_ADD failed (Device or resource busy): chain OUTPUT
2020-02-20 07:37:21.735 12 ERROR neutron.agent.linux.iptables_manager line 12: RULE_INSERT failed (No such file or directory): rule in chain OUTPUT
2020-02-20 07:37:21.735 12 ERROR neutron.agent.linux.iptables_manager line 13: RULE_INSERT failed (No such file or directory): rule in chain POSTROUTING
2020-02-20 07:37:21.735 12 ERROR neutron.agent.linux.iptables_manager line 14: RULE_APPEND failed (No such file or directory): rule in chain POSTROUTING
2020-02-20 07:37:21.735 12 ERROR neutron.agent.linux.iptables_manager line 15: RULE_INSERT failed (No such file or directory): rule in chain PREROUTING
2020-02-20 07:37:21.735 12 ERROR neutron.agent.linux.iptables_manager
2020-02-20 07:37:21.735 12 ERROR neutron.agent.linux.iptables_manager
2020-02-20 07:37:21.736 12 ERROR neutron.agent.l3.router_info [-] Failure applying iptables rules: IpTablesApplyException: Failure applying iptables rules
2020-02-20 07:37:21.736 12 ERROR neutron.agent.l3.router_info Traceback (most recent call last):
2020-02-20 07:37:21.736 12 ERROR neutron.agent.l3.router_info File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/common/utils.py", line 158, in call
2020-02-20 07:37:21.736 12 ERROR neutron.agent.l3.router_info return func(*args, **kwargs)
2020-02-20 07:37:21.736 12 ERROR neutron.agent.l3.router_info File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 1189, in process
2020-02-20 07:37:21.736 12 ERROR neutron.agent.l3.router_info self.process_address_scope()
2020-02-20 07:37:21.736 12 ERROR neutron.agent.l3.router_info File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 1152, in process_address_scope
2020-02-20 07:37:21.736 12 ERROR neutron.agent.l3.router_info self.process_floating_ip_address_scope_rules()
2020-02-20 07:37:21.736 12 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2020-02-20 07:37:21.736 12 ERROR neutron.agent.l3.router_info self.gen.next()
2020-02-20 07:37:21.736 12 ERROR neutron.agent.l3.router_info File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/agent/linux/iptables_manager.py", line 438, in defer_apply
2020-02-20 07:37:21.736 12 ERROR neutron.agent.l3.router_info raise l3_exc.IpTablesApplyException(msg)
2020-02-20 07:37:21.736 12 ERROR neutron.agent.l3.router_info IpTablesApplyException: Failure applying iptables rules
2020-02-20 07:37:21.738 12 WARNING neutron.agent.l3.agent [-] Hit retry limit with router u...
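
The `iptables_manager` error lines in the log above can be summarized by failure reason, which makes it easier to see whether the problem is one repeated error or several different ones. This is a rough sketch in Python that assumes the message shape seen in this report (`line N: OPERATION failed (reason): [rule in] chain NAME`). The names `FAILURE_RE`, `parse_failures` and `count_by_reason` are hypothetical helpers, not neutron APIs.

```python
import re
from collections import Counter

# Matches e.g.
# "line 12: RULE_INSERT failed (No such file or directory): rule in chain OUTPUT"
FAILURE_RE = re.compile(
    r"line (\d+): (\w+) failed \(([^)]*)\): (?:rule in )?chain (\S+)"
)


def parse_failures(log_text):
    """Return (line_no, operation, reason, chain) tuples found in the log."""
    return [
        (int(m.group(1)), m.group(2), m.group(3), m.group(4))
        for m in FAILURE_RE.finditer(log_text)
    ]


def count_by_reason(failures):
    """Count failures per reason string."""
    return Counter(reason for _, _, reason, _ in failures)


if __name__ == "__main__":
    sample = (
        "ERROR neutron.agent.linux.iptables_manager line 2: CHAIN_ADD failed "
        "(Device or resource busy): chain OUTPUT\n"
        "ERROR neutron.agent.linux.iptables_manager line 12: RULE_INSERT failed "
        "(No such file or directory): rule in chain OUTPUT\n"
    )
    for reason, count in count_by_reason(parse_failures(sample)).items():
        print(count, reason)
```

Traceback lines such as `line 438, in defer_apply` are not matched, because the pattern requires a colon directly after the line number.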


Kevin Zhao (kevin-zhao) wrote :
Bernard Cafarelli (bcafarel) wrote :

From the logs, this appears to be outside of neutron's direct control. Failures like
"RULE_INSERT failed (No such file or directory): rule in chain OUTPUT" indicate a system-level problem when we call iptables.
Could it be a change of iptables backend, or a docker update? System logs (and kolla-ansible feedback) could help.

Radosław Piliszek (yoctozepto) wrote :

We (kolla-ansible) should extend our CI to test upgrades with active neutron resources to triage this issue.

It could be a lack of cleanup, as Kevin linked.

Changed in kolla-ansible:
assignee: nobody → Radosław Piliszek (yoctozepto)
Kevin Zhao (kevin-zhao) wrote :

Restart sequence:

TASK [neutron : Running Neutron database expand container] *********************************************************************************************************************************************************************************
changed: [localhost -> localhost]

TASK [neutron : include_tasks] *************************************************************************************************************************************************************************************************************
skipping: [localhost]

RUNNING HANDLER [neutron : Restart neutron-server container] *******************************************************************************************************************************************************************************
changed: [localhost]

RUNNING HANDLER [neutron : Restart neutron-openvswitch-agent container] ********************************************************************************************************************************************************************
changed: [localhost]

RUNNING HANDLER [neutron : Restart neutron-dhcp-agent container] ***************************************************************************************************************************************************************************
changed: [localhost]

RUNNING HANDLER [neutron : Restart neutron-l3-agent container] *****************************************************************************************************************************************************************************
changed: [localhost]

RUNNING HANDLER [neutron : Restart neutron-metadata-agent container] ***********************************************************************************************************************************************************************
changed: [localhost]

TASK [neutron : Checking neutron pending contract scripts] *********************************************************************************************************************************************************************************
changed: [localhost] => (item=neutron)
changed: [localhost] => (item=neutron-fwaas)
changed: [localhost] => (item=neutron-vpnaas)

TASK [neutron : Stopping all neutron-server for contract db] *******************************************************************************************************************************************************************************
skipping: [localhost]

TASK [neutron : Running Neutron database contract container] *******************************************************************************************************************************************************************************
changed: [localhost -> localhost]
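
To compare restart order between runs, the handler sequence can be pulled out of the Ansible output above. This is a small sketch in Python, assuming the `RUNNING HANDLER [role : name]` line format shown here. `HANDLER_RE` and `handler_order` are illustrative names, not part of kolla-ansible.

```python
import re

# Matches e.g. "RUNNING HANDLER [neutron : Restart neutron-l3-agent container] ****"
HANDLER_RE = re.compile(r"^RUNNING HANDLER \[(?P<role>[^:\]]+) : (?P<name>[^\]]+)\]")


def handler_order(ansible_output):
    """Return handler names in the order they appear in the output."""
    order = []
    for line in ansible_output.splitlines():
        match = HANDLER_RE.match(line.strip())
        if match:
            order.append(match.group("name").strip())
    return order


if __name__ == "__main__":
    sample = (
        "RUNNING HANDLER [neutron : Restart neutron-openvswitch-agent container] ***\n"
        "changed: [localhost]\n"
        "RUNNING HANDLER [neutron : Restart neutron-l3-agent container] ***\n"
        "changed: [localhost]\n"
    )
    for position, name in enumerate(handler_order(sample), start=1):
        print(position, name)
```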

Kevin Zhao (kevin-zhao) wrote :

2020-02-20 07:36:41.632 12 ERROR neutron.agent.linux.iptables_manager [-] IPTablesManager.apply failed to apply the following set of iptables rules:
      1. # Generated by iptables_manager
      2. *filter
      3. :FORWARD - [0:0]
      4. :INPUT - [0:0]
      5. :OUTPUT - [0:0]
      6. :neutron-filter-top - [0:0]
      7. :neutron-l3-agent-FORWARD - [0:0]
      8. :neutron-l3-agent-INPUT - [0:0]
      9. :neutron-l3-agent-OUTPUT - [0:0]
     10. :neutron-l3-agent-local - [0:0]
     11. :neutron-l3-agent-scope - [0:0]
     12. -I FORWARD 1 -j neutron-filter-top
     13. -I FORWARD 2 -j neutron-l3-agent-FORWARD
     14. -I INPUT 1 -j neutron-l3-agent-INPUT
     15. -I OUTPUT 1 -j neutron-filter-top
     16. -I OUTPUT 2 -j neutron-l3-agent-OUTPUT
     17. -I neutron-filter-top 1 -j neutron-l3-agent-local
     18. -I neutron-l3-agent-FORWARD 1 -j neutron-l3-agent-scope
     19. COMMIT
     20. # Completed by iptables_manager
     21. # Generated by iptables_manager
     22. *mangle
     23. :FORWARD - [0:0]
     24. :INPUT - [0:0]
     25. :OUTPUT - [0:0]
     26. :POSTROUTING - [0:0]
     27. :PREROUTING - [0:0]
     28. :neutron-l3-agent-FORWARD - [0:0]
     29. :neutron-l3-agent-INPUT - [0:0]
     30. :neutron-l3-agent-OUTPUT - [0:0]
     31. :neutron-l3-agent-POSTROUTING - [0:0]
     32. :neutron-l3-agent-PREROUTING - [0:0]
     33. :neutron-l3-agent-float-snat - [0:0]
     34. :neutron-l3-agent-floatingip - [0:0]
     35. :neutron-l3-agent-mark - [0:0]
     36. :neutron-l3-agent-scope - [0:0]
     37. -I FORWARD 1 -j neutron-l3-agent-FORWARD
     38. -I INPUT 1 -j neutron-l3-agent-INPUT
     39. -I OUTPUT 1 -j neutron-l3-agent-OUTPUT
     40. -I POSTROUTING 1 -j neutron-l3-agent-POSTROUTING
     41. -I PREROUTING 1 -j neutron-l3-agent-PREROUTING
     42. -I neutron-l3-agent-POSTROUTING 1 -o qg-98229976-3b -m connmark --mark 0x0/0xffff0000 -j CONNMARK --save-mark --nfmask 0xffff0000 --ctmask 0xffff0000
     43. -I neutron-l3-agent-PREROUTING 1 -j neutron-l3-agent-mark
     44. -I neutron-l3-agent-PREROUTING 2 -j neutron-l3-agent-scope
     45. -I neutron-l3-agent-PREROUTING 3 -m connmark ! --mark 0x0/0xffff0000 -j CONNMARK --restore-mark --nfmask 0xffff0000 --ctmask 0xffff0000
     46. -I neutron-l3-agent-PREROUTING 4 -j neutron-l3-agent-floatingip
     47. -I neutron-l3-agent-PREROUTING 5 -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0xffff
     48. -I neutron-l3-agent-float-snat 1 -m connmark --mark 0x0/0xffff0000 -j CONNMARK --save-mark --nfmask 0xffff0000 --ctmask 0xffff0000
     49. -I neutron-l3-agent-mark 1 -i qg-98229976-3b -j MARK --set-xmark 0x2/0xffff
     50. COMMIT
     51. # Completed by iptables_manager
     52. # Generated by iptables_manager
     53. *nat
     54. :OUTPUT - [0:0]
     55. :POSTROUTING - [0:0]
     56. :PREROUTING - [0:0]
     57. :neutron-l3-agent-OUTPUT - [0:0]
     58. :neutron-l3-agent-POSTROUTING - [0:0]
     59. :neutron-l3-agent-PREROUTING - [0:0]
     60. :neutron-l3-agent-float-snat - [0:0]
     61. :neutron-l3-agent-snat - [0:0]
     62. :neutron-postrouting-bottom - [0:0]
     63. -I OUTPUT 1 -j neutron-l3-agent-OUTPUT
     64. -I POSTROUTING 1 -j neutron-l3...
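
For a quick overview of a dump like the one above, the chains declared in each table can be listed. This is a minimal sketch in Python that assumes the numbered `iptables-restore`-style input shown in this report: `*table` starts a table and `:CHAIN` declares a chain. `chains_by_table` is a hypothetical helper name, and the sketch makes no attempt to interpret the rules themselves.

```python
import re


def chains_by_table(dump):
    """Map each table name to the list of chains declared for it."""
    tables = {}
    current = None
    for raw in dump.splitlines():
        # Drop the leading "  12. " numbering shown in the dump.
        line = re.sub(r"^\s*\d+\.\s+", "", raw).strip()
        if line.startswith("*"):
            current = line[1:]
            tables[current] = []
        elif line.startswith(":") and current is not None:
            # ":OUTPUT - [0:0]" -> "OUTPUT"
            tables[current].append(line[1:].split()[0])
    return tables


if __name__ == "__main__":
    excerpt = (
        "      2. *filter\n"
        "      5. :OUTPUT - [0:0]\n"
        "      6. :neutron-filter-top - [0:0]\n"
        "     19. COMMIT\n"
    )
    for table, chains in chains_by_table(excerpt).items():
        print(table, chains)
```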


Kevin Zhao (kevin-zhao) wrote :

Quick update on running neutron-ovs-cleanup.

Steps tried:
Stop all agents.
Run neutron-ovs-cleanup in the neutron_server container (privileged mode):
$ neutron-ovs-cleanup

Start all the agents.
After that, the VMs are not pingable.

Even rebooting the system or restarting the neutron services does not help.

Ryan Farrell (whereisrysmind) wrote :

Hello,
Bootstack has just encountered this issue on a customer's cloud. Was this ever replicated, and was a reliable workaround discovered?
Thanks,
-Ryan

Ryan Farrell (whereisrysmind) wrote :

False alarm on our side; it turns out to be an expected configuration on our cloud.
Regards,
-Ryan

Radosław Piliszek (yoctozepto) wrote :

Hmm, I could not reproduce this with Train->Ussuri. The namespace does not seem to go wrong (blocked/missing).

Changed in kolla-ansible:
assignee: Radosław Piliszek (yoctozepto) → nobody
Radosław Piliszek (yoctozepto) wrote :

I wonder if it's a Rocky->Stein-specific change (or even specific to particular minor versions thereof).

Changed in kolla-ansible:
status: Confirmed → Incomplete
Michal Nasiadka (mnasiadka) wrote :

It is probably linked to this bug: https://bugs.launchpad.net/neutron/+bug/1811515
