gate-neutron-dsvm-functional race fails HA/DVR tests with network namespace not found

Bug #1446261 reported by Matt Riedemann
This bug affects 2 people
Affects: neutron
Status: Fix Released
Importance: Critical
Assigned to: Assaf Muller

Bug Description

http://logs.openstack.org/21/174821/1/gate/gate-neutron-dsvm-functional/eb6b441/console.html#_2015-04-20_07_41_14_649

This is happening quite often and is not just this specific test:

2015-04-20 07:41:14.649 | 2015-04-20 07:41:14.642 | {4} neutron.tests.functional.agent.test_l3_agent.L3HATestFramework.test_ha_router_failover [12.678377s] ... FAILED
2015-04-20 07:41:14.650 | 2015-04-20 07:41:14.643 |
2015-04-20 07:41:14.670 | 2015-04-20 07:41:14.647 | Captured traceback:
2015-04-20 07:41:14.671 | 2015-04-20 07:41:14.649 | ~~~~~~~~~~~~~~~~~~~
2015-04-20 07:41:14.671 | 2015-04-20 07:41:14.650 | Traceback (most recent call last):
2015-04-20 07:41:14.672 | 2015-04-20 07:41:14.652 | File "neutron/tests/functional/agent/test_l3_agent.py", line 762, in test_ha_router_failover
2015-04-20 07:41:14.673 | 2015-04-20 07:41:14.653 | ha_device.link.set_down()
2015-04-20 07:41:14.673 | 2015-04-20 07:41:14.655 | File "neutron/agent/linux/ip_lib.py", line 279, in set_down
2015-04-20 07:41:14.674 | 2015-04-20 07:41:14.658 | self._as_root([], ('set', self.name, 'down'))
2015-04-20 07:41:14.675 | 2015-04-20 07:41:14.661 | File "neutron/agent/linux/ip_lib.py", line 222, in _as_root
2015-04-20 07:41:14.675 | 2015-04-20 07:41:14.663 | use_root_namespace=use_root_namespace)
2015-04-20 07:41:14.676 | 2015-04-20 07:41:14.664 | File "neutron/agent/linux/ip_lib.py", line 69, in _as_root
2015-04-20 07:41:14.677 | 2015-04-20 07:41:14.666 | log_fail_as_error=self.log_fail_as_error)
2015-04-20 07:41:14.677 | 2015-04-20 07:41:14.668 | File "neutron/agent/linux/ip_lib.py", line 78, in _execute
2015-04-20 07:41:14.679 | 2015-04-20 07:41:14.672 | log_fail_as_error=log_fail_as_error)
2015-04-20 07:41:14.681 | 2015-04-20 07:41:14.675 | File "neutron/agent/linux/utils.py", line 137, in execute
2015-04-20 07:41:14.683 | 2015-04-20 07:41:14.676 | raise RuntimeError(m)
2015-04-20 07:41:14.684 | 2015-04-20 07:41:14.678 | RuntimeError:
2015-04-20 07:41:14.686 | 2015-04-20 07:41:14.679 | Command: ['ip', 'netns', 'exec', 'qrouter-425dfc14-0f4d-45fa-8218-531cae21711f@agent1', 'ip', 'link', 'set', 'ha-29cbe060-37', 'down']
2015-04-20 07:41:14.688 | 2015-04-20 07:41:14.681 | Exit code: 1
2015-04-20 07:41:14.689 | 2015-04-20 07:41:14.682 | Stdin:
2015-04-20 07:41:14.691 | 2015-04-20 07:41:14.684 | Stdout:
2015-04-20 07:41:14.692 | 2015-04-20 07:41:14.685 | Stderr: Cannot open network namespace "qrouter-425dfc14-0f4d-45fa-8218-531cae21711f@agent1": No such file or directory

If you restrict to just the gate job, it's 62 hits in 7 days:

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiU3RkZXJyOiBDYW5ub3Qgb3BlbiBuZXR3b3JrIG5hbWVzcGFjZSBcXFwicXJvdXRlclwiIEFORCBtZXNzYWdlOlwiQGFnZW50MVxcXCI6IE5vIHN1Y2ggZmlsZSBvciBkaXJlY3RvcnlcIiBBTkQgYnVpbGRfbmFtZTpcImdhdGUtbmV1dHJvbi1kc3ZtLWZ1bmN0aW9uYWxcIiBBTkQgdGFnczpcImNvbnNvbGVcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTQyOTU0MDU5NTI0NiwibW9kZSI6IiIsImFuYWx5emVfZmllbGQiOiIifQ==

Tags: dvr ha testing
Matt Riedemann (mriedem) wrote :
Changed in neutron:
status: New → Confirmed
importance: Undecided → High
Kyle Mestery (mestery) wrote :

Assigning to Carl to triage this one initially.

Changed in neutron:
assignee: nobody → Carl Baldwin (carl-baldwin)
Matt Riedemann (mriedem) wrote :

From logstash this started showing up around 4/15.

Kyle Mestery (mestery) wrote :

Per discussion with mriedem on IRC, moving this to Maru. Apparently Maru raised a timeout last week, which may have made this less likely to happen.

Changed in neutron:
assignee: Carl Baldwin (carl-baldwin) → Maru Newby (maru)
Matt Riedemann (mriedem) wrote :
Changed in neutron:
assignee: Maru Newby (maru) → Carl Baldwin (carl-baldwin)
John Schwarz (jschwarz) wrote :

It is definitely related to this: https://bugs.launchpad.net/neutron/+bug/1446284 (the L3 agent run by the fullstack tests deleted namespaces created by concurrent functional tests).

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/175609

Changed in neutron:
status: Confirmed → In Progress
Maru Newby (maru)
Changed in neutron:
importance: High → Critical
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/176093

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/176922

Changed in neutron:
assignee: Carl Baldwin (carl-baldwin) → Assaf Muller (amuller)
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/176922
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e2d5be1cb3094ffbfc979aa04262f3dbc43f38ec
Submitter: Jenkins
Branch: master

commit e2d5be1cb3094ffbfc979aa04262f3dbc43f38ec
Author: Assaf Muller <email address hidden>
Date: Thu Apr 23 13:43:29 2015 -0400

    Fix L3 agent functional tests random failures

    The test_ha_router_failover tests were not being unmocked. This
    is because the same object was being mocked twice, but unmocked
    once. The mock.patch.stopall call in the tests base class was rewinding
    the value of the object from the second mock to the first mock.

    Follow up tests in the same worker were using namespace
    names defined via the first mock in the failover test.

    Closes-Bug: #1446261
    Change-Id: I8f24b8bb3a6a501dbe210c2cc67c47fa4b76257c

Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/175609
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d0d7030ce78cf3fb182a8d824b3770ab0f124d7a
Submitter: Jenkins
Branch: master

commit d0d7030ce78cf3fb182a8d824b3770ab0f124d7a
Author: Carl Baldwin <email address hidden>
Date: Mon Apr 20 22:15:46 2015 +0000

    Utilities for building/parsing netns names to facilitate testing

    Creating these utilities allows functional tests to mock them out more
    easily in order to change the namespace identification and cleanup
    behavior.

    Change-Id: I76cb2dc43a0ca4a7ea27c2ea71b27068b92154ce
    Related-Bug: #1446261
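
As a rough illustration of what such build/parse helpers could look like, here is a minimal sketch. The function names, the `ROUTER_PREFIX` constant, and the "split at the first dash" rule are assumptions made for this example, not the contents of the merged change.

```python
# Hypothetical prefix; the logs in this bug show names starting with "qrouter-".
ROUTER_PREFIX = "qrouter-"


def build_ns_name(prefix, identifier):
    """Build a namespace name from a prefix and a resource identifier."""
    return prefix + identifier


def get_prefix_from_ns_name(ns_name):
    """Return everything up to and including the first dash, or None."""
    dash_index = ns_name.find("-")
    if dash_index >= 0:
        return ns_name[:dash_index + 1]
    return None


def get_id_from_ns_name(ns_name):
    """Return everything after the first dash, or None."""
    dash_index = ns_name.find("-")
    if dash_index >= 0:
        return ns_name[dash_index + 1:]
    return None


name = build_ns_name(ROUTER_PREFIX, "425dfc14-0f4d")
prefix = get_prefix_from_ns_name(name)
identifier = get_id_from_ns_name(name)
```

Routing every name through a small set of functions like these gives tests a single place to patch when they need to change how namespaces are named or recognised for cleanup.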

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/176093
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=8d4cbb3911a4c5b38ef998b0425eab1994b3bc2d
Submitter: Jenkins
Branch: master

commit 8d4cbb3911a4c5b38ef998b0425eab1994b3bc2d
Author: Carl Baldwin <email address hidden>
Date: Tue Apr 21 21:36:33 2015 +0000

    Append @randtoken to L3 agent namespaces in full stack tests

    Change-Id: Ib180a5836f653385ec2877c50fbca6f850eff351
    Closes-Bug: #1446261
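
The idea of suffixing namespace names with a random token, so that one test run's agent does not mistake another run's namespaces for its own, can be sketched as follows. `append_token`, `split_token`, and the 8-character token length are hypothetical choices for illustration; the actual change may differ.

```python
import uuid


def append_token(ns_name, token=None):
    """Return ns_name with '@<token>' appended; generate a token if none given."""
    if token is None:
        # Assumed token format: first 8 hex characters of a random UUID.
        token = uuid.uuid4().hex[:8]
    return "%s@%s" % (ns_name, token)


def split_token(ns_name):
    """Split 'name@token' into (name, token); token is '' if absent."""
    base, _, token = ns_name.partition("@")
    return base, token


tagged = append_token("qrouter-r1")
base, token = split_token(tagged)
```

An agent that only cleans up namespaces whose token matches its own would then leave namespaces from concurrent test runs alone.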

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (neutron-pecan)

Fix proposed to branch: neutron-pecan
Review: https://review.openstack.org/185072

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/187187

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/kilo)

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/187187
Reason: As per comments in https://review.openstack.org/178301

Thierry Carrez (ttx)
Changed in neutron:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: liberty-1 → 7.0.0