Bug #1798472 “Fullstack tests fails because process is not killed properly” : Bugs : neutron

Slawek Kaplonski (slaweq) wrote on 2018-11-07:

#1

I used a Logstash query like http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22%2C%20line%2078%2C%20in%20stop%5C%22%20AND%20project%3A%5C%22openstack%2Fneutron%5C%22 to check these failures.
In every example I checked, the problem is always with neutron-openvswitch-agent, which for some reason does not catch SIGTERM.

Slawek Kaplonski (slaweq) wrote on 2018-11-07:

#2

I compared a "good" and a "bad" run of the OVS agent.
From what I see, it looks like in the "bad" run the agent was already "not responding".
In the "good" log there are entries for subnet_delete, then network_delete, and then the caught SIGTERM, see:
http://logs.openstack.org/93/615893/1/gate/neutron-fullstack/bf3dc84/logs/dsvm-fullstack-logs/TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_ingress,openflow-native_/neutron-openvswitch-agent--2018-11-07--11-45-30-405950.txt.gz#_2018-11-07_11_45_50_152

In the "bad" run there is an entry for subnet_delete and that is all: there is no entry for network_delete (which did happen on the server) and nothing about catching SIGTERM, see:

http://logs.openstack.org/93/615893/1/gate/neutron-fullstack/bf3dc84/logs/dsvm-fullstack-logs/TestBwLimitQoSOvs.test_bw_limit_qos_port_removed_ingress,openflow-cli_/neutron-openvswitch-agent--2018-11-07--11-42-09-570698.txt.gz#_2018-11-07_11_42_30_454

OpenStack Infra (hudson-openstack) wrote on 2018-11-14: Fix proposed to neutron (master)

#3

Fix proposed to branch: master
Review: https://review.openstack.org/618024

Changed in neutron:
assignee:	nobody → Slawek Kaplonski (slaweq)
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2018-11-20: Fix merged to neutron (master)

#4

Reviewed: https://review.openstack.org/618024
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9b23abbdb68f7e0c80c305ec1874281f6dea7e9e
Submitter: Zuul
Branch: master

commit 9b23abbdb68f7e0c80c305ec1874281f6dea7e9e
Author: Slawek Kaplonski <email address hidden>
Date: Wed Nov 14 21:31:04 2018 +0100

Add kill_timeout to AsyncProcess

    AsyncProcess.stop() method now has an additional parameter,
    kill_timeout. If it is set to a value other than None,
    eventlet.green.subprocess.Popen.wait() will be called with this
    timeout, so a TimeoutExpired exception will be raised if the
    process is not killed within "kill_timeout" seconds.
    In that case the process will be killed "again", with the SIGKILL
    signal, to make sure that it is gone.

    This should fix the problem with failing fullstack tests, where
    the ovs_agent process is sometimes not killed and the test timeout
    was reached in this wait() method.

    Change-Id: I1e12255e5e142c395adf4e67be9d9da0f7a3d4fd
    Closes-Bug: #1798472
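
The behaviour described in this commit message can be illustrated with a minimal sketch. It uses the standard-library subprocess module rather than eventlet.green.subprocess, and stop_process(), STUBBORN_CHILD and demo() are hypothetical names made up for this example; it is not the code from neutron/agent/linux/async_process.py. The child process deliberately ignores SIGTERM to imitate the agent that does not react to it.

```python
import signal
import subprocess
import sys


def stop_process(proc, kill_signal=signal.SIGTERM, kill_timeout=None):
    """Send kill_signal, wait up to kill_timeout seconds, then use SIGKILL.

    Hypothetical helper for illustration only. With kill_timeout=None the
    wait() call blocks until the process exits, however long that takes.
    Returns True if the fallback to SIGKILL was needed.
    """
    proc.send_signal(kill_signal)
    try:
        proc.wait(timeout=kill_timeout)
        return False
    except subprocess.TimeoutExpired:
        # The process did not exit in time: kill it "again" with SIGKILL,
        # which cannot be caught or ignored.
        proc.kill()
        proc.wait()
        return True


# Child that ignores SIGTERM (an assumption standing in for a stuck agent).
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def demo():
    proc = subprocess.Popen(
        [sys.executable, "-c", STUBBORN_CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    # Wait until the child has installed its SIGTERM handler.
    proc.stdout.readline()
    escalated = stop_process(proc, kill_timeout=1)
    proc.stdout.close()
    return escalated, proc.returncode


if __name__ == "__main__":
    print(demo())
```

Without a timeout, the wait() call in this sketch would block for as long as the child keeps running, which is the situation in which the fullstack tests hit their own timeout.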

Changed in neutron:
status:	In Progress → Fix Released

Bernard Cafarelli (bcafarel) on 2018-11-30

tags:	added: neutron-proactive-backport-potential

OpenStack Infra (hudson-openstack) wrote on 2018-12-21: Fix included in openstack/neutron 14.0.0.0b1

#5

This issue was fixed in the openstack/neutron 14.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote on 2019-01-04: Fix proposed to neutron (stable/rocky)

#6

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/628396

OpenStack Infra (hudson-openstack) wrote on 2019-01-04: Fix proposed to neutron (stable/queens)

#7

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/628397

OpenStack Infra (hudson-openstack) wrote on 2019-01-04: Fix proposed to neutron (stable/pike)

#8

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/628398

OpenStack Infra (hudson-openstack) wrote on 2019-01-04: Fix merged to neutron (stable/rocky)

#9

Reviewed: https://review.openstack.org/628396
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=025e767b94cf29e522a67e3c1ddd8aa3dba9140a
Submitter: Zuul
Branch: stable/rocky

commit 025e767b94cf29e522a67e3c1ddd8aa3dba9140a
Author: Slawek Kaplonski <email address hidden>
Date: Wed Nov 14 21:31:04 2018 +0100

Add kill_timeout to AsyncProcess

    Change-Id: I1e12255e5e142c395adf4e67be9d9da0f7a3d4fd
    Closes-Bug: #1798472
    (cherry picked from commit 9b23abbdb68f7e0c80c305ec1874281f6dea7e9e)

tags:	added: in-stable-rocky
tags:	added: in-stable-queens

OpenStack Infra (hudson-openstack) wrote on 2019-01-04: Fix merged to neutron (stable/queens)

#10

Reviewed: https://review.openstack.org/628397
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b5a0401472246dd3efa48631766faa72634ba36b
Submitter: Zuul
Branch: stable/queens

commit b5a0401472246dd3efa48631766faa72634ba36b
Author: Slawek Kaplonski <email address hidden>
Date: Wed Nov 14 21:31:04 2018 +0100

Add kill_timeout to AsyncProcess

    Change-Id: I1e12255e5e142c395adf4e67be9d9da0f7a3d4fd
    Closes-Bug: #1798472
    (cherry picked from commit 9b23abbdb68f7e0c80c305ec1874281f6dea7e9e)

OpenStack Infra (hudson-openstack) wrote on 2019-01-09: Fix merged to neutron (stable/pike)

#11

Reviewed: https://review.openstack.org/628398
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c86473d1a66830df1ce8eae28953e96ab96d53ea
Submitter: Zuul
Branch: stable/pike

commit c86473d1a66830df1ce8eae28953e96ab96d53ea
Author: Slawek Kaplonski <email address hidden>
Date: Wed Nov 14 21:31:04 2018 +0100

Add kill_timeout to AsyncProcess

    Conflicts:
        neutron/agent/linux/async_process.py

    Change-Id: I1e12255e5e142c395adf4e67be9d9da0f7a3d4fd
    Closes-Bug: #1798472
    (cherry picked from commit 9b23abbdb68f7e0c80c305ec1874281f6dea7e9e)

tags:	added: in-stable-pike

OpenStack Infra (hudson-openstack) wrote on 2019-04-12: Fix included in openstack/neutron 11.0.7

#12

This issue was fixed in the openstack/neutron 11.0.7 release.

OpenStack Infra (hudson-openstack) wrote on 2019-04-12: Fix included in openstack/neutron 13.0.3

#13

This issue was fixed in the openstack/neutron 13.0.3 release.

OpenStack Infra (hudson-openstack) wrote on 2019-04-12: Fix included in openstack/neutron 12.0.6

#14

This issue was fixed in the openstack/neutron 12.0.6 release.

neutron

Fullstack tests fails because process is not killed properly
