Healthmonitor fails to spawn new amphora from spare pool

Bug #1558934 reported by Kevin
This bug affects 3 people
Affects: octavia
Status: Fix Released
Importance: Critical
Assigned to: Michael Johnson
Milestone: (none)

Bug Description

Active standby topology
Spare amphora pool = 2

When an amphora is stopped, the health monitor correctly detects the stale amphora. However, the controller fails to configure the replacement when it allocates one from the spare amphora pool: plugging the VIP fails with the exception NotFound: Not Found.

Gist of o-hm.log
https://gist.github.com/anonymous/d6694757cc32d0d0626b#file-o-hm-log

From agent log in amphora
The amphora responds with a 404 error to the plug_vip REST request:
2016-03-17 23:21:47.199 568 WARNING werkzeug [-] * Debugger is active!
2016-03-17 23:21:47.224 568 INFO werkzeug [-] * Debugger pin code: 914-191-124
2016-03-17 23:22:32.480 568 INFO werkzeug [-] 192.168.0.3 - - [17/Mar/2016 23:22:32] "POST /0.5/plug/vip/10.0.0.3 HTTP/1.1" 404 -

If I'm not using a spare amphora pool, the failover works correctly, with no issues creating and configuring the new amphora.

Changed in octavia:
status: New → Confirmed
importance: Undecided → Critical
Michael Johnson (johnsom) wrote :

OK, so when the spares pool is not used, we are moving the ports via the ComputeCreate task's "ports" parameter. Under spares, neither the plug VIP nor the network deltas step is being called.
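
The difference between the two paths can be pictured with a small sketch. Everything here is hypothetical and for illustration only: the step names and the `failover_steps` helper are not Octavia task names or APIs, and the ordering is simplified to what the comment above describes.

```python
# Hypothetical step names for illustration only; they are not Octavia task names.
PLUG_STEP = "plug_vip_and_network_deltas"


def failover_steps(use_spare, plug_after_allocation=True):
    """Return the ordered steps for replacing a failed amphora.

    plug_after_allocation=False models the behaviour reported in this bug,
    where the spare path never plugged the ports.
    """
    if not use_spare:
        # No spare: the ports are handed to the new instance when it is created.
        return ["compute_create_with_ports", "configure_amphora"]

    steps = ["allocate_spare_amphora"]
    if plug_after_allocation:
        # A spare is already booted, so the ports must be plugged explicitly.
        steps.append(PLUG_STEP)
    steps.append("configure_amphora")
    return steps
```

With `plug_after_allocation=False` the spare path goes straight from allocation to configuration, which is consistent with the amphora agent answering the plug_vip request with a 404 in the report above.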

Changed in octavia:
assignee: nobody → Michael Johnson (johnsom)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia (master)

Fix proposed to branch: master
Review: https://review.openstack.org/295475

Changed in octavia:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia (master)

Reviewed: https://review.openstack.org/295475
Committed: https://git.openstack.org/cgit/openstack/octavia/commit/?id=f7c776fdd96c7857506737e4c150413244368850
Submitter: Jenkins
Branch: master

commit f7c776fdd96c7857506737e4c150413244368850
Author: Michael Johnson <email address hidden>
Date: Mon Mar 21 19:43:48 2016 +0000

    Fixes failover when using a spares pool

    The failover flow was not plugging the ports back into the
    amphora if the failover used an amphora from the spares pool.
    This patch adds a task to plug the ports back into the amphora
    during failover

    Change-Id: Id7f0e60650ca2b35afb2695181897674abb9d8cf
    Closes-Bug: #1558934

Changed in octavia:
status: In Progress → Fix Released
Kevin (kevin-tran-h) wrote :

Hi Michael, I applied the fix you released and now I am facing a new issue in the health monitor. It has to do with the changes to the FailoverPreparationForAmphora task.

I am getting this error BadRequest: Unrecognized attribute(s) 'dns_name'
Would this be an issue with my neutronclient?

o-hm log
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker Traceback (most recent call last):
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker File "/usr/local/lib/python2.7/dist-packages/taskflow/engines/action_engine/executor.py", line 82, in _execute_task
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker result = task.execute(**arguments)
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker File "/opt/stack/octavia/octavia/controller/worker/tasks/network_tasks.py", line 393, in execute
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker self.network_driver.failover_preparation(amphora)
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker File "/opt/stack/octavia/octavia/network/drivers/neutron/allowed_address_pairs.py", line 446, in failover_preparation
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker 'dns_name': ''}})
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker File "/usr/local/lib/python2.7/dist-packages/neutronclient/v2_0/client.py", line 97, in with_params
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker ret = self.function(instance, *args, **kwargs)
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker File "/usr/local/lib/python2.7/dist-packages/neutronclient/v2_0/client.py", line 659, in update_port
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker return self.put(self.port_path % (port), body=body)
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker File "/usr/local/lib/python2.7/dist-packages/neutronclient/v2_0/client.py", line 367, in put
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker headers=headers, params=params)
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker File "/usr/local/lib/python2.7/dist-packages/neutronclient/v2_0/client.py", line 335, in retry_request
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker headers=headers, params=params)
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker File "/usr/local/lib/python2.7/dist-packages/neutronclient/v2_0/client.py", line 298, in do_request
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker self._handle_fault_response(status_code, replybody, resp)
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker File "/usr/local/lib/python2.7/dist-packages/neutronclient/v2_0/client.py", line 273, in _handle_fault_response
2016-03-23 20:03:21.114 4131 ERROR octavia.controller.worker.controller_worker exception_handler_v20(status_code, error_bod...


Michael Johnson (johnsom) wrote :

Hi Kevin.

The "dns_name" attribute was added sometime in the Mitaka release.
Can you check that you have mitaka or newer neutron and neutronclient?

Kevin (kevin-tran-h) wrote :

Hi Michael,

I am getting the same issue still. I just stacked a new devstack on a new machine. It has the latest commit for neutron. I'm not too sure what version of neutronclient it's on. In the /usr/local/lib/python2.7/dist-packages/, I see a python_neutronclient-4.1.1.dist-info. How would I check for the neutronclient version? How would I update it?

Michael Johnson (johnsom) wrote : Re: [Bug 1558934] Re: Healthmonitor fails to spawn new amphora from spare pool

Hi Kevin,

You can find the neutron client version by running: "neutron --version"
I have 4.1.2 installed.

I think you can update with "pip install --upgrade python-neutronclient"

This is a bit odd as I thought the error above was returned from neutron proper.

Michael
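
As a scripted alternative to the commands above, here is a minimal sketch that reads the installed package version and compares two version strings. It assumes Python 3.8 or newer for `importlib.metadata` (the Python 2.7 environment in this thread would not have it), and the helpers `installed_version` and `version_tuple` are illustrative names, not part of neutronclient. The comparison is deliberately naive and only looks at the numeric components.

```python
import re
from importlib import metadata


def installed_version(dist="python-neutronclient"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn '4.1.1' into (4, 1, 1) using only the numeric components."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def is_older(installed, reference):
    """True if the installed version sorts before the reference version."""
    return version_tuple(installed) < version_tuple(reference)
```

For the two versions mentioned in this thread, `is_older("4.1.1", "4.1.2")` is true, so the two installations differ by one patch release.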


Bharath (bharathm) wrote :

It appears dns_name was added in neutron client version 4.1.0. Presuming Kevin is using 4.1.1, this shouldn't be an issue.

Having said that, the logs Kevin posted above show that it's the neutron server returning this error, not the neutron client. So I would check whether the neutron bits are up to date. In Devstack you could run "git log" under /opt/stack/neutron to see if you have the latest commits.

I did a fresh install of devstack an hour ago and tried spare amp and failover with Single topology (though I don't think this error is in any way specific to Active-standby topology), and I don't see any such errors.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/298803

Kevin (kevin-tran-h) wrote :

Thank you for the help.

I ran the latest on an OVS setup and failover worked fine without any issues. However, on my current setup, it still has the same error. I reported the failover issue since it occurred in both OVS and OVN. I am currently testing the latest on an OVN devstack setup. Could the issue be related to OVN?

OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia (stable/mitaka)

Reviewed: https://review.openstack.org/298803
Committed: https://git.openstack.org/cgit/openstack/octavia/commit/?id=eb2a06b47d1c45a55043a2f64957f58b5d983c1d
Submitter: Jenkins
Branch: stable/mitaka

commit eb2a06b47d1c45a55043a2f64957f58b5d983c1d
Author: Michael Johnson <email address hidden>
Date: Mon Mar 21 19:43:48 2016 +0000

    Fixes failover when using a spares pool

    The failover flow was not plugging the ports back into the
    amphora if the failover used an amphora from the spares pool.
    This patch adds a task to plug the ports back into the amphora
    during failover

    Change-Id: Id7f0e60650ca2b35afb2695181897674abb9d8cf
    Closes-Bug: #1558934
    (cherry picked from commit f7c776fdd96c7857506737e4c150413244368850)

tags: added: in-stable-mitaka
Elena Ezhova (eezhova) wrote :

I hit this bug on a stable/mitaka deployment, even though the code contains Michael's fix. I haven't debugged this yet, but I tried the same failover-to-spare-amphora scenario on the latest master and experienced the same problem as Kevin described in #4.

I strongly suspect that this happens when the dns-integration extension is not enabled, and since https://review.openstack.org/311640 it is not loaded by default unless 'dns' is added to the 'extension_drivers' list in ml2_conf.ini. Given this, I would like to re-open this bug.
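
If that suspicion is right, one defensive approach is to send dns_name only when the server advertises the extension. The sketch below is an illustration under that assumption, not the change that was merged for this bug. `has_dns_extension` expects a dict shaped like the result of neutronclient's `list_extensions()` call, and `port_update_body` is a hypothetical helper; the other fields that failover_preparation sends are not shown in the traceback, so they are passed in as `base_fields` rather than guessed.

```python
DNS_EXTENSION_ALIAS = "dns-integration"


def has_dns_extension(extensions_response):
    """Check a list_extensions()-style response for the DNS integration alias."""
    extensions = extensions_response.get("extensions", [])
    return any(ext.get("alias") == DNS_EXTENSION_ALIAS for ext in extensions)


def port_update_body(base_fields, dns_supported):
    """Build an update_port body, adding dns_name only when it is supported.

    base_fields holds whatever other port attributes the caller wants to
    reset; they are copied so the caller's dict is left untouched.
    """
    port = dict(base_fields)
    if dns_supported:
        port["dns_name"] = ""
    return {"port": port}
```

A caller would fetch the extension list once from the neutron server, pass it to `has_dns_extension`, and use the result when building the body, so that a server without the extension never sees the unrecognized attribute.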

selvakumar (selvakumar-nms) wrote :

Hi Elena,
I have also reproduced the same issue in the mitaka code. I have enabled the dns integration as well and I am still having the same issue.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/octavia 0.9.0

This issue was fixed in the openstack/octavia 0.9.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/octavia 0.8.1

This issue was fixed in the openstack/octavia 0.8.1 release.
