NoVNCConsoleTestJSON.test_novnc fails in nova-multi-cell job since 5/20

Bug #1830417 reported by Matt Riedemann
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
devstack
Fix Released
Medium
Stephen Finucane

Bug Description

Ever since we enabled the n-novnc service in the nova-multi-cell job on May 20:

https://github.com/openstack/nova/commit/c5b83c3fbca83726f4a956009e1788d26bcedde0#diff-7415f5ff7beee2cdf9ffe31e12e4c086

The tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc test has been intermittently failing like this:

2019-05-24 01:55:59.786818 | controller | {2} tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc [0.870805s] ... FAILED
2019-05-24 01:55:59.787151 | controller |
2019-05-24 01:55:59.787193 | controller | Captured traceback:
2019-05-24 01:55:59.787226 | controller | ~~~~~~~~~~~~~~~~~~~
2019-05-24 01:55:59.787271 | controller | b'Traceback (most recent call last):'
2019-05-24 01:55:59.787381 | controller | b' File "/opt/stack/tempest/tempest/api/compute/servers/test_novnc.py", line 194, in test_novnc'
2019-05-24 01:55:59.787450 | controller | b' self._validate_rfb_negotiation()'
2019-05-24 01:55:59.787550 | controller | b' File "/opt/stack/tempest/tempest/api/compute/servers/test_novnc.py", line 92, in _validate_rfb_negotiation'
2019-05-24 01:55:59.787643 | controller | b" 'Token must be invalid because the connection '"
2019-05-24 01:55:59.787748 | controller | b' File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/unittest2/case.py", line 696, in assertFalse'
2019-05-24 01:55:59.787796 | controller | b' raise self.failureException(msg)'
2019-05-24 01:55:59.787894 | controller | b'AssertionError: True is not false : Token must be invalid because the connection closed.'
2019-05-24 01:55:59.787922 | controller | b''

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22b'AssertionError%3A%20True%20is%20not%20false%20%3A%20Token%20must%20be%20invalid%20because%20the%20connection%20closed.'%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d

My guess would be (without checking the test or the code) that something isn't properly routing console auth token information/requests to the correct cell which is why we don't see this in a "single" cell job.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Looks like the API request is making it into cell2:

http://logs.openstack.org/73/638073/31/check/nova-multi-cell/6ea3306/controller/logs/screen-n-cond-cell2.txt.gz#_May_24_01_55_59_037329

May 24 01:55:59.037329 ubuntu-bionic-vexxhost-sjc1-0006505703 nova-conductor[29514]: DEBUG nova.objects.console_auth_token [None req-b2bcfa81-3ab0-4c0d-80a9-0c801bbaadd2 tempest-NoVNCConsoleTestJSON-1582114334 tempest-NoVNCConsoleTestJSON-1582114334] Authorized token with expiry 1558663559 for console connection ConsoleAuthToken(access_url_base='http://38.108.68.149:6080/vnc_auto.html',console_type='novnc',created_at=2019-05-24T01:55:59Z,host='38.108.68.205',id=1,instance_uuid=d3895054-0cc8-4539-82f7-66bec6decf4a,internal_access_path=None,port=5901,token='***',updated_at=None) {{(pid=30338) authorize /opt/stack/nova/nova/objects/console_auth_token.py:136}}

It looks like that is routing properly because the instance is on the host in cell2:

http://logs.openstack.org/73/638073/31/check/nova-multi-cell/6ea3306/compute1/logs/screen-n-cpu.txt.gz#_May_24_01_55_57_776181

May 24 01:55:57.776181 ubuntu-bionic-vexxhost-sjc1-0006505705 nova-compute[17132]: INFO nova.compute.manager [None req-a5e680df-3175-4f31-b2cb-48318ce6f853 tempest-NoVNCConsoleTestJSON-1582114334 tempest-NoVNCConsoleTestJSON-1582114334] [instance: d3895054-0cc8-4539-82f7-66bec6decf4a] Took 6.86 seconds to build instance.

http://logs.openstack.org/73/638073/31/check/nova-multi-cell/6ea3306/compute1/logs/etc/nova/nova-cpu_conf.txt.gz

transport_url = rabbit://stackrabbit:secretrabbit@38.108.68.149:5672/nova_cell2

Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/661371

Revision history for this message
Matt Riedemann (mriedem) wrote :

(1:48:02 PM) melwitt: mriedem: I think what's wrong is that the vnc proxy urls in the nova-cpu.conf files are both using port 6080 when one of them (the compute in cell2) should be using 6084
(1:48:10 PM) melwitt: http://logs.openstack.org/73/638073/31/check/nova-multi-cell/6ea3306/controller/logs/etc/nova/nova-cpu_conf.txt.gz vs http://logs.openstack.org/73/638073/31/check/nova-multi-cell/6ea3306/compute1/logs/etc/nova/nova-cpu_conf.txt.gz
(1:49:05 PM) mriedem: ah yup
(1:49:24 PM) melwitt: and from what I recall in devstack, I don't know how we can configure separate ones and use them in an easy way

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (master)

Fix proposed to branch: master
Review: https://review.opendev.org/661391

Changed in devstack:
assignee: nobody → melanie witt (melwitt)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/661371
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=47b73030cba7d67fe308c5b89c2a64e93d74bcd6
Submitter: Zuul
Branch: master

commit 47b73030cba7d67fe308c5b89c2a64e93d74bcd6
Author: Matt Riedemann <email address hidden>
Date: Fri May 24 14:23:05 2019 -0400

    Skip novnc tests in multi-cell job until bug 1830417 is fixed

    NoVNCConsoleTestJSON.test_novnc is intermittently failing
    in the multi-cell job when trying to get a vnc console for
    an instance in cell2. It looks like requests aren't getting
    routed to the novnc-cell2 service properly, so let's skip
    novnc tests until the bug is fixed since we want to stabilize
    this job to make it voting.

    Change-Id: I74a4b392ccec66a7a80a117d1cdf83a418c7dcdc
    Related-Bug: #1830417

Changed in devstack:
assignee: melanie witt (melwitt) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem)
Changed in devstack:
assignee: Matt Riedemann (mriedem) → melanie witt (melwitt)
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/667388

Matt Riedemann (mriedem)
no longer affects: nova
Changed in devstack:
assignee: melanie witt (melwitt) → Stephen Finucane (stephenfinucane)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (master)

Reviewed: https://review.opendev.org/661391
Committed: https://git.openstack.org/cgit/openstack/devstack/commit/?id=d7d902f6b60298d56359a07a63b2355373e54956
Submitter: Zuul
Branch: master

commit d7d902f6b60298d56359a07a63b2355373e54956
Author: melanie witt <email address hidden>
Date: Fri May 24 20:09:28 2019 +0000

    Configure console proxy ports in nova-cpu.conf

    In change I8934d0b9392f2976347391c8a650ad260f337762, we began
    configuring console proxy ports for multiple cells in the nova
    controller config files to avoid "Address already in use" errors from
    port collisions when running multiple cells on a single host.

    This correspondingly configures the console proxy ports in the nova
    compute config file based on what cell we're in, according to the
    NOVA_CPU_CELL variable.

    The base_url config for serial console is also added where the default
    was previously used. The url is taken from the config option default in
    the nova code: nova/conf/serial_console.py [1].

    [1] https://github.com/openstack/nova/blob/8f00b5d/nova/conf/serial_console.py#L54

    Change-Id: Id885fc5a769bce8111f1052a1b55d26be817c890
    Closes-Bug: #1830417

Changed in devstack:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.