Comment 10 for bug 1795870

Miguel Lavalle (minsel) wrote :

Neutron server is losing contact with the L3 agent running in the controller node. One example is:

Jan 26 18:27:23.664199 ubuntu-bionic-rax-iad-0002168118 neutron-server[6878]: WARNING neutron.db.agents_db [None req-96b7c5e3-0c74-48ca-92a2-6b43a9ef6544 None None] Agent healthcheck: found 1 dead agents out of 8:
Jan 26 18:27:23.664199 ubuntu-bionic-rax-iad-0002168118 neutron-server[6878]: Type Last heartbeat host
Jan 26 18:27:23.664199 ubuntu-bionic-rax-iad-0002168118 neutron-server[6878]: L3 agent 2019-01-26 18:25:44 ubuntu-bionic-rax-iad-0002168118

Checking the L3 agent log around the time the above message first appears, we find this traceback: http://paste.openstack.org/show/744001/. Please note that this traceback takes place at Jan 26 18:25:56.559883, whereas the Neutron server starts reporting that it has lost contact with the L3 agent (see message above) at Jan 26 18:27:23.664199, having received the last heartbeat at 2019-01-26 18:25:44. In fact, the following is the last time the L3 agent reports receiving a router update:

Jan 26 18:25:56.399748 ubuntu-bionic-rax-iad-0002168118 neutron-l3-agent[8618]: DEBUG neutron.agent.l3.agent [None req-296cf80d-5b44-4c99-914d-499ec949394b tempest-NetworkMigrationFromHA-1759813396 tempest-NetworkMigrationFromHA-1759813396] Got routers updated notification :['e6e7911c-a3e0-4331-abe4-580aaf5ba2fc'] {{(pid=8618) routers_updated /opt/stack/neutron/neutron/agent/l3/agent.py:444}}

The router with uuid e6e7911c-a3e0-4331-abe4-580aaf5ba2fc is being migrated from HA to DVR by test case NetworkMigrationFromHA:test_from_ha_to_dvr.
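
To make the ordering of events easier to see, here is a minimal sketch that computes the gaps between the timestamps quoted above. The variable and function names are mine, chosen only for illustration, and the year 2019 is assumed for the journal lines (they omit it; it is taken from the "Last heartbeat" column).

```python
from datetime import datetime

# Timestamps copied from the log excerpts in this comment.
# Assumption: journal lines carry no year, so 2019 is taken from the
# "Last heartbeat" value reported by neutron-server.
last_heartbeat = datetime(2019, 1, 26, 18, 25, 44)
last_router_update = datetime(2019, 1, 26, 18, 25, 56, 399748)
traceback_time = datetime(2019, 1, 26, 18, 25, 56, 559883)
dead_agent_report = datetime(2019, 1, 26, 18, 27, 23, 664199)


def seconds_between(start, end):
    """Return the elapsed time from start to end in seconds."""
    return (end - start).total_seconds()


# Hypothetical labels, used only to print a readable timeline.
gaps = {
    "last heartbeat -> last router update": seconds_between(
        last_heartbeat, last_router_update),
    "last router update -> traceback": seconds_between(
        last_router_update, traceback_time),
    "last heartbeat -> dead agent report": seconds_between(
        last_heartbeat, dead_agent_report),
    "traceback -> dead agent report": seconds_between(
        traceback_time, dead_agent_report),
}

for label, seconds in gaps.items():
    print(f"{label}: {seconds:.6f} s")
```

The traceback follows the last router update by well under a second and the last heartbeat by roughly 12.5 seconds, and no later heartbeat is recorded before the server flags the agent as dead.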

I have confirmed a similar pattern takes place in several occurrences of this bug. In all cases, a router is being migrated from HA to DVR or legacy.

Next step is to dig deeper into the traceback http://paste.openstack.org/show/744001/