Comment 15 for bug 1821912

Slawek Kaplonski (slaweq) wrote :

I focused on analysing the authentication failures.
First I tried to reproduce the issue locally, to check whether a slow response from nova-metadata-api can really cause this problem. I added a simple time.sleep(15) in https://github.com/openstack/nova/blob/e5b819fb75320f192f79e92384bac8be043e9600/nova/api/metadata/handler.py#L94 and got exactly the same error from the "ec2metadata" script that is in the CirrOS image.
So it looks like a slowdown in nova-metadata-api really can cause this problem.
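
To illustrate the mechanism outside of a real deployment, here is a minimal sketch. It is not nova's handler and not the ec2metadata script: make_slow_server and fetch are hypothetical helpers, the response body is a made-up placeholder, and the delay and timeout values are arbitrary. It only shows that a client with a timeout shorter than the server-side delay gets no answer at all.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_slow_server(delay):
    """Start a local HTTP server that waits `delay` seconds before answering.

    Hypothetical stand-in for a metadata API with an artificial sleep.
    """

    class SlowHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            time.sleep(delay)  # plays the role of the added time.sleep()
            body = b"0=mykey\n"  # placeholder payload, not real metadata
            try:
                self.send_response(200)
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            except OSError:
                # The client gave up and closed the connection already.
                pass

        def log_message(self, *args):
            pass  # keep the example quiet

    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(url, timeout):
    """Return the response body, or None if the request failed or timed out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode()
    except OSError:  # covers socket timeouts and urllib.error.URLError
        return None
```

Calling fetch() with a timeout below the server delay returns None, while a generous timeout returns the body. In the real case the guest then has no public key installed, which would explain the authentication failure seen later.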

The next step was to check other similar failures. I found them with a query like http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22failed%20to%20get%20http%3A%2F%2F169.254.169.254%2F2009-04-04%2Fmeta-data%2Fpublic-keys%5C%22

I checked the logs from http://logs.openstack.org/66/656666/1/check/tempest-multinode-full/4d77549/job-output.txt and saw that there were also about 12 seconds of "nothing" in the nova-metadata-api logs here: http://logs.openstack.org/66/656666/1/check/tempest-multinode-full/4d77549/controller/logs/screen-n-api-meta.txt.gz#_May_01_19_37_47_835271 (this matched the time when the broken instance was trying to get public-keys from the metadata service).
I also found that at almost the same time there was a gap in the nova-api logs: http://logs.openstack.org/66/656666/1/check/tempest-multinode-full/4d77549/controller/logs/screen-n-api.txt.gz#_May_01_19_37_44_331631 followed by some errors with the rabbitmq connection. Maybe those are related.
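
Looking for such gaps by eye is tedious, so a small helper can do it. This is only a sketch under assumptions: find_gaps is a hypothetical function, and it assumes every log line starts with a timestamp like "May 01 19:37:47.835271" (guessed from the anchors in the links above). The lines have no year, so the year argument is an arbitrary filler used only for parsing.

```python
from datetime import datetime


def find_gaps(lines, threshold_seconds=10.0, year=2000):
    """Return (previous, current) timestamp pairs separated by a long silence.

    Assumes each line starts with e.g. "May 01 19:37:47.835271"; lines that
    do not parse (continuation lines, tracebacks) are skipped.
    """
    gaps = []
    previous = None
    for line in lines:
        stamp = line[:22]  # length of "May 01 19:37:47.835271"
        try:
            current = datetime.strptime(
                "%d %s" % (year, stamp), "%Y %b %d %H:%M:%S.%f"
            )
        except ValueError:
            continue
        if previous is not None:
            silence = (current - previous).total_seconds()
            if silence >= threshold_seconds:
                gaps.append((previous, current))
        previous = current
    return gaps
```

Feeding it the lines of an unpacked screen-n-api-meta.txt or screen-n-api.txt should list the silent periods, which can then be compared with the time at which the instance asked for public-keys.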

In the other case I checked, there was no rabbitmq error in the nova logs, but the request for public-keys took more than 17s according to the neutron-metadata logs: http://logs.openstack.org/91/655791/4/check/neutron-tempest-linuxbridge/1ce0a65/logs/screen-q-meta.txt.gz#_May_02_21_35_14_860194

So the conclusion is that these errors surely come from a slowdown in the metadata service, on either the neutron or the nova side. But for now I still don't know what the reason (or reasons) for the slowdown is.