I focused on analyzing the issue with Authentication Failures.
First I tried to reproduce the issue locally, to check whether a slow answer from nova-metadata-api can really cause this problem. So I added a simple time.sleep(15) in https://github.com/openstack/nova/blob/e5b819fb75320f192f79e92384bac8be043e9600/nova/api/metadata/handler.py#L94 and I got exactly the same error from the "ec2metadata" script that ships in the CirrOS image.
So it looks like some kind of slowdown in nova-metadata-api really can cause this problem.
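The effect can be demonstrated without nova at all: if the service answers more slowly than the client's timeout, the client-side error fires. Below is a minimal, self-contained sketch using plain sockets; the delay and timeout values are arbitrary illustrations, and the real guest-side script and nova handler obviously work differently.

```python
import socket
import threading
import time

def serve_once(srv, delay):
    # Accept one connection, then wait `delay` seconds before answering --
    # mimicking a metadata service that is slow to respond.
    conn, _ = srv.accept()
    time.sleep(delay)
    try:
        conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nssh-rsa AAAA...\n")
    except OSError:
        pass  # the client may already have given up and closed
    conn.close()

def fetch_with_timeout(port, timeout):
    # Client with a hard timeout, roughly how a guest-side retrieval
    # script behaves; returns "timeout" when the service is too slow.
    try:
        c = socket.create_connection(("127.0.0.1", port), timeout=timeout)
        c.settimeout(timeout)
        data = c.recv(1024)
        c.close()
        return "ok" if data else "empty"
    except socket.timeout:
        return "timeout"

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
threading.Thread(target=serve_once, args=(srv, 2), daemon=True).start()
print(fetch_with_timeout(port, 0.5))  # the 2 s answer arrives after the 0.5 s timeout
```

With the delay below the timeout the same client returns "ok", which is exactly the asymmetry the time.sleep(15) reproduction exploits.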
Next step was to check other, similar failures. I found them with a logstash query like http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22failed%20to%20get%20http%3A%2F%2F169.254.169.254%2F2009-04-04%2Fmeta-data%2Fpublic-keys%5C%22
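For reference, the URL-encoded message in that query decodes to the exact error string being searched for; a quick check with Python's standard library:

```python
from urllib.parse import unquote

# The percent-encoded logstash query message from the search URL above.
encoded = ("message%3A%5C%22failed%20to%20get%20http%3A%2F%2F169.254.169.254"
           "%2F2009-04-04%2Fmeta-data%2Fpublic-keys%5C%22")
print(unquote(encoded))
# -> message:\"failed to get http://169.254.169.254/2009-04-04/meta-data/public-keys\"
```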
I checked the logs from http://logs.openstack.org/66/656666/1/check/tempest-multinode-full/4d77549/job-output.txt and saw that there were also about 12 seconds of "nothing" in the nova-metadata logs here: http://logs.openstack.org/66/656666/1/check/tempest-multinode-full/4d77549/controller/logs/screen-n-api-meta.txt.gz#_May_01_19_37_47_835271 (it matched the time when the broken instance was trying to get public-keys from metadata).
I also found that at almost the same time there was a gap in the nova-api logs as well: http://logs.openstack.org/66/656666/1/check/tempest-multinode-full/4d77549/controller/logs/screen-n-api.txt.gz#_May_01_19_37_44_331631, and after that some errors with the rabbitmq connection. Maybe those are related.
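This "gap in the logs" check can be mechanized. A small helper along these lines (the timestamp format matches the May_01_19_37_47_835271-style anchors in the log URLs; the first example timestamp and the 10-second threshold are made up for illustration) flags silent stretches between consecutive log lines:

```python
from datetime import datetime

def find_gaps(timestamps, threshold_s):
    # Return (start, end, seconds) for each pair of consecutive log
    # timestamps further apart than threshold_s -- i.e. the stretches
    # of "nothing" in the log.
    fmt = "%b %d %H:%M:%S.%f"
    parsed = [datetime.strptime(t, fmt) for t in timestamps]
    gaps = []
    for prev, cur in zip(parsed, parsed[1:]):
        delta = (cur - prev).total_seconds()
        if delta > threshold_s:
            gaps.append((prev, cur, delta))
    return gaps

# A ~12.7 s hole ending at the failure timestamp from the n-api-meta log
# (the earlier timestamp here is hypothetical):
print(find_gaps(["May 01 19:37:35.100000", "May 01 19:37:47.835271"], 10.0))
```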
In the other case which I checked there wasn't any rabbitmq error in the nova logs, but the request for public-keys took more than 17 seconds in the neutron-metadata logs: http://logs.openstack.org/91/655791/4/check/neutron-tempest-linuxbridge/1ce0a65/logs/screen-q-meta.txt.gz#_May_02_21_35_14_860194
So, the conclusion is that those errors definitely come from a slowdown in the metadata service, on either neutron's or nova's side. But for now I still don't know what the reason (or reasons) for the slowdown is.