Services fail to shut down on the old side of Grenade

Bug #1285323 reported by Matthew Treinish
This bug affects 3 people

Affects  Status   Importance  Assigned to  Milestone
grenade  Invalid  Undecided   Sean Dague

Bug Description

During a grenade run, services sometimes fail to shut down correctly. This shows up in two ways: in the sanity check that catches a still-running service, and in the sanity check that checks for the new services.

The latest Logstash Query for this is:

    ((message:"+ echo \'The following services are still running:") OR (message:"Error: Service" AND message:"is not running")) AND filename:"console.html" AND NOT build_name:"check-grenade-dsvm-partial-ncpu"

check-grenade-dsvm-partial-ncpu is excluded because it's still a work in progress.
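The boolean structure of the query above can be sketched as a local predicate, for example to filter log records that have already been downloaded. This is only a rough illustration: the function name `matches_bug_query` and the dict-shaped record are assumptions, and plain substring matching only approximates how Logstash matches quoted phrases.

```python
EXCLUDED_JOB = "check-grenade-dsvm-partial-ncpu"


def matches_bug_query(record):
    """Approximate the bug's Logstash query on one log record (a dict).

    Hypothetical helper: substring checks stand in for phrase matching.
    """
    message = record.get("message", "")
    # First symptom: the sanity check found a service that is still running.
    still_running = "+ echo 'The following services are still running:" in message
    # Second symptom: the check for the new services reports one as missing.
    not_running = "Error: Service" in message and "is not running" in message
    return (
        (still_running or not_running)
        and record.get("filename") == "console.html"
        and record.get("build_name") != EXCLUDED_JOB
    )
```

A record from the work-in-progress job is rejected even when its message matches, which mirrors the `AND NOT build_name:` clause.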

Example log

http://logs.openstack.org/40/72040/5/gate/gate-grenade-dsvm/eef5df3/console.html#_2014-02-26_03_33_14_937

Matt Riedemann (mriedem) wrote :

There are actually 160 hits in the last 7 days, all failures, across several different services. It is usually nova-compute, and a lot of the time cinder-api isn't involved in the failure:

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiKyBlY2hvIFxcJ1RoZSBmb2xsb3dpbmcgc2VydmljZXMgYXJlIHN0aWxsIHJ1bm5pbmc6XCIgQU5EIGZpbGVuYW1lOlwiY29uc29sZS5odG1sXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzOTM0NjQ4NDExNDB9

Looks like the first time it shows up is on 2/24 but it really spikes on 2/26.
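The part of a logstash.openstack.org link after the `#` appears to be base64-encoded JSON holding the search string and timeframe. A minimal sketch for inspecting such a link, assuming that encoding holds (the helper names are hypothetical, not part of any Logstash or Kibana API):

```python
import base64
import json


def decode_logstash_fragment(fragment):
    """Decode a URL fragment assumed to be base64-encoded JSON."""
    fragment = fragment.lstrip("#")
    # Restore any '=' padding that was dropped from the URL.
    padded = fragment + "=" * (-len(fragment) % 4)
    return json.loads(base64.b64decode(padded).decode("utf-8"))


def encode_logstash_fragment(state):
    """Inverse of decode_logstash_fragment, for building a shareable link."""
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")
```

Passing the text after `#` from a link to `decode_logstash_fragment` should return a dict whose `search` key holds the query, which makes it easier to compare the queries quoted in different comments.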

Changed in grenade:
status: New → Confirmed
Joe Gordon (jogo) wrote :

Once you ignore the new job I am working on, the number of hits drops way down.

    message:"+ echo \'The following services are still running:" AND filename:"console.html" AND NOT build_name:"check-grenade-dsvm-partial-ncpu"

Not sure why that is.

Sean Dague (sdague) wrote :

Updated query:

    ((message:"+ echo \'The following services are still running:") OR (message:"Error: Service" AND message:"is not running")) AND filename:"console.html" AND NOT build_name:"check-grenade-dsvm-partial-ncpu"

summary: - Cinder-api failed to stop during transition from old to new
+ Services fail to shut down on the old side of Grenade
description: updated
Sean Dague (sdague) wrote :
Joe Gordon (jogo) wrote :

It looks like the latest manifestation of this bug is:

http://logs.openstack.org/42/90442/7/check/check-grenade-dsvm-neutron/cad7b84/logs/new/screen-q-vpn.txt.gz

2014-04-29 21:48:09.872 24709 ERROR neutron.agent.l3_agent [-] An interface driver must be specified

OpenStack Infra (hudson-openstack) wrote : Fix proposed to grenade (master)

Fix proposed to branch: master
Review: https://review.openstack.org/109250

Changed in grenade:
assignee: nobody → Sean Dague (sdague)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on grenade (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/109250

Eugene Nikanorov (enikanorov) wrote :

Is this affecting neutron? Are there logs available for neutron-related cases of this issue in the grenade job?

Changed in neutron:
status: New → Incomplete
Salvatore Orlando (salvatore-orlando) wrote :

It seems this bug is back, with a massive spike in the past 36 hours.
I could not find a new bug filed for this, so it might make sense to revive this one.

Logstash query: message:"The following services are still running" AND message:"nova*" AND tags:"grenade.sh.txt"

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiVGhlIGZvbGxvd2luZyBzZXJ2aWNlcyBhcmUgc3RpbGwgcnVubmluZ1wiIEFORCBtZXNzYWdlOlwibm92YSpcIiBBTkQgdGFnczpcImdyZW5hZGUuc2gudHh0XCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0Mjk4NjI3MTMwNDMsIm1vZGUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIn0=

3,364 hits in 7 days (Logstash has a lot of dupes; I expect the actual number of jobs hit by this to be about half that)
3,144 hits in 2 days

Failure rate: 77.6%
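Since raw hit counts include duplicates, one way to estimate the number of affected jobs is to count distinct builds rather than hits. A small sketch, assuming each hit is a dict carrying a per-build identifier field (called `build_uuid` here as an assumption; the function name is hypothetical):

```python
def count_affected_jobs(hits, key="build_uuid"):
    """Count distinct builds among Logstash hits.

    Hits without the identifier field are ignored rather than guessed at.
    """
    return len({hit[key] for hit in hits if key in hit})
```

Several hits from the same build then count once, which gives a job-level number instead of a line-level one.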

Matt Riedemann (mriedem) wrote :
Armando Migliaccio (armando-migliaccio) wrote :

This bug is > 172 days without activity. We are unsetting assignee and milestone and setting status to Incomplete in order to allow its expiry in 60 days.

If the bug is still valid, then update the bug status.

Armando Migliaccio (armando-migliaccio) wrote :

This bug is > 180 days without activity. We are unsetting assignee and milestone and setting status to Incomplete in order to allow its expiry in 60 days.

If the bug is still valid, then update the bug status.

no longer affects: neutron
Sean Dague (sdague)
Changed in grenade:
status: In Progress → Invalid