Comment 2 for bug 929021

Monty Taylor (mordred) wrote : Re: available slave count should be monitored with alerts

Hi!

This is actually in relation to a pool of cloud servers that we maintain as part of the devstack-gate scripts - they aren't actual Jenkins slaves (yet, although there is work in the jclouds plugin to achieve this).

For devstack tests, we create a new cloud server, then run devstack on it, then delete the server when we're done. Creating cloud servers fails frequently though - so rather than tying creation of the server to the running of the job, we have a separate process that creates the servers and keeps a spare set of them. Even that breaks sometimes - sometimes we can't create new nodes as fast as we consume them, so the pool gets too small, and then tests start failing because they can't get a server to run on.

If you pull https://github.com/openstack-ci/devstack-gate, you'll see the code that manages this process, as well as vmdatabase.py, which contains a description of the database where information is stored about the pool of slaves.

If there were a script which could check whether the slave count was at or above a given number, we could run a Jenkins job periodically to run the script and then configure that job to send an alert to the IRC channels. Something like:

./devstack-vm-threshold.py 5

This would exit 0 if there were at least 5 available slaves in the pool. If there were fewer than that, it would print a message about how many slaves there were and exit 1.
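
To make the exit-code behaviour concrete, here is a rough sketch of what the core of such a script could look like. It is only an illustration: the function names are made up, and count_fn is a hypothetical stand-in for whatever query against the pool database (the one described in vmdatabase.py) returns the number of available slaves. I have not tried to guess at that module's actual API.

```python
import sys


def check_threshold(available, threshold):
    """Return (exit_code, message) for a pool with `available` slaves.

    Exit code 0 means the pool has at least `threshold` slaves,
    exit code 1 means it has fewer.
    """
    if available >= threshold:
        return 0, "OK: %d slaves available (threshold %d)" % (available, threshold)
    return 1, "ALERT: only %d slaves available (threshold %d)" % (available, threshold)


def main(argv, count_fn):
    """Parse the threshold from argv, print a message, return the exit code.

    count_fn is a hypothetical callable that returns the number of
    available slaves; the real one would read the pool database.
    """
    if len(argv) != 2:
        sys.stderr.write("usage: %s THRESHOLD\n" % argv[0])
        return 2
    threshold = int(argv[1])
    code, message = check_threshold(count_fn(), threshold)
    print(message)
    return code
```

A real script would end with something like sys.exit(main(sys.argv, count_fn)), so that the Jenkins job is marked failed (and the alert fires) whenever the pool is below the threshold.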

As a bonus, if the script also wrote out a file that contained two lines, like this:

slaves
5

Then we can configure the Jenkins graphing module to make a graph of the slave count over time.
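
Writing that two-line file is simple enough. A minimal sketch, assuming the label goes on the first line and the count on the second exactly as in the example above (the function names and the output path are placeholders, not anything that exists in devstack-gate):

```python
def format_plot_data(label, value):
    """Return the two-line file content: the label, then the count."""
    return "%s\n%d\n" % (label, value)


def write_plot_file(path, label, value):
    """Write the label and count to `path`, replacing any previous content."""
    with open(path, "w") as f:
        f.write(format_plot_data(label, value))
```

For example, write_plot_file("slaves.txt", "slaves", 5) would produce the file shown above. The file is rewritten on every run, so each Jenkins build would contribute one data point to the graph.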