Comment 4 for bug 903199

Dave Haynes (dave-haynes) wrote :

I have been using Meliae to monitor memory usage in nova-network.
The process accumulates objects which are never reclaimed by the garbage collector. They are _DummyThread objects, created (and documented) in the Python threading module.
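
For reference, the dumps can be captured and inspected along these lines (a minimal sketch of the Meliae workflow, not the exact session from my runs; the dump path is just an example):

    # Inside the running nova-network process, e.g. from a debugging hook:
    from meliae import scanner
    scanner.dump_all_objects('/tmp/nova-network.meliae')

    # Offline, load the dump and summarise it; _DummyThread shows up among
    # the per-type totals.
    from meliae import loader
    om = loader.load('/tmp/nova-network.meliae')
    print(om.summarize())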

Each of these objects accounts for about 6 KB of memory. In a simple test against Nova, creating a new instance and allocating an IP every 5 minutes, several hundred _DummyThread objects accumulated during an overnight run. The pragmatic workaround is to restart the process when its resident memory usage becomes significant.

The root cause is that certain operations are attempted from within eventlet greenthreads and end up calling threading.current_thread(). When that function is called from a thread of execution that the threading module did not create, it registers a _DummyThread in the module's internal _active mapping, and nothing ever removes the entry when the greenthread exits. The behaviour is demonstrated (not by myself) here:
https://gist.github.com/1346749/
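
For illustration, a rough sketch along the same lines (my own reconstruction, not Nova code, assuming the eventlet monkey-patching that nova-network applies at startup and the eventlet/Python versions affected at the time):

    import eventlet
    eventlet.monkey_patch()

    import gc
    import threading

    def task():
        # current_thread() called from a thread of execution the threading
        # module did not create registers a _DummyThread in its _active
        # mapping; nothing removes the entry when the greenthread exits.
        threading.current_thread()

    pool = eventlet.GreenPool()
    for _ in range(100):
        pool.spawn_n(task)
    pool.waitall()

    # Count the leaked objects by class name, much as a Meliae summary would.
    dummies = [o for o in gc.get_objects()
               if type(o).__name__ == '_DummyThread']
    print(len(dummies))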

The operations I have identified which do this are:

1. The lockfiles.synchronize decorator
2. logging.LogRecord.__init__
3. threading._after_fork, which I think gets called back from C after subprocess.Popen.

It is possible to monkey-patch the first two, but the third is more difficult. The design of the Python standard libraries is not at fault here.
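
For the second item, one possible mitigation (a sketch, assuming it is acceptable to lose per-thread names in log records) is to disable the thread bookkeeping that LogRecord.__init__ performs, via the standard library's logging.logThreads flag:

    import logging

    # LogRecord.__init__ only calls threading.current_thread() when
    # logging.logThreads is true, so clearing it stops log records from
    # registering _DummyThread objects. The trade-off is that
    # %(threadName)s is no longer populated in log output.
    logging.logThreads = 0

The lockfiles.synchronize decorator would need an equivalent change inside Nova itself, and nothing comparable is available for threading._after_fork.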

My feeling is that some re-engineering of Nova is needed: lighten the load on the WSGI eventlet pool (string processing and low-latency look-ups only there) and hand over more involved operations to another subsystem that deals with lengthy tasks and subprocesses.

This would enable a clearer separation of concerns in the HA environment.