Comment 4 for bug 1038167

Chris Rossi (chris-archimedeanco) wrote:

I later did some more research that doesn't seem to have made it into the comment thread for this ticket. The essential problem is that no existing Blob implementation matches our case well. When you create a ZODB instance with Blob storage, you tell it whether or not you want a "shared" blob storage. "Shared" means that all blobs live in a single master folder shared by all processes; no blobs are stored in the database instance itself. Non-shared means that blobs live in a single database instance, are retrieved over a network connection, and are stored locally in a cache which is neither complete nor definitive. The code for this mode assumes that only one process will ever access the cache.

The problem for us is that neither of these really fits our model. Obviously, with multiple app servers and a single database server, a shared blob storage doesn't work for us. The database server must be responsible for the complete and definitive set of blobs, and our app servers need to retrieve and store them locally as needed. But since the local blob cache is built on the assumption that it will only ever be used by a single process, we hit the error above.

I think the ideal solution would be to "fix" ZODB so that the local blob caches don't require an exclusive lock. This, however, might be somewhat expensive in terms of programming hours, and probably even more expensive in terms of the politics of convincing the core devs that such a fix is necessary or suitable.

Another solution might be to re-engineer the clusters a bit so that there's a common networked disk that could be used as a shared blob storage, since that mode is the one engineered for concurrent access by multiple processes.
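For reference, a shared blob directory is selected in the ZEO client's ZConfig section. A minimal sketch, assuming a ZEO setup; the server address and mount path are placeholders for whatever the cluster actually uses:

```
<zodb>
  <zeoclient>
    server dbserver:8100
    # Every app server mounts the same networked volume at this path;
    # with shared-blob-dir true, ZEO treats it as the complete and
    # definitive blob set instead of a per-process local cache.
    blob-dir /mnt/shared/blobs
    shared-blob-dir true
  </zeoclient>
</zodb>
```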

A hackier solution would be something like what I suggested back in August: have each process generate its own ZODB URI that gives it its own blob cache. Some care needs to be taken to make sure these caches are removed even when a process exits abnormally. It's not the best solution in terms of performance, but it should be feasible.
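A sketch of the per-process-cache idea, assuming POSIX; `CACHE_ROOT`, `make_blob_cache_dir`, and `sweep_stale_caches` are invented names, not ZODB API. Each process keys its cache directory on its PID (so the resulting path can be spliced into that process's ZODB URI), and a sweep at startup reclaims directories whose owning process no longer exists, covering abnormal exits:

```python
import atexit
import errno
import os
import shutil
import tempfile

# Hypothetical root under which each process keeps its own blob cache,
# so no two processes ever share a cache (or its lock file).
CACHE_ROOT = os.path.join(tempfile.gettempdir(), "blobcache")

def make_blob_cache_dir():
    """Create this process's blob cache dir and arrange for its removal."""
    path = os.path.join(CACHE_ROOT, "pid-%d" % os.getpid())
    os.makedirs(path, exist_ok=True)
    # Covers normal exits; abnormal exits are handled by the sweep below.
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path

def sweep_stale_caches():
    """Remove caches left behind by processes that exited abnormally."""
    if not os.path.isdir(CACHE_ROOT):
        return
    for name in os.listdir(CACHE_ROOT):
        if not name.startswith("pid-"):
            continue
        try:
            pid = int(name[len("pid-"):])
        except ValueError:
            continue
        try:
            os.kill(pid, 0)  # signal 0: existence check only, POSIX
        except OSError as e:
            if e.errno == errno.ESRCH:  # no such process: cache is stale
                shutil.rmtree(os.path.join(CACHE_ROOT, name),
                              ignore_errors=True)
```

Calling `sweep_stale_caches()` once at process startup, before `make_blob_cache_dir()`, would keep the cache root from accumulating orphans.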

The last and most expedient solution is simply to stifle the ERROR-level log messages coming out of zc.lockfile. This should be possible with standard library logging configuration. We haven't been given any reason to believe the errors have an impact on end users, so this might be a suitable solution.
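The suppression itself is a one-liner with the standard library, assuming the messages really do come from the `zc.lockfile` logger: raising that logger's threshold drops its ERROR records before they reach any handler, while anything CRITICAL still gets through.

```python
import logging

# Drop ERROR (and lower) records emitted by zc.lockfile; CRITICAL
# records still propagate to whatever handlers are configured.
logging.getLogger("zc.lockfile").setLevel(logging.CRITICAL)
```

The same effect is available declaratively via a `[logger_zc.lockfile]` section in a fileConfig-style logging config, so no application code has to change.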

I suspect, but have not confirmed, that the source of these error messages is actually the thread ZODB uses to prune blob caches to keep them under the maximum size. If that hypothesis is correct, it may be possible to simply turn off the size limit and use an external script to keep the blob cache under control.
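Such an external pruner could be as simple as the sketch below; `prune_blob_cache` is an invented stand-in for ZODB's internal size checker, and an LRU-by-mtime policy is an assumption, not what ZODB itself does:

```python
import os

def prune_blob_cache(cache_dir, max_bytes):
    """Delete least-recently-modified files until the cache fits max_bytes.

    Returns the total size remaining. Meant to run from cron, outside
    the app server processes, so no lock on the cache is needed here.
    """
    entries = []
    total = 0
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished while we were walking
            entries.append((st.st_mtime, st.st_size, path))
            total += st.st_size
    # Oldest first, so recently used blobs survive the prune.
    for _mtime, size, path in sorted(entries):
        if total <= max_bytes:
            break
        try:
            os.remove(path)
            total -= size
        except OSError:
            pass  # another process may have removed it already
    return total
```

One caveat worth checking before relying on this: whether ZODB tolerates cache files disappearing underneath it, since it would simply re-fetch a missing blob from the server on the next access if so.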

It may be a matter of deciding which solution is the least objectionable. Let me know what you think.