Comment 13 for bug 992525

Olivier Dony (Odoo) (odo-openerp) wrote :

We've just written a patch to implement this auto-retry logic in all affected branches: 6.0, 6.1, 7.0 (and trunk).
It should be fairly safe and have few side effects: it replays RPC calls whose transaction was rolled back because of one of these 3 PostgreSQL error codes[1]:

 - SERIALIZATION_FAILURE (40001 - "could not serialize access due to concurrent update")
 - DEADLOCK_DETECTED (40P01)
 - LOCK_NOT_AVAILABLE (55P03 - "could not obtain lock on row in relation ...")

Each of these errors is transient and caused by the presence of another concurrent transaction working on the same database entries. The likelihood of seeing that other transaction committed increases with every passing millisecond, so in most cases it should be sufficient to retry once after a little while.
After testing this patch with several clients hammering the server at the same time, we noticed that 3-4 retries with a randomized delay of several hundred milliseconds seem to allow them all to pass, whereas with a single retry we still get a few failures when more than 2 concurrent transactions are doing the same thing.
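
To make the idea concrete, here is a minimal sketch of such a retry loop. It is not the actual patch: the names call_with_retry, RETRYABLE_PGCODES and MAX_TRIES, the number of attempts and the delay formula are all assumptions chosen for illustration. It only relies on the fact that psycopg2 exceptions expose the SQLSTATE in a `pgcode` attribute, and it assumes the wrapped callable opens its own transaction, which has already been rolled back when the error reaches the wrapper.

```python
import random
import time

# SQLSTATE codes treated as transient concurrency errors.
RETRYABLE_PGCODES = frozenset({
    "40001",  # SERIALIZATION_FAILURE
    "40P01",  # DEADLOCK_DETECTED
    "55P03",  # LOCK_NOT_AVAILABLE
})

# Assumed value for illustration: one initial attempt plus 4 retries.
MAX_TRIES = 5


def call_with_retry(func, *args, max_tries=MAX_TRIES, sleep=time.sleep, **kwargs):
    """Call func and replay it when it fails with a retryable pgcode.

    Assumes func runs in its own transaction and that the failed
    transaction has been rolled back before the exception gets here.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # psycopg2 errors carry the SQLSTATE in `pgcode`;
            # exceptions without that attribute are never retried.
            code = getattr(exc, "pgcode", None)
            if code not in RETRYABLE_PGCODES or attempt == max_tries:
                raise
            # Randomized wait (assumed formula) whose upper bound grows
            # with each attempt, so competing clients spread out.
            sleep(random.uniform(0.0, 0.5) * attempt)
```

The randomization matters more than the exact bounds: if all colliding clients waited the same amount of time, they would simply collide again on the next attempt.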

Concerning the side effects, the failed transactions have just been rolled back, so replaying them is correct on a semantic level. In rare cases the rolled-back transaction might have had a side effect on the rest of the world (e.g. sent an email or written a file), so replaying it might cause the side effect to occur a second time. However, this would be true even with manual replay instead of automatic replay: the user could simply press the same button again to retry. Basically we're just assuming the user did mean the transaction to happen, so we're pressing the button again for her.

We've thought of making the retry delay and/or count configurable, but the defaults should be fine for most cases. And if the default values are not good enough, a proper analysis of the concurrency issue would probably be better than bumping up the settings without understanding them. With the default settings the auto-retry could delay the transaction for up to several dozen seconds, which already seems like a very large limit. Most auto-retried transactions will not be delayed for more than a few hundred milliseconds though.

Any feedback/tests for these sensitive patches would be appreciated. We're planning to merge them soon unless a problem is detected.

Thanks!

[1] see http://www.postgresql.org/docs/current/static/errcodes-appendix.html#ERRCODES-TABLE and http://initd.org/psycopg/docs/errorcodes.html