Need doomed transaction API

Bug #44032 reported by Diogo Matsubara
44
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
Medium
Guilherme Salgado

Bug Description

It appears that rollback statements are occuring in the middle of handling requests, yet the request continues to be processed. This may cause errors at commit time as SQLObject believes changes have been committed to the database that have not been. The culprit appears to be the standard Z3 editform code, which has been cargo culted into general form.

Instead of aborting the transaction mid request, we need to be able to doom the transaction.

As reference:

OOPS-53B84
OOPS-119B108
OOPS-123A293
OOPS-123B35
OOPS-125B55
OOPS-129D177
OOPS-139D917
OOPS-142A582
OOPS-147C2751
OOPS-149C143
OOPS-149C784

And:
https://lists.ubuntu.com/mailman/private/launchpad/2006-May/008784.html
https://lists.ubuntu.com/mailman/private/launchpad/2006-May/008906.html
https://lists.ubuntu.com/mailman/private/launchpad/2006-May/008938.html

Changed in launchpad:
assignee: nobody → stub
Revision history for this message
Stuart Bishop (stub) wrote :

We need to add some more debugging information to the OOPS reports:

- Ensure the connection doesn't change by logging id(connection_wrapper)
- Log database commits and rollbacks

description: updated
Stuart Bishop (stub)
Changed in launchpad:
status: Unconfirmed → In Progress
Stuart Bishop (stub)
Changed in launchpad:
status: In Progress → Fix Committed
description: updated
Stuart Bishop (stub)
Changed in launchpad:
status: Fix Committed → In Progress
description: updated
description: updated
description: updated
Revision history for this message
Diogo Matsubara (matsubara) wrote :

Recent OOPSes at http://shipit.ubuntu.com/myrequest

OOPS-152B1188
OOPS-152B1199

Revision history for this message
Diogo Matsubara (matsubara) wrote :
Revision history for this message
Diogo Matsubara (matsubara) wrote :
description: updated
description: updated
Revision history for this message
Diogo Matsubara (matsubara) wrote : Re: Possible use of more than one connection ID on a single thread may be the cause of many OOPSes

OOPS-170D339 Recent OOPS with the Connection ID.

Revision history for this message
Steve Alexander (stevea) wrote :

I don't think OOPS-170D339 has the problem. Although the connection id in the traceback is different from that in the statement log, I think this is due to the bug where connection id was used for the statement log, but the cursor id used elsewhere.

This bug has now been fixed in production.

The connection ids in the statement log in this oops report are all the same connection.

Revision history for this message
Christian Reis (kiko) wrote :

Well, if that's the case, then I think we have a more complicated problem than I initially predicted, because that log says:

/*46912674137040*/ INSERT INTO POFile (id...) VALUES (169860, ...)

and then

/*46912674137040*/ SELECT NEXTVAL('POMsgSet_id_seq')
/*46912674137040*/ INSERT INTO POMsgSet (id, ... pofile) VALUES (71049052, ... 169860)

successfully, and then at the end

/*46912674137040*/ SELECT NEXTVAL('POMsgSet_id_seq')
/*46912674137040*/ INSERT INTO POMsgSet (id, ... pofile) VALUES (71049059, ... 169860)

Which, if the IDs make sense, means that we inserted 7 POMsgSets before the last one failed. So some other hypotheses that might explain this problem:

  - Somewhere along the line we are doing an ABORT/ROLLBACK which causes all the work before the POMsgSet insertion to be invalidated. The sequence stays sequential because sequences are transaction independent (right?)
  - The message we are getting is lying to us. In this case it is:

    DETAIL: Key (pofile)=(169860) is not present in table "pofile".

I'll look at the log a bit more.

Revision history for this message
Steve Alexander (stevea) wrote :

There is a rollback. Look at statement 193.

Now we just need to add traceback recording to rollbacks in the logging, and we'll be able to see where the rollback comes from.

Revision history for this message
Christian Reis (kiko) wrote :

I knew it! I looked at the log more carefully, and guess what? When inserting the POMsgSet before the one that fails, we do:

/*46912674137040*/ INSERT INTO POMsgSet (id, ... pofile) VALUES (71049058, ... 169860)
/*46912674137040*/ SELECT ... FROM POMsgSet WHERE id = 71049058
/*46912674137040*/ ROLLBACK

I'm trying to figure out what triggered this rollback. The interesting part is that after the rollback a bunch of queries is reissued, but not the query in which we insert the pofile. So it appears we are holding references over this rollback.

Revision history for this message
Christian Reis (kiko) wrote :

<kiko> SteveA, were rollbacks not being logged before, then?
<kiko> hmmm
<SteveA> it was something stu added after he and i discussed the issue on voip
<SteveA> we guessed that the problem was due to either
<SteveA> 1. spurious rollbacks
<SteveA> 2. more than one connection
<SteveA> so, the logging code was specifically designed to find out if it was one of those possibilities, or something else

Revision history for this message
Christian Reis (kiko) wrote :

Both GeneralForm and SQLObjectEditView issue abort()s internally when WidgetsError is issued. It is somewhat surprising to me that the loop which validates widgets is not done before actually trying to modify data, though I can understand why that may be so (widget validation may depend on previous modifications done as part of the transaction).

Revision history for this message
Steve Alexander (stevea) wrote :

I think the rollback is caused by the form processing machinery. GeneralForm, and the various EditForm and FormView variants all do an abort() if there is a WidgetError.

This is dangerous. I think we implicitly start a new transaction after an abort(). What we should be doing is "dooming" the transaction, so that by the time we get to the end of publication, the publication knows it must not commit the transaction, but instead will always abort it, regardless.

Sometime ago, I proposed a doom() method for the transaction manager API in Zope. We can do an equivalent thing either by extending the TM api or hooking into the publisher.

Revision history for this message
Steve Alexander (stevea) wrote :

We should also log the stack trace when we have ROLLBACKS in OOPS reports, so we can see exactly where these are occuring.

Revision history for this message
Stuart Bishop (stub) wrote : Re: [Bug 44032] Re: Possible use of more than one connection ID on a single thread may be the cause of many OOPSes

Steve Alexander wrote:
> I think the rollback is caused by the form processing machinery.
> GeneralForm, and the various EditForm and FormView variants all do an
> abort() if there is a WidgetError.
>
> This is dangerous. I think we implicitly start a new transaction after
> an abort(). What we should be doing is "dooming" the transaction, so
> that by the time we get to the end of publication, the publication knows
> it must not commit the transaction, but instead will always abort it,
> regardless.
>
> Sometime ago, I proposed a doom() method for the transaction manager API
> in Zope. We can do an equivalent thing either by extending the TM api
> or hooking into the publisher.

Is it obvious what the abort() in EditForm is attempting to do? If it is to
ensure that the transaction is never committed (by aborting the transactions
and deliberately not starting a fresh one), then the bug would be in the
psycopgda because it will automatically start a fresh transaction.

If this is what the abort() is supposed to achieve, I think that doom()
would be good as I can imagine other code that hooks into the transactional
system such as the email delivery subsystem will have the same behaviour.
doom() would ensure that commit() is a noop or raises an exception I assume?

--
Stuart Bishop <email address hidden> http://www.canonical.com/
Canonical Ltd. http://www.ubuntu.com/

Revision history for this message
Steve Alexander (stevea) wrote : Re: Possible use of more than one connection ID on a single thread may be the cause of many OOPSes

Committing a doomed transaction would raise a specific DoomedTransaction exception. The publisher would catch this exception and explicitly abort instead.

Revision history for this message
Diogo Matsubara (matsubara) wrote :

Recent OOPS-194C656

Revision history for this message
Christian Reis (kiko) wrote :

Matsubara, that OOPS appears to be related to a different problem (it reports a duplicated key, and there are no UPDATEs in the SQL log AFAICS)

Revision history for this message
Diogo Matsubara (matsubara) wrote : Re: [Bug 44032] Re: Possible use of more than one connection ID on a single thread may be the cause of many OOPSes

On Fri, Jul 14, 2006 at 04:39:17PM -0000, Christian Reis wrote:
> Matsubara, that OOPS appears to be related to a different problem (it
> reports a duplicated key, and there are no UPDATEs in the SQL log
> AFAICS)

Indeed, you are right. I chatted with Carlos and he confirmed that this is
a race condition on that page. I reported bug 53018 and appended all occurances
of that oops there.

>
--
Diogo M. Matsubara

Revision history for this message
Stuart Bishop (stub) wrote : Re: Possible use of more than one connection ID on a single thread may be the cause of many OOPSes

Steve - should you have this for the doomed transactions?

Changed in launchpad:
assignee: stub → stevea
description: updated
Revision history for this message
Francis J. Lacoste (flacoste) wrote :

This was added to Zope3 (see bug 98382). We should try to backport this change. (Upgrading to a recent Zope3 is bug 145746).

Changed in launchpad:
assignee: stevea → nobody
status: In Progress → Triaged
Revision history for this message
Stuart Bishop (stub) wrote :

Running Zope 3.4 with doom().

We still need to use it in GeneralForm, as it is still calling transaction.abort() when it shouldn't be.

Revision history for this message
Stuart Bishop (stub) wrote :

As is SQLObjectEditView.

Revision history for this message
Christian Reis (kiko) wrote :

Isn't the proper fix to get rid of GeneralFormView and SQLObjectEditView? That means 10 views for the former and 8 for the latter.

Revision history for this message
Guilherme Salgado (salgado) wrote : Re: [Bug 44032] Re: Need doomed transaction API

I have a branch, which I started before the epic and didn't have time to
finish, that gets rid of SQLObjectEditView. I'll try to finish it this
week, and GFV is certainly the next one on my radar.

Stuart Bishop (stub)
Changed in launchpad-foundations:
assignee: nobody → salgado
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.