LibraryFileAlias.last_accessed is not being updated

Bug #86171 reported by Stuart Bishop
16
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
Low
Stuart Bishop

Bug Description

We had a traceback (see original description, though that traceback is ancient) which caused us to disable updates of last_access. Do we still need last_access?

(additional historical note: an old updated description also included this:

This causes UploadFailed exception (example: OOPS-414B1001, OOPS-414B1104) and HTTPError (examples: OOPS-414C1127, OOPS-414C1064)
on +filebug with apport attached.
)

Recent OOPS: OOPS-1695K467

Related branches

Revision history for this message
Stuart Bishop (stub) wrote :
Download full text (3.5 KiB)

The trigger this time was a SERIALIZATION exception, which we can expect to happen regularly. I think the best fix for the trigger is to not run the librarian in the SERIALIZABLE transaction isolation, and fixing this is critical. After that, we can look at making the worker threads more robust in handling unexpected exceptions.

2007/02/18 19:14 GMT [-] Unhandled Error
        Traceback (most recent call last):
          File "/usr/lib/python2.4/threading.py", line 422, in run
            self.__target(*self.__args, **self.__kwargs)
          File "/srv/librarian.ubuntu.com/launchpad/lib/twisted/python/threadpool.py", line 148, in _worker
            context.call(ctx, function, *args, **kwargs)
          File "/srv/librarian.ubuntu.com/launchpad/lib/twisted/python/context.py", line 59, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/srv/librarian.ubuntu.com/launchpad/lib/twisted/python/context.py", line 37, in callWithContext
            return func(*args,**kw)
        --- <exception caught here> ---
          File "/srv/librarian.ubuntu.com/launchpad/lib/twisted/internet/threads.py", line 25, in _putResultInDeferred
            result = f(*args, **kwargs)
          File "/srv/librarian.ubuntu.com/launchpad/lib/canonical/librarian/web.py", line 88, in _getFileAlias
            commit()
          File "/srv/librarian.ubuntu.com/launchpad/lib/canonical/database/sqlbase.py", line 611, in commit
            ZopelessTransactionManager._installed.commit()
          File "/srv/librarian.ubuntu.com/launchpad/lib/canonical/database/sqlbase.py", line 339, in commit
            self.manager.get().commit()
          File "/srv/librarian.ubuntu.com/launchpad/lib/transaction/_transaction.py", line 380, in commit
            self._saveCommitishError() # This raises!
          File "/srv/librarian.ubuntu.com/launchpad/lib/transaction/_transaction.py", line 378, in commit
            self._commitResources()
          File "/srv/librarian.ubuntu.com/launchpad/lib/transaction/_transaction.py", line 436, in _commitResources
            rm.tpc_vote(self)
          File "/srv/librarian.ubuntu.com/launchpad/lib/transaction/_transaction.py", line 648, in tpc_vote
            self._datamanager.prepare(transaction)
          File "/srv/librarian.ubuntu.com/launchpad/lib/sqlos/transaction/__init__.py", line 159, in prepare
            obj.sync()
          File "/srv/librarian.ubuntu.com/launchpad/lib/sqlobject/main.py", line 704, in sync
            self.syncUpdate()
          File "/srv/librarian.ubuntu.com/launchpad/lib/sqlobject/main.py", line 723, in syncUpdate
            self._connection._SO_update(self, values)
          File "/srv/librarian.ubuntu.com/launchpad/lib/sqlobject/dbconnection.py", line 649, in _SO_update
            self.query("UPDATE %s SET %s WHERE %s = %s" %
          File "/srv/librarian.ubuntu.com/launchpad/lib/sqlobject/dbconnection.py", line 865, in query
            return self._dbConnection._query(self._connection, s)
          File "/srv/librarian.ubuntu.com/launchpad/lib/sqlobject/dbconnection.py", line 394, in _query
            self._executeRetry(conn, conn.cursor(), ...

Read more...

Changed in launchpad:
importance: Undecided → Critical
status: Unconfirmed → Confirmed
Revision history for this message
Stuart Bishop (stub) wrote :

I think a SERIALIZATION error can only occur when the librarian garbage collector is running, so I have disabled it for the time being. It should be reenabled once the first stage of this fix is cherry picked into production.

Andrew Bennetts (spiv)
Changed in launchpad:
assignee: nobody → spiv
Revision history for this message
Andrew Bennetts (spiv) wrote :

The problem just occured again. I've bounced the librarian again, but the time between failures is alarmingly short (just a couple of hours).

Revision history for this message
Stuart Bishop (stub) wrote :

I've just commented out the call to alias.update_last_accessed on the production librarian and restarted it.

Revision history for this message
Andrew Bennetts (spiv) wrote :

I think this patch will fix the bug in the ZopelessTransactionManager:

=== modified file 'lib/canonical/database/sqlbase.py'
--- lib/canonical/database/sqlbase.py 2007-02-03 19:10:03 +0000
+++ lib/canonical/database/sqlbase.py 2007-02-19 04:18:51 +0000
@@ -336,13 +336,14 @@
         self.set_isolation_level(self.desc.isolation)

     def commit(self):
- self.manager.get().commit()
-
- # We always remove the existing transaction & connection, for
- # simplicity. SQLObject does connection pooling, and we don't have any
- # indication that reconnecting every transaction would be a performance
- # problem anyway.
- self.desc._deactivate()
+ try:
+ self.manager.get().commit()
+ finally:
+ # We always remove the existing transaction & connection, for
+ # simplicity. SQLObject does connection pooling, and we don't have any
+ # indication that reconnecting every transaction would be a performance
+ # problem anyway.
+ self.desc._deactivate()

         if self.implicitBegin:
             self.begin()

description: updated
description: updated
Revision history for this message
Andrew Bennetts (spiv) wrote :

Things to do from here:
  * write a test to demonstrate that the patch fixes what I think it does
  * try it on the production librarian with the librarian-gc and last accessed time updates enabled, and check that it doesn't get stuck in again (even if an individual transaction does fail).

Revision history for this message
Stuart Bishop (stub) wrote : Re: [Bug 86171] Re: Exception in Librarian caused outage

Andrew Bennetts wrote:

> * try it on the production librarian with the librarian-gc and last accessed time updates enabled, and check that it doesn't get stuck in again (even if an individual transaction does fail).

librarian-gc is already back on

--
Stuart Bishop <email address hidden> http://www.canonical.com/
Canonical Ltd. http://www.ubuntu.com/

Revision history for this message
Andrew Bennetts (spiv) wrote : Re: Exception in Librarian caused outage

Ok, this is very weird.

In the process of writing a test for this problem, I get this weird behaviour:

https://devpad.canonical.com/~andrew/paste/file2AeTTn.html

i.e. "del some_dict[some_int]" doesn't. (but repeating it does).

This happens repeatedly at the exact same point in the code every run.

With jml's help we've discovered that using the same instance of the int as reported by the dict's keys() doesn't have this behaviour, but a freshly constructed one from a literal or from thread.get_ident() will cause the buggy behaviour.

So progress on this is stalled until I can figure out why. Obviously if a key dict can't reliably have items added and removed, thing are going to go wonky.

Revision history for this message
Andrew Bennetts (spiv) wrote :

It turns out the weird behaviour is because an sqlobject.dbconnection.Transaction.__del__ is getting triggered by deleting that key from the dictionary, which in turn inserts a new value with that key. So I'm not totally baffled anymore, although making sure the various interacting transaction management layers work nicely together is likely to be a bit of a pain.

Andrew Bennetts (spiv)
Changed in launchpad:
status: Confirmed → In Progress
Revision history for this message
Andrew Bennetts (spiv) wrote :

The workaround (disabling last accessed time updates?) seems to have stopped working around the problem. I've had to restart the librarian twice today due to this problem.

Revision history for this message
Stuart Bishop (stub) wrote : Re: [Bug 86171] Re: Exception in Librarian caused outage

Andrew Bennetts wrote:
> The workaround (disabling last accessed time updates?) seems to have
> stopped working around the problem. I've had to restart the librarian
> twice today due to this problem.
>

That would appear to be because the fix has reverted - it was only ever
applied to the live production system (as per
https://bugs.launchpad.net/launchpad/+bug/86171/comments/4).

--
Stuart Bishop <email address hidden> http://www.canonical.com/
Canonical Ltd. http://www.ubuntu.com/

Revision history for this message
Stuart Bishop (stub) wrote : Re: Exception in Librarian caused outage

I've reapplied the fix to the production system. I'm sent a merge request to pqm, but I have no idea if this fix breaks tests so it may well be rejected.

Revision history for this message
Andrew Bennetts (spiv) wrote :

This is important, but the workaround is working, so I don't think it's critically important. If someone disagrees they are welcome to bump it up again.

I still intend to fix this fairly soon, but it's not my most immediate priority.

Changed in launchpad:
importance: Critical → High
Revision history for this message
Stuart Bishop (stub) wrote :

It is no longer critical now that the fix has landed in launchpad/devel

Changed in launchpad:
assignee: spiv → nobody
status: In Progress → Triaged
Revision history for this message
Christian Reis (kiko) wrote :

Hoping for 1.2.5 for a post-Storm fix to reenable updating of last_accessed (as we're expecting Storm will allow this to work properly without SQLObject misfeatures)

Changed in launchpad:
assignee: nobody → stub
importance: High → Medium
milestone: none → 1.2.5
Revision history for this message
Stuart Bishop (stub) wrote :

Should go away with Storm, at which point we can reenable the timestamp update.

Changed in launchpad:
assignee: stub → nobody
milestone: 1.2.5 → none
Revision history for this message
Robert Collins (lifeless) wrote :

Stuart - we have storm - are we still not updating last access? IIRC we're doing log grovelling now, rather than in-line updates?

visibility: private → public
Changed in launchpad-foundations:
status: Triaged → Incomplete
Revision history for this message
Stuart Bishop (stub) wrote :

We are not updating LibraryFileAlias.last_access at the moment.

I don't know if we need this feature or not - the only use I'm aware was to clean out unused files from the Librarian, but I suspect we can't do that for paranoid reasons.

Changed in launchpad-foundations:
status: Incomplete → Triaged
summary: - Exception in Librarian caused outage
+ LibraryFileAlias.last_accessed is not being updated
Gary Poster (gary)
description: updated
description: updated
Changed in launchpad:
importance: Medium → Critical
Revision history for this message
William Grant (wgrant) wrote :

This is no longer an OOPS, and we have the log-based method to do it instead. I'm not sure if we want to keep this field at all.

tags: removed: oops
Changed in launchpad:
importance: Critical → Low
Curtis Hovey (sinzui)
Changed in launchpad:
status: Triaged → Won't Fix
Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
assignee: nobody → Stuart Bishop (stub)
tags: added: qa-needstesting
Changed in launchpad:
status: Won't Fix → Fix Committed
Stuart Bishop (stub)
tags: added: qa-ok
removed: qa-needstesting
Steve Kowalik (stevenk)
Changed in launchpad:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.