SQLObject executing spurious COUNT(*) using slices

Bug #4818 reported by Stuart Bishop
4
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
Medium
James Henstridge

Bug Description

>>> r = Person.select(orderBy='id')
>>> len(r[:5])

At this point, the following query is executed:

SELECT COUNT(*) FROM Person WHERE 1 = 1

This is causing us timeout errors

>>> list(r[:5])

This causes the following two queries to be executed:

SELECT Person.* FROM Person WHERE 1 = 1 ORDER BY id LIMIT 5
SELECT COUNT(*) FROM Person WHERE 1 = 1

In the case of PersonSet.topPeople(), the first query with the LIMIT is fast. The second spurious query causes timeouts (as it involves doing full table scans of the EmailAddress table).

Revision history for this message
Stuart Bishop (stub) wrote :

Assigning to spiv - I need a guess on how fixable this is. If it is going to be difficult or time consuming, we can work around this particular case and others as we track them down.

Changed in launchpad:
assignee: nobody → spiv
Revision history for this message
Andrew Bennetts (spiv) wrote :

It looks a bit tricky.

The implementation of SelectResults.__len__ in our SQLObject invokes SelectResults.count (upstream doesn't do this). SelectResults.count (in both ours and latest upstream) does the COUNT without limit, then manually subtracts if there's a limit or offset (as is the case when slicing with [:5]).

I'm starting to wonder if we should reconsider the convenience of implementing __len__ (upstream certainly doesn't like it)...

A workaround for us may be to change SelectResults.count to do len([x for x in self]) if self.ops['start'] or self.ops['end'] is set. That will still issue a redundant query, but from what you're saying it will at least be cheaper. It might adversely impact other cases, though, although I'd expect all the objects it constructs would be constructed and cached anyway.

An alternative may be a flag to disable __len__ for a particular SelectResults instance? I don't really like this idea, though.

Perhaps we could make SQLObject cache this information, but doing that 100% safely would be a fair bit of work.

Revision history for this message
Christian Reis (kiko) wrote :

What are the benefits we get from len() working, apart from the syntactical benefit of being able to do len(results) instead of results.count()?

What stops us from changing count() to issue LIMITs when slicing?

I think the answer to the first of those questions will guide us in deciding whether to nuke, work around or cache. :-)

Revision history for this message
Guilherme Salgado (salgado) wrote : Re: [Bug 4818] SQLObject executing spurious COUNT(*) using slices

> What are the benefits we get from len() working, apart from the
> syntactical benefit of being able to do len(results) instead of
> results.count()?
>

The __len__ method allows us to use a SelectResults object in a boolean
context, which I find very handy, specially in page templates.

Revision history for this message
Christian Reis (kiko) wrote :

On Fri, Nov 25, 2005 at 09:03:06PM -0000, Guilherme Salgado wrote:
> The __len__ method allows us to use a SelectResults object in a boolean
> context, which I find very handy, specially in page templates.

Well, it also hides the fact that it issues a database query, which we
have found is dangerous.

Not to say I don't see your use case clearly:

    if results:
        for item in results:
            do_something
    else:
        print "there was nothing there"

I guess the only real way to keep this working the way you expect would
be caching.

The alternative, removing __len__, implies to me forcing the callsite to
convert the object to a list, always, before sending it into the
template. Is that a very big deal?

Revision history for this message
Stuart Bishop (stub) wrote :

Guilherme Salgado wrote:

>>What are the benefits we get from len() working, apart from the
>>syntactical benefit of being able to do len(results) instead of
>>results.count()?
>
> The __len__ method allows us to use a SelectResults object in a boolean
> context, which I find very handy, specially in page templates.

We should define a __nonzero__ method for truth testing. This will still
require a DB query, but it can be optimized using LIMIT 1.

--
Stuart Bishop <email address hidden> http://www.canonical.com/
Canonical Ltd. http://www.ubuntu.com/

Dafydd Harries (daf)
Changed in launchpad:
status: New → Accepted
Revision history for this message
Christian Reis (kiko) wrote :

Fixed when James removed __len__ from SelectResults, I believe.

Changed in launchpad:
assignee: spiv → jamesh
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.