Use ResultSets in subselects

Bug #337494 reported by Jonathan Lange
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Storm
Fix Released
Wishlist
Jamu Kakar

Bug Description

I think it would be a good idea for is_in to accept a ResultSet and use that ResultSet to perform a query with a subselect.

Conceptually, I'd like to say foo.is_in(result_set.values(Bar.id)) and I'd like that to do a subselect. Perhaps something like foo.is_in(result.to_subselect()) would be a better way of saying this.

You can already do this by cheating. You can call _get_select() on the result set, and set 'columns' to the id column.

Related branches

Jamu Kakar (jkakar)
Changed in storm:
assignee: nobody → jkakar
importance: Undecided → Wishlist
status: New → In Progress
Revision history for this message
Jamu Kakar (jkakar) wrote :

I've added a select method to ResultSet (and EmptyResultSet), so you
can do this:

result1 = store.find(Foo, Foo.id < 50)
result2 = store.find(Foo, Foo.id.is_in(result1.select(Foo.id)))

Revision history for this message
Christopher Armstrong (radix) wrote :

I've heard there's some concern about the naming of this method because it might be confused with the other "find" method on ResultSets (which doesn't exist yet but will soon).

How about "as_select"?

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Agreed about the naming.. it also clashes a bit with what result.values(Foo.id) does.

Revision history for this message
Jonathan Lange (jml) wrote :

Call it Capulet if you'd like. :)

I don't fully understand the "clash" comment. What needs to change about the patch?

Revision history for this message
Jamu Kakar (jkakar) wrote :

I don't understand the comment about clashing with .values either.
I explicitly implemented the API to be conceptually similar to
ResultSet.values. For example:

result = store.find(...).values(MyClass.foo)
subselect = store.find(...).select(MyClass.foo)

In both cases you're "specializing" the result set to give you
MyClass.foo values. The difference is that in the first case you're
mutating the ResultSet and in the second case you're getting a
Select expression. Both the mutated ResultSet and the Select
expression will yield the same values, they just operate
differently.

Anyway, that said, I'd be happy to change the name to something
else. But, since there seems to be resistance to the changes here
I'm wondering about creating a Subselect expression that works
something like this:

result = store.find(MyClass)
subselect = Subselect(result, MyClass.foo)

That would keep the ResultSet API the way it is, but still provide a
fairly straight-forward way to create a Subselect from a ResultSet.
We could implement Subselect to work for the simple cases we have
now, and in time extend it to deal with the hairier cases. When we
determine that Subselect does all the things we want it to, we can
revisit the ResultSet API to make the feature more friendly to use.

How does that sound as a way to move forward?

Revision history for this message
Jonathan Lange (jml) wrote :

I don't know the reasons for the resistance, which limits my ability to comment intelligently.

However, that compromise would be acceptable for my purposes.

Revision history for this message
Tim Penhey (thumper) wrote :

I have just come up with another use case that jamesh suggested I add to this bug report.

I'm wanting to do an insert based on a select result.

INSERT INTO SomeTable (<some fields>)
SELECT <some fields>, <some constants>
FROM <some other tables>

Jamu Kakar (jkakar)
Changed in storm:
milestone: none → 0.17
Revision history for this message
James Henstridge (jamesh) wrote :

Since it has been a while, I thought it'd be useful to summarise what has been proposed to implement this feature. There have been three possible approaches proposed so far

1. a ResultSet.select(*columns) method that returns a query expression that returns the given columns for the rows contained in the result set. There are arguments against using "select" as the method name, and no real consensus on an alternative.

2. a ResultSelect(result, *columns) expression class. This was implemented as a subclass of Select(), but that doesn't really fit well with set expression ResultSets. Fixing that would basically leave us with a utility function that returns an expression, which is roughly equivalent to (1).

3. make ResultSet itself an expression object. This removes the need for an intermediate call, but doesn't let you specify the list of columns for the resulting expression.

Given that the column list for many result sets are not particularly useful (usually every column in a table in a random order), and that most use cases involve a subset of columns (e.g. just grabbing the primary key), I don't like the third option personally.

Jamu suggested making the existing ResultSet.values() method return a new result set instead of a plain iterator as a way to specify the column list in combination with (3). This would make the commonly used pattern close to that of (1), but still leaves the door open to improperly using certain ResultSets as expressions.

My personal preference is for (1), but I am still not sure what name would be appropriate.

Revision history for this message
Jamu Kakar (jkakar) wrote :

My initial preference was for ResultSet to be an Expr, but after
discussing the issue with James on IRC, I now have the same opinion,
that option 1 is the simplest/cleanest. I propose we call it
get_select (or maybe get_select_expr) and get on with life. :)

Jamu Kakar (jkakar)
Changed in storm:
status: In Progress → Fix Committed
Jamu Kakar (jkakar)
Changed in storm:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.