Re: SQL Puzzle - too many dimensions

From:

Robert Klemme <shortcutter@googlemail.com>

Newsgroups:

comp.lang.java.programmer

Date:

Sun, 08 May 2011 23:37:11 +0200

Message-ID:

<92ogohFnodU1@mid.individual.net>

On 08.05.2011 16:14, Arved Sandstrom wrote:

On 11-05-08 08:03 AM, Robert Klemme wrote:

On 08.05.2011 06:02, Arved Sandstrom wrote:

On 11-05-07 08:13 PM, Zapanaz wrote:

I know this is a java group, not a SQL group, I am trying to bounce
this off as many heads as I can and most Java programmers I know are
good at SQL.

So I've got a table, song_versions, in an Oracle database. It has
columns like this

SONG_ID | SONG_VERSION_ID | RATING |

SONG_ID is a key into the table songs. A song could be something like
this, 'Money (That's What I Want)' by Berry Gordie

http://www.youtube.com/watch?v=z6xkT7FMyTc

And I have three versions of this song, one by Barrett Strong, one by
The Beatles, and one by The Rolling Stones.

I also have another song, 'Butterfly Caught' by Massive Attack. I only
have the one version of this song.

So my table looks like this

SONG_ID | SONG_VERSION_ID | RATING |
Money (That's What I Want) | Barrett Strong | 3 |
Money (That's What I Want) | Rolling Stones | 2 |
Money (That's What I Want) | Beatles | 1 |
Butterfly Caught | Massive Attack | 1 |

The actual values for the ID columns in the table are integers, this
is just for clarity.

My job is to come up with one SQL query which will return the
highest-rated version of each song.

So for Money, the highest rated version is the Barrett Strong
version. For Butterfly Caught, there is only one version, so I want to
return that, the Massive Attack version. So I would want my result to
be the Barrett Strong version of Money and the Massive Attack version
fo Butterfuly Caught.

I can't think of a way to do it.

To get the MAX(Rating), I would have to GROUP BY song_id. So I can get
the numerical highest rating for each song, I can get a

song_id | rating

result set, but what I need is the song_version_id corresponding to
that rating.

Or I don't know, maybe something like

select song_version_id
   from song_versions
   where rating =
    (select max(rating)
      from song_versions
      where song_id = [??? I don't know ... the same song ID I'm talking
about in the outer query]
    )

If I group by song_id, I lose the song_version_id.

(Some DBs maybe I could cheat and return columns that aren't group by
columns, but oracle won't let me pull that one.

I just tried this in MySql

select song_id, song_version_id, max(rating)
   from song_versions
   group by song_id

and it worked, I got the Barrett Strong song_version_id, but then
there's nothing in my query that actually gaurantees that result, so I
don't know if I would want to depend on that in real code.

But anyway, this is in an Oracle DB, and they don't let you include
columns in a group-by query that aren't explicitly group-by that way.
)

You can try something like

select s.song_key, s.rating
from (
select song_key, max(rating) max_rating
from songs group by song_key
) mx inner join songs s
on s.song_key = mx.song_key and s.rating = mx.max_rating;

Just an additional note: This is only necessary if you want to pull more
columns from s. If query is only interested in s.song_key, s.rating you
only need the inline view's query.

That's correct - thanks for the elaboration. The full SQL I provided
above is as you said, more general purpose.

Actually I (or we) overlooked that OP wanted the song_version_id - that
would be one additional column. So that then would be

select s.song_version_id, s.rating
from (
      select song_id, max(rating) max_rating
      from songs
      group by song_id
) mx
inner join songs s
on s.song_id = mx.song_id
and s.rating = mx.max_rating;

For that example I presupposed a table with columns like

SONG_ID SONG_KEY RATING

where the SONG_ID (for my own sanity) is simply a primary Key "id",
SONG_KEY is like your "SONG_ID", and RATING is "RATING".

On Oracle you could also do something like

select distinct song_key, max(rating) over (partition by song_key)
max_rating from songs;

which makes use of Oracle analytic functions.

This should also work on PostgreSQL.

Good to know. I'm used to the Oracle ones, but I'm certainly no DBA in
general. Now that you mentioned this, I researched things some. It looks
like not only Oracle but also DB2 have had a rich set of analytic
functions for quite a long time (I think DB2 calls them OLAP functions),
SQL Server has been incorporating them since SQL Server 2005 (SQL Server
2008 presumably better; I think SQL Server calls these ranking
functions), and PostgreSQL has them now also (generally speaking ANSI
SQL calls these things window functions; PostgreSQL 8.4 introduced
support and PostgreSQL 9.x is presumably better at it).

Actually SQL Server's SQL engine isn't too bad. With CTE (common table
expression, Oracle has it, too, but calls it "subquery factoring clause"
- and it has slightly different functionality IIRC [1]) you could do
recursive queries (since 2005 IIRC) that in Oracle you needed non
standard CONNECT BY syntax for.

I use PostgreSQL regularly on my MacBook, but I've been stuck on 8.3 for
a few years.

MySQL I know very little about. The few articles I located on MySQL and
analytic functions gave me the impression that the support is poor or
missing, but perhaps I am wrong.

Same here but I wonder what the optimizer can do with arbitrary storage
engines. But this is a completely different topic...

Cheers

robert

[1]
http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/statements_10002.htm#i2161315

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/