Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
Alikkin
Starting Member
6 Posts |
Posted - 2007-05-02 : 15:53:00
|
Given a table:CREATE TABLE #tempScores ( , userId varchar(9) , score_model varchar(15) , score int)Where no columns are unique, and records could be duplicates, how would one (preferably with a single query) get the userId of the record with the highest, unique (by user) score? For example:User 1: Score A: 785 Score B: 779 Score B: 779 Score C: 760User 2: Score B: 783 Score C: 771 Score A: 768 Score A: 768User 3: Score C: 785 Score A: 779 Score B: 763 Score B: 760In this case, I'm looking for user 1. Both User 1 and User 3 have the highest score (this eliminates user 2). So we look at their next highest score, this too is the same (779). So we go to the next highest, this is finally different, and it's user 1 with the higher.My initial attempt at a solution was as such: SET @outerScore = ( SELECT TOP 1 score FROM #tempScores GROUP BY score ORDER BY CASE @scoreType WHEN 'highest' THEN (0 - score) ELSE score END ) -- Walk the tree, comparing scores until one is higher, use it. If none are found, use the @outerScoreSET @decidingScore = ISNULL(( SELECT TOP 1 score FROM #tempScores WHERE userId IN ( SELECT userId FROM #tempScores WHERE score = @outerScore ) GROUP BY score, userId HAVING COUNT(score) = 1 ORDER BY score DESC), @outerScore) -- Select the user that has both the @outerScore, and @decidingScore in it. It's irrelevant if more than one have bothSELECT TOP 1 userIdFROM #tempScores cs1 JOIN #tempScores cs2 ON cs1.userId = cs2.userIdWHERE cs1.score = @decidingScore AND cs2.score = @outerScore But that ignored the possibility of a single user having the same score twice. Moreover, it also opened the door to selecting the wrong user if two users had the deciding score, as would be the case above with users 1 and 3 (779).Also this can have infinite scores, for infinite users with the count of scores between users not necessarily being equal.Any thoughts on how to do this with (preferably) a single query? Any thoughts on how to do it otherwise?Thanks in advance. |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-05-02 : 16:32:13
|
[code]-- prepare sample datadeclare @scores table (userid varchar(9), model varchar(15), score int)insert @scoresselect '1', 'a', 785 union allselect '1', 'b', 779 union allselect '1', 'b', 779 union allselect '1', 'c', 760 union allselect '2', 'b', 783 union allselect '2', 'c', 771 union allselect '2', 'a', 768 union allselect '2', 'a', 768 union allselect '3', 'c', 785 union allselect '3', 'a', 779 union allselect '3', 'b', 763 union allselect '3', 'b', 760-- stage the datadeclare @stage table (userid varchar(9))insert @stageselect useridfrom @scoresgroup by userid-- initialize some controldeclare @score intselect @score = max(score)from @scoreswhile (select count(*) from @stage) > 1 begin delete s from @stage as s where s.userid not in (select d.userid from @scores as d where d.score = @score) select @score = max(s.score) from @scores as s inner join @stage as z on z.userid = s.userid where s.score < @score end-- show the expected outputselect useridfrom @stage[/code]Peter LarssonHelsingborg, Sweden |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-05-02 : 16:57:02
|
The topic should read "Highest consecutive unique score".Peter LarssonHelsingborg, Sweden |
 |
|
Alikkin
Starting Member
6 Posts |
Posted - 2007-05-02 : 17:02:49
|
Unfortunately this doesn't work. It returns User 3, because it's deleting all rows where the score is 779, when it only needs to delete two (only one of User1's matching two, rather than both).Good thought though. You're right about the topic... it's not very accurate. |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-05-02 : 18:08:52
|
Late snack made me think clearer...-- prepare sample datadeclare @scores table (userid varchar(9), model varchar(15), score int)insert @scoresselect '1', 'a', 785 union allselect '1', 'b', 779 union allselect '1', 'b', 779 union allselect '1', 'c', 760 union allselect '2', 'b', 783 union allselect '2', 'c', 771 union allselect '2', 'a', 768 union allselect '2', 'a', 768 union allselect '3', 'c', 785 union allselect '3', 'a', 779 union allselect '3', 'b', 763 union allselect '3', 'b', 760-- stage the datadeclare @stage table (recid int identity, userid varchar(9), score int)insert @stageselect userid, scorefrom @scoresorder by userid, score desc-- initialize some controldeclare @score intselect @score = max(score)from @stagewhile @score is not null begin delete s from @stage as s where s.userid not in (select z.userid from @stage as z where z.score = @score) if (select count(distinct userid) from @stage where score = @score) = 1 break delete s from @stage as s where s.recid in ( select min(recid) as recid from @stage where score = @score group by userid ) set @score = null select @score = max(score) from @stage end-- show the expected outputselect distinct useridfrom @stagewhere score = coalesce(@score, score + 1) Peter LarssonHelsingborg, Sweden |
 |
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2007-05-02 : 20:41:51
|
quote: Where no columns are unique, and records could be duplicates, how would one (preferably with a single query) get the userId of the record with the highest, unique (by user) score?
In that scenario, your data is more or less garbage. You really need to have at least some sort of primary key on your table, otherwise you cannot write efficient or accurate SQL statements. There's definitely no way you can do it in 1 SELECT, you'd need to move it to another place and assign at least a "fake" primary key like Peso did. Are you sure you cannot put proper constraints on your data? How do you know if two rows that say the same thing aren't duplicates or errors? Are you sure that there are no additional columns that will help uniquely identify each row of your data? (remember, a PK can be a combination of multiple columns).I am sure we can do it with a single SELECT, but not unless the data is structured properly.- Jeffhttp://weblogs.sqlteam.com/JeffS |
 |
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2007-05-02 : 20:48:11
|
Using Peso's @stage table variable, which has a nice primary key, we can do it in 1 SELECT something like this:select top 1 min(userID) as UserID, score, rankfrom( select s1.userID, s1.score, (select count(*) from @stage s2 where s2.userID = s1.userID and (s2.score > s1.score or (s1.score = s2.score and s1.recID > s2.recID) ) ) as Rank from @stage s1 ) xwhere x.userID in (select distinct userID from @stage s3 where score = (select max(score) from @stage s4))group by score, rankhaving count(*) = 1order by score desc - Jeffhttp://weblogs.sqlteam.com/JeffS |
 |
|
Alikkin
Starting Member
6 Posts |
Posted - 2007-05-03 : 09:56:14
|
Thanks guys, I'll take a look at your suggestions after my morning meeting.To answer your questions jsmith8858, it's a bit of a legacy system, and being merely a developer I don't have the permissions to change the structure of the system. (Futher, the original designer of the system has a touchy ego on changing her architecture). And while there are other columns in the table, they are in no way reliable for creating a unique record, (not even the DateTime stamp), in fact most are NULL. That being said, the data IS being thrown into a temp table (this is a small part of a much larger query) and I could create a identity field for it pretty easily. I wish this were SQL 2005, as DENSE_RANK seems like it could be very useful in this scenario as well.Thanks again for your help. |
 |
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2007-05-03 : 10:13:48
|
yes, Rank would help a lot if you had SQL 2005. I am using the pre-SQL 2005 method, which is very inefficient, to calculate a Rank, which is pretty long and complicated (especially when you have to deal with ties). If you replace that whole part with a simple RANK() function in SQL 2005, the whole thing becomes quite simple.- Jeffhttp://weblogs.sqlteam.com/JeffS |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-05-03 : 10:35:24
|
There is a way to calculate the rank in SQL 2000...-- prepare sample datadeclare @scores table (userid varchar(9), model varchar(15), score int)insert @scoresselect '1', 'a', 785 union allselect '1', 'b', 779 union allselect '1', 'b', 779 union allselect '1', 'c', 760 union allselect '2', 'b', 783 union allselect '2', 'c', 771 union allselect '2', 'a', 768 union allselect '2', 'a', 768 union allselect '3', 'c', 785 union allselect '3', 'a', 779 union allselect '3', 'b', 763 union allselect '3', 'b', 760-- stage the datadeclare @stage table (recid int identity, userid varchar(9), score int, rank int)insert @stage (userid, score)select userid, scorefrom @scoresorder by userid, score desc-- create rankingupdate sset s.rank = s.recid - x.recid -- ADD +1 IF YOU WANT 1-based rank instead of 0-based rankfrom @stage as sinner join ( select userid, min(recid) as recid from @stage group by userid ) as x on x.userid = s.userid-- Now put Jeff's query here, since rank is pre-calculatedselect top 1 min(s1.userID) as UserID, s1.score, s1.rankfrom @stage as s1where s1.userID in (select distinct userID from @stage s3 where score = (select max(score) from @stage s4))group by s1.score, s1.rankhaving count(*) = 1order by s1.score desc Peter LarssonHelsingborg, Sweden |
 |
|
|
|
|
|
|