Highest Non-Unique (by user) Score

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

Transact-SQL (2000)

Highest Non-Unique (by user) Score

Author

Topic

Alikkin
Starting Member

6 Posts

Posted - 2007-05-02 : 15:53:00

Given a table:

CREATE TABLE #tempScores (
, userId varchar(9)
, score_model varchar(15)
, score int
)

Where no columns are unique, and records could be duplicates, how would one (preferably with a single query) get the userId of the record with the highest, unique (by user) score? For example:

User 1:
Score A: 785
Score B: 779
Score B: 779
Score C: 760
User 2:
Score B: 783
Score C: 771
Score A: 768
Score A: 768
User 3:
Score C: 785
Score A: 779
Score B: 763
Score B: 760

In this case, I'm looking for user 1. Both User 1 and User 3 have the highest score (this eliminates user 2). So we look at their next highest score, this too is the same (779). So we go to the next highest, this is finally different, and it's user 1 with the higher.

My initial attempt at a solution was as such:

SET @outerScore = ( SELECT TOP 1
score
FROM #tempScores
GROUP BY score
ORDER BY CASE @scoreType
WHEN 'highest' THEN (0 - score)
ELSE score END )

-- Walk the tree, comparing scores until one is higher, use it. If none are found, use the @outerScore
SET @decidingScore = ISNULL(( SELECT TOP 1
score
FROM #tempScores
WHERE userId IN ( SELECT userId
FROM #tempScores
WHERE score = @outerScore )
GROUP BY score, userId
HAVING COUNT(score) = 1
ORDER BY score DESC), @outerScore)

-- Select the user that has both the @outerScore, and @decidingScore in it. It's irrelevant if more than one have both
SELECT TOP 1
userId
FROM #tempScores cs1
JOIN #tempScores cs2 ON cs1.userId = cs2.userId
WHERE cs1.score = @decidingScore
AND cs2.score = @outerScore

But that ignored the possibility of a single user having the same score twice. Moreover, it also opened the door to selecting the wrong user if two users had the deciding score, as would be the case above with users 1 and 3 (779).

Also this can have infinite scores, for infinite users with the count of scores between users not necessarily being equal.

Any thoughts on how to do this with (preferably) a single query? Any thoughts on how to do it otherwise?

Thanks in advance.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-05-02 : 16:32:13

[code]-- prepare sample data
declare @scores table (userid varchar(9), model varchar(15), score int)

insert @scores
select '1', 'a', 785 union all
select '1', 'b', 779 union all
select '1', 'b', 779 union all
select '1', 'c', 760 union all
select '2', 'b', 783 union all
select '2', 'c', 771 union all
select '2', 'a', 768 union all
select '2', 'a', 768 union all
select '3', 'c', 785 union all
select '3', 'a', 779 union all
select '3', 'b', 763 union all
select '3', 'b', 760

-- stage the data
declare @stage table (userid varchar(9))

insert @stage
select userid
from @scores
group by userid

-- initialize some control
declare @score int

select @score = max(score)
from @scores

while (select count(*) from @stage) > 1
begin
delete s
from @stage as s
where s.userid not in (select d.userid from @scores as d where d.score = @score)

select @score = max(s.score)
from @scores as s
inner join @stage as z on z.userid = s.userid
where s.score < @score
end

-- show the expected output
select userid
from @stage[/code]

Peter Larsson
Helsingborg, Sweden

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-05-02 : 16:57:02

The topic should read "Highest consecutive unique score".

Peter Larsson
Helsingborg, Sweden

Alikkin
Starting Member

6 Posts

Posted - 2007-05-02 : 17:02:49

Unfortunately this doesn't work. It returns User 3, because it's deleting all rows where the score is 779, when it only needs to delete two (only one of User1's matching two, rather than both).

Good thought though.

You're right about the topic... it's not very accurate.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-05-02 : 18:08:52

Late snack made me think clearer...

-- prepare sample data
declare @scores table (userid varchar(9), model varchar(15), score int)

insert	@scores
select	'1', 'a', 785 union all
select	'1', 'b', 779 union all
select	'1', 'b', 779 union all
select	'1', 'c', 760 union all
select	'2', 'b', 783 union all
select	'2', 'c', 771 union all
select	'2', 'a', 768 union all
select	'2', 'a', 768 union all
select	'3', 'c', 785 union all
select	'3', 'a', 779 union all
select	'3', 'b', 763 union all
select	'3', 'b', 760

-- stage the data
declare @stage table (recid int identity, userid varchar(9), score int)

insert		@stage
select		userid,
		score
from		@scores
order by	userid,
		score desc

-- initialize some control
declare	@score int

select	@score = max(score)
from	@stage

while @score is not null
	begin
		delete	s
		from	@stage as s
		where	s.userid not in (select z.userid from @stage as z where z.score = @score)

		if (select count(distinct userid) from @stage where score = @score) = 1
			break

		delete	s
		from	@stage as s
		where	s.recid in	(	
						select		min(recid) as recid
						from		@stage
						where		score = @score
						group by	userid
					)

		set @score = null

		select	@score = max(score)
		from	@stage
	end

-- show the expected output
select distinct	userid
from		@stage
where		score = coalesce(@score, score + 1)

Peter Larsson
Helsingborg, Sweden

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-05-02 : 20:41:51

quote:

Where no columns are unique, and records could be duplicates, how would one (preferably with a single query) get the userId of the record with the highest, unique (by user) score?

In that scenario, your data is more or less garbage. You really need to have at least some sort of primary key on your table, otherwise you cannot write efficient or accurate SQL statements. There's definitely no way you can do it in 1 SELECT, you'd need to move it to another place and assign at least a "fake" primary key like Peso did. Are you sure you cannot put proper constraints on your data? How do you know if two rows that say the same thing aren't duplicates or errors? Are you sure that there are no additional columns that will help uniquely identify each row of your data? (remember, a PK can be a combination of multiple columns).

I am sure we can do it with a single SELECT, but not unless the data is structured properly.

- Jeff
http://weblogs.sqlteam.com/JeffS

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-05-02 : 20:48:11

Using Peso's @stage table variable, which has a nice primary key, we can do it in 1 SELECT something like this:


select top 1 min(userID) as UserID, score, rank
from
(
  select s1.userID, s1.score, (select count(*) from @stage s2 
                               where s2.userID = s1.userID and 
                                     (s2.score > s1.score or 
                                      (s1.score = s2.score and s1.recID > s2.recID)
                                      )
                               ) as Rank
  from @stage s1
  ) x
where x.userID in (select distinct userID 
                   from @stage s3 where score = (select max(score) from @stage s4))
group by score, rank
having count(*) = 1
order by score desc

- Jeff
http://weblogs.sqlteam.com/JeffS

Alikkin
Starting Member

6 Posts

Posted - 2007-05-03 : 09:56:14

Thanks guys, I'll take a look at your suggestions after my morning meeting.

To answer your questions jsmith8858, it's a bit of a legacy system, and being merely a developer I don't have the permissions to change the structure of the system. (Futher, the original designer of the system has a touchy ego on changing her architecture). And while there are other columns in the table, they are in no way reliable for creating a unique record, (not even the DateTime stamp), in fact most are NULL. That being said, the data IS being thrown into a temp table (this is a small part of a much larger query) and I could create a identity field for it pretty easily. I wish this were SQL 2005, as DENSE_RANK seems like it could be very useful in this scenario as well.

Thanks again for your help.

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-05-03 : 10:13:48

yes, Rank would help a lot if you had SQL 2005. I am using the pre-SQL 2005 method, which is very inefficient, to calculate a Rank, which is pretty long and complicated (especially when you have to deal with ties). If you replace that whole part with a simple RANK() function in SQL 2005, the whole thing becomes quite simple.

- Jeff
http://weblogs.sqlteam.com/JeffS

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-05-03 : 10:35:24

There is a way to calculate the rank in SQL 2000...

-- prepare sample data
declare @scores table (userid varchar(9), model varchar(15), score int)

insert	@scores
select	'1', 'a', 785 union all
select	'1', 'b', 779 union all
select	'1', 'b', 779 union all
select	'1', 'c', 760 union all
select	'2', 'b', 783 union all
select	'2', 'c', 771 union all
select	'2', 'a', 768 union all
select	'2', 'a', 768 union all
select	'3', 'c', 785 union all
select	'3', 'a', 779 union all
select	'3', 'b', 763 union all
select	'3', 'b', 760

-- stage the data
declare @stage table (recid int identity, userid varchar(9), score int, rank int)

insert		@stage (userid, score)
select		userid,
		score
from		@scores
order by	userid,
		score desc

-- create ranking
update		s
set		s.rank = s.recid - x.recid -- ADD +1 IF YOU WANT 1-based rank instead of 0-based rank
from		@stage as s
inner join	(
			select		userid,
					min(recid) as recid
			from		@stage
			group by	userid
		) as x on x.userid = s.userid

-- Now put Jeff's query here, since rank is pre-calculated
select top 1 min(s1.userID) as UserID, s1.score, s1.rank
from @stage as s1
where s1.userID in (select distinct userID 
                   from @stage s3 where score = (select max(score) from @stage s4))
group by s1.score, s1.rank
having count(*) = 1
order by s1.score desc

Peter Larsson
Helsingborg, Sweden

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources