How to fetch a random non-duplicate record??? - SQL Server Forums

Home | Weblogs | Forums

Site Sponsored By: SQLDSC - SQL Server Desired State Configuration

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

Transact-SQL (2000)

How to fetch a random non-duplicate record???

Author

Topic

wgpubs
Yak Posting Veteran

67 Posts

Posted - 2004-02-06 : 12:29:16

Hi folks,

I am building an ASP.NET application that returns a single random record from a table to a user upon each request. What I need to do is ensure that the record returned is not one they have already seen ... and what I'm looking for is the most efficient way to do this.

There are over 1.5 million records in that table.

The easiset approach seems to be to use NOT IN and maintain a list of previous Record IDs to use it with ... but it doesn't seem very efficient to me especially on account of the fact a user may select up to 1,000 reconds in a given session.

Anyways, I'm open to suggestions ... thanks - wgpubs

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2004-02-06 : 12:31:16

To get a random row:

SELECT TOP 1 *
FROM Table1
ORDER BY NEWID()

Does it have to be one that they haven't seen before though?

Tara

wgpubs
Yak Posting Veteran

67 Posts

Posted - 2004-02-06 : 13:21:53

Right ... it can't be one they've seen before.

(I have it encapsulated in a stored proc currently using a similar method for generating the random record using the TSQL you specified.)

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2004-02-06 : 13:23:17

So then you'd need a table to store the previous IDs as you mentioned. Previous IDs for each of the users. Then you run the random method that I posted plus WHERE NOT IN (SELECT ...).

Tara

wgpubs
Yak Posting Veteran

67 Posts

Posted - 2004-02-06 : 13:31:30

Is this the most efficient means of removing duplication records?

As I said before, the table has roughly 1.5 million records and in a given session a user could request upwards to 1,000 records one at a time ... and given this I thought that the stored proc wouldn't be able to be optimized if I used something like NOT IN (' + @ids + ')' where @ids would equal something like '1,2,3,4,56,789,345676,3234,86,997678,3333,835,84135,...'

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2004-02-06 : 13:34:28

BTW, you can't use NOT IN like the way that you mentioned:

http://www.sqlteam.com/item.asp?ItemID=11499

So you could use that method from the article or you could select from a table where the IDs are stored:

WHERE NOT IN (SELECT ID FROM Table1 WHERE UserID = @UserID)

Tara

wgpubs
Yak Posting Veteran

67 Posts

Posted - 2004-02-06 : 13:44:42

Ok I see what you are saying ... now in terms of the two proposed ways of ensuring non-duplicate random records ... which in your opinion is most efficient and will offer the best performance (that is my biggest concern at the moment given the number of records I'm dealing with)?

thanks

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2004-02-06 : 13:48:41

I don't think that you can just say that one way is better than the other. Both would have to be tested. Maybe someone else here knows which way is better. I believe each would have its benefits. It just would depend on the amount of IDs there were. So one way would be faster for a small set of IDs, the other would be faster for a large set of IDs.

Tara

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2004-02-06 : 13:57:16

It depends on how NEWID() is generated. Most random # generators are pretty bad; they start from a seed value and then generate completely predictable and repeatable sequences.

If that's how NEWID() works (I don't know) then you really don't have to worry about duplicates as they would almost never happen.

Jonathan
Gaming will never be the same

wgpubs
Yak Posting Veteran

67 Posts

Posted - 2004-02-06 : 14:05:31

Don't know the internals of NEWID() ... but it does return duplicated records periodically

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2004-02-06 : 14:05:43

NEWID is based on the NIC of the SQL Server from what I understand. But the problem here is that it has to be random based upon what the user has already seen in a previous visit. You don't want to pull up any of the rows that were previously seen.

Tara

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-02-06 : 16:29:15

Erk!

SELECT TOP 1 *
FROM Table1
ORDER BY NEWID()

is fine for small tables, but here you're asking it to generate a GUID for every row, sort (or at least Sort/TopN) 1.5 million rows and spit out the first row. When I tried it on a 2 million row table (quite narrow rows), it took about 5.5 seconds.

Hmm... thinking about it, that means there's probably no special optimization within Sort/TopN: you'd expect it to keep a heap (in the CS sense) of min/max values for small TOP N values.

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2004-02-06 : 16:32:36

Arnold, what would you suggest?

Tara

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-02-06 : 16:36:12

... don't know really: the problem is that there's no row numbers.
Normally I'm in favour of there not being row numbers, but here it makes things difficult.
Unless, of course, the design can accomodate having something that effectively is a row numbers: a unique, gap-freee integer column with an index on it.

Or was the problem really making sure the values chosen haven't been seen before?
Either you have to keep a list of rows chosen (or not chosen) or you have to preassign an order in which they will be chosen, either by tagging the values or by going through them in some pseudorandom order based on the id's...

robvolk
Most Valuable Yak

15732 Posts

Posted - 2004-02-07 : 11:03:36

Another idea to try is to add a column as a flag and update it if the row is selected. You can then add a WHERE clause to ignore those rows:

ALTER TABLE myTable ADD Flagged bit NOT NULL DEFAULT(0)

CREATE PROCEDURE GetRow AS
SET NOCOUNT ON
SELECT TOP 1 * INTO #results FROM myTable WHERE Flagged=0 ORDER BY NewID()
UPDATE myTable SET Flagged=1
FROM myTable INNER JOIN #results ON myTable.PKColumn=#results.PKColumn
SELECT * FROM #results
DROP TABLE #results

That way you don't need an extra table to store already selected row ID's. It might be better to not use a bit column though, they can't be indexed in 7.0. A tinyint might be better, and it might be worth thinking about putting this column in the clustered index for the table.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources