| Author |
Topic |
|
wgpubs
Yak Posting Veteran
67 Posts |
Posted - 2004-02-06 : 12:29:16
|
| Hi folks,I am building an ASP.NET application that returns a single random record from a table to a user upon each request. What I need to do is ensure that the record returned is not one they have already seen ... and what I'm looking for is the most efficient way to do this.There are over 1.5 million records in that table.The easiset approach seems to be to use NOT IN and maintain a list of previous Record IDs to use it with ... but it doesn't seem very efficient to me especially on account of the fact a user may select up to 1,000 reconds in a given session. Anyways, I'm open to suggestions ... thanks - wgpubs |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2004-02-06 : 12:31:16
|
| To get a random row:SELECT TOP 1 *FROM Table1ORDER BY NEWID()Does it have to be one that they haven't seen before though?Tara |
 |
|
|
wgpubs
Yak Posting Veteran
67 Posts |
Posted - 2004-02-06 : 13:21:53
|
| Right ... it can't be one they've seen before.(I have it encapsulated in a stored proc currently using a similar method for generating the random record using the TSQL you specified.) |
 |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2004-02-06 : 13:23:17
|
| So then you'd need a table to store the previous IDs as you mentioned. Previous IDs for each of the users. Then you run the random method that I posted plus WHERE NOT IN (SELECT ...).Tara |
 |
|
|
wgpubs
Yak Posting Veteran
67 Posts |
Posted - 2004-02-06 : 13:31:30
|
| Is this the most efficient means of removing duplication records?As I said before, the table has roughly 1.5 million records and in a given session a user could request upwards to 1,000 records one at a time ... and given this I thought that the stored proc wouldn't be able to be optimized if I used something like NOT IN (' + @ids + ')' where @ids would equal something like '1,2,3,4,56,789,345676,3234,86,997678,3333,835,84135,...' |
 |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2004-02-06 : 13:34:28
|
| BTW, you can't use NOT IN like the way that you mentioned:http://www.sqlteam.com/item.asp?ItemID=11499So you could use that method from the article or you could select from a table where the IDs are stored:WHERE NOT IN (SELECT ID FROM Table1 WHERE UserID = @UserID)Tara |
 |
|
|
wgpubs
Yak Posting Veteran
67 Posts |
Posted - 2004-02-06 : 13:44:42
|
| Ok I see what you are saying ... now in terms of the two proposed ways of ensuring non-duplicate random records ... which in your opinion is most efficient and will offer the best performance (that is my biggest concern at the moment given the number of records I'm dealing with)?thanks |
 |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2004-02-06 : 13:48:41
|
| I don't think that you can just say that one way is better than the other. Both would have to be tested. Maybe someone else here knows which way is better. I believe each would have its benefits. It just would depend on the amount of IDs there were. So one way would be faster for a small set of IDs, the other would be faster for a large set of IDs.Tara |
 |
|
|
setbasedisthetruepath
Used SQL Salesman
992 Posts |
Posted - 2004-02-06 : 13:57:16
|
| It depends on how NEWID() is generated. Most random # generators are pretty bad; they start from a seed value and then generate completely predictable and repeatable sequences.If that's how NEWID() works (I don't know) then you really don't have to worry about duplicates as they would almost never happen.JonathanGaming will never be the same |
 |
|
|
wgpubs
Yak Posting Veteran
67 Posts |
Posted - 2004-02-06 : 14:05:31
|
| Don't know the internals of NEWID() ... but it does return duplicated records periodically |
 |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2004-02-06 : 14:05:43
|
| NEWID is based on the NIC of the SQL Server from what I understand. But the problem here is that it has to be random based upon what the user has already seen in a previous visit. You don't want to pull up any of the rows that were previously seen.Tara |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2004-02-06 : 16:29:15
|
| Erk!SELECT TOP 1 *FROM Table1ORDER BY NEWID()is fine for small tables, but here you're asking it to generate a GUID for every row, sort (or at least Sort/TopN) 1.5 million rows and spit out the first row. When I tried it on a 2 million row table (quite narrow rows), it took about 5.5 seconds.Hmm... thinking about it, that means there's probably no special optimization within Sort/TopN: you'd expect it to keep a heap (in the CS sense) of min/max values for small TOP N values. |
 |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2004-02-06 : 16:32:36
|
| Arnold, what would you suggest?Tara |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2004-02-06 : 16:36:12
|
| ... don't know really: the problem is that there's no row numbers.Normally I'm in favour of there not being row numbers, but here it makes things difficult.Unless, of course, the design can accomodate having something that effectively is a row numbers: a unique, gap-freee integer column with an index on it.Or was the problem really making sure the values chosen haven't been seen before?Either you have to keep a list of rows chosen (or not chosen) or you have to preassign an order in which they will be chosen, either by tagging the values or by going through them in some pseudorandom order based on the id's... |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2004-02-07 : 11:03:36
|
| Another idea to try is to add a column as a flag and update it if the row is selected. You can then add a WHERE clause to ignore those rows:ALTER TABLE myTable ADD Flagged bit NOT NULL DEFAULT(0)CREATE PROCEDURE GetRow ASSET NOCOUNT ONSELECT TOP 1 * INTO #results FROM myTable WHERE Flagged=0 ORDER BY NewID()UPDATE myTable SET Flagged=1FROM myTable INNER JOIN #results ON myTable.PKColumn=#results.PKColumnSELECT * FROM #resultsDROP TABLE #resultsThat way you don't need an extra table to store already selected row ID's. It might be better to not use a bit column though, they can't be indexed in 7.0. A tinyint might be better, and it might be worth thinking about putting this column in the clustered index for the table. |
 |
|
|
|