Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 select tru random rows

Author  Topic 

jassie
Constraint Violating Yak Guru

332 Posts

Posted - 2011-05-27 : 12:25:22
I would like to know how to select random records from data in both sql server 2005 standard and sql server 2008 r2 using t-sql.

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-05-27 : 12:40:14
You can use the tablesample function described here: http://msdn.microsoft.com/en-us/library/ms189108(v=SQL.90).aspx

However, they do indicate that it is not "truly random". If you really want truly random, you can do

select top (n)
*
from
YourTable
order by
newid();
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-05-27 : 19:54:08
I may have given you wrong information here when I said that you can get truly random rows simply by ordering by newid(). The reason I suspect I was wrong is this snippet from the MSDN page that I referred to in my previous post:

If you really want a random sample of individual rows, modify your query to filter out rows randomly, instead of using TABLESAMPLE. For example, the following query uses the NEWID function to return approximately one percent of the rows of the Sales.SalesOrderDetail table:
SELECT * FROM Sales.SalesOrderDetail
WHERE 0.01 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)
/ CAST (0x7fffffff AS int)
The SalesOrderID column is included in the CHECKSUM expression so that NEWID() evaluates once per row to achieve sampling on a per-row basis. The expression CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float / CAST (0x7fffffff AS int) evaluates to a random float value between 0 and 1.


I am not sure whether newid will be evaluated once per row if you use it only ordering, and even if it is, whether SQL server will be too smart and know that you are trying to trick it and just get you the first N rows that it can get at. It must be one of those if the MSDN recommendation is to include a column from the table along with newid function to ensure randomness.

Of course, it all depends on how random you want your data to be. I was concerned because you said you want "tru random rows" in the title of your posting.
Go to Top of Page

slimt_slimt
Aged Yak Warrior

746 Posts

Posted - 2011-05-28 : 03:38:58
Sunitabeck,

i usually use this:

select top 1200 * from Sales.SalesOrderDetail
order by
RAND(CHECKSUM(NEWID())%1000000000 + SalesOrderID)


which is the closest (so to say) to random naive bayes.
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-05-28 : 07:15:15
Hi T, That sounds like it should work, since that should force a per-row evaluation of the newid function. What is not clear to me is why the MSDN page suggested the incomprehensible (to me) approach.
Go to Top of Page
   

- Advertisement -