Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
jassie
Constraint Violating Yak Guru
332 Posts |
Posted - 2011-05-27 : 12:25:22
|
| I would like to know how to select random records from data in both sql server 2005 standard and sql server 2008 r2 using t-sql. |
|
|
sunitabeck
Master Smack Fu Yak Hacker
5155 Posts |
Posted - 2011-05-27 : 12:40:14
|
You can use the tablesample function described here: http://msdn.microsoft.com/en-us/library/ms189108(v=SQL.90).aspxHowever, they do indicate that it is not "truly random". If you really want truly random, you can doselect top (n) *from YourTableorder by newid(); |
 |
|
|
sunitabeck
Master Smack Fu Yak Hacker
5155 Posts |
Posted - 2011-05-27 : 19:54:08
|
| I may have given you wrong information here when I said that you can get truly random rows simply by ordering by newid(). The reason I suspect I was wrong is this snippet from the MSDN page that I referred to in my previous post:If you really want a random sample of individual rows, modify your query to filter out rows randomly, instead of using TABLESAMPLE. For example, the following query uses the NEWID function to return approximately one percent of the rows of the Sales.SalesOrderDetail table:SELECT * FROM Sales.SalesOrderDetailWHERE 0.01 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)/ CAST (0x7fffffff AS int)The SalesOrderID column is included in the CHECKSUM expression so that NEWID() evaluates once per row to achieve sampling on a per-row basis. The expression CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float / CAST (0x7fffffff AS int) evaluates to a random float value between 0 and 1.I am not sure whether newid will be evaluated once per row if you use it only ordering, and even if it is, whether SQL server will be too smart and know that you are trying to trick it and just get you the first N rows that it can get at. It must be one of those if the MSDN recommendation is to include a column from the table along with newid function to ensure randomness.Of course, it all depends on how random you want your data to be. I was concerned because you said you want "tru random rows" in the title of your posting. |
 |
|
|
slimt_slimt
Aged Yak Warrior
746 Posts |
Posted - 2011-05-28 : 03:38:58
|
Sunitabeck,i usually use this:select top 1200 * from Sales.SalesOrderDetailorder byRAND(CHECKSUM(NEWID())%1000000000 + SalesOrderID) which is the closest (so to say) to random naive bayes. |
 |
|
|
sunitabeck
Master Smack Fu Yak Hacker
5155 Posts |
Posted - 2011-05-28 : 07:15:15
|
| Hi T, That sounds like it should work, since that should force a per-row evaluation of the newid function. What is not clear to me is why the MSDN page suggested the incomprehensible (to me) approach. |
 |
|
|
|
|
|
|
|