Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 get random samples

Author  Topic 

xpandre
Posting Yak Master

212 Posts

Posted - 2011-05-11 : 12:56:37
Hi,

What would be the most efficient way to get a random sample set of data from a huge set of data from a table.

For eg :
I need to get the no of call made by a subscriber between date 01/01/2011 and 21/04/2011. In my DB, I would have more than a million records. But I want just 2000 of that. But I need the dates of the call to be truly scattered in the time interval given.
like I need to have at least 1 record from Jan, 1 record from Feb, 1 record from March and 1 record from April.

I tried using NEWID() --> But this is very costly.
I tried using BINARY_CHECKSUM --> But this for some reason doesn't get me the sample as i want..like it gets me all 2000 from Jan and nothing from other months.

Thank you guys in advance
Sam

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-05-11 : 13:08:58
Is the cost for the newid() method in generation of the newid, or in the sorting based on newid? If it is in the sorting, you might try crypt_gen_random function instead of newid. It generates hexadecimal number (of any length you specify), so sorting may be faster.
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2011-05-11 : 16:04:23
http://msdn.microsoft.com/en-us/library/cc441928.aspx
http://msdn.microsoft.com/en-us/library/ms189108.aspx
Go to Top of Page

xpandre
Posting Yak Master

212 Posts

Posted - 2011-05-12 : 03:47:23
Thank you :-)..i shall try these and let you know my results..
Go to Top of Page
   

- Advertisement -