SQL Server Forums
 Query: randomly select 20% rows in distinct groups

siftekhar
Starting Member

USA
3 Posts

Posted - 07/19/2013 :  17:20:26
Hello,

I am trying to randomly select 20% of the rows in each category (each distinct value of a column).

Here is what I have:

A table with about 10 million rows. Column A (not the primary key) holds about 50 distinct values, each repeated many times. For example, value 1 appears 200 thousand times, value 2 appears 5 thousand times, value 3 appears 20 thousand times, etc.

I want to select a 20% sample from each group (Column A). For example, from the 200,000 rows with value 1 in Column A, I want 40,000 randomly selected rows; for value 2, 1,000 random rows out of 5,000; for value 3, 4,000 rows out of 20,000; and so on.

The output will be in a single table.

What would be the query syntax?

Thanks,

Siftekhar

Edited by - siftekhar on 07/19/2013 17:21:17

visakh16
Very Important crosS Applying yaK Herder

India
52309 Posts

Posted - 07/20/2013 :  08:23:49
You can use logic like the following:

SELECT  *
FROM
(
SELECT NTILE(5) OVER (PARTITION BY ColA ORDER BY PrimaryKey) as rn,*
from Table
)t
WHERE rn=1


This will give you 20% of the rows for each category value in ColA.
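
One way to sanity-check the 20% figure is to compare, per group, the total row count with the number of rows that land in tile 1. This is only a sketch: dbo.MyTable is a hypothetical table name, and ColA and PrimaryKey are the placeholder column names used in the query above.

```sql
-- Hypothetical names: dbo.MyTable, ColA, PrimaryKey (placeholders, not from the original post).
WITH tiled AS
(
    SELECT ColA,
           NTILE(5) OVER (PARTITION BY ColA ORDER BY PrimaryKey) AS rn
    FROM dbo.MyTable
)
SELECT ColA,
       COUNT(*)                                 AS total_rows,    -- all rows in the group
       SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END)  AS sampled_rows   -- rows that fall in tile 1
FROM tiled
GROUP BY ColA
ORDER BY ColA;
```

When a group's row count is not divisible by 5, NTILE puts the extra rows in the lower-numbered tiles, so tile 1 can hold one row more than an exact fifth.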

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs

SwePeso
Patron Saint of Lost Yaks

Sweden
30113 Posts

Posted - 07/20/2013 :  12:14:25
ORDER BY NEWID() will give you random rows.
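
A minimal sketch of how that could be applied per group, assuming a hypothetical table dbo.MyTable with a grouping column ColA (both names are placeholders): ordering each partition by NEWID() instead of by a key column makes the tile assignment random, so tile 1 becomes a random fifth of each group.

```sql
-- Hypothetical names: dbo.MyTable, ColA.
SELECT *
FROM
(
    -- NEWID() is evaluated per row, so each group is shuffled before being cut into 5 tiles.
    SELECT NTILE(5) OVER (PARTITION BY ColA ORDER BY NEWID()) AS rn, *
    FROM dbo.MyTable
) t
WHERE rn = 1;   -- keep one tile, roughly 20% of every ColA group
```

Each run returns a different sample. Sorting on NEWID() forces a sort of every partition, which may be slow on a table of around 10 million rows.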



N 56°04'39.26"
E 12°55'05.63"

siftekhar
Starting Member

USA
3 Posts

Posted - 07/22/2013 :  14:15:27
quote:
Originally posted by visakh16

You can use a logic like below

SELECT  *
FROM
(
SELECT NTILE(5) OVER (PARTITION BY ColA ORDER BY PrimaryKey) as rn,*
from Table
)t
WHERE rn=1


this will give you 20 % of rows for each values of category in ColA





This gives me an error
==
Msg 156, Level 15, State 1, Line 5
Incorrect syntax near the keyword 'Table'.
==

James K
Flowing Fount of Yak Knowledge

3559 Posts

Posted - 07/22/2013 :  14:38:13
quote:
Originally posted by siftekhar

This gives me an error
==
Msg 156, Level 15, State 1, Line 5
Incorrect syntax near the keyword 'Table'.
==


He was just showing you an example because you didn't say what your table name was. Replace the word "Table" with the name of your table.
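
For illustration, here is the same query with the placeholder swapped for a hypothetical table name, dbo.MyTable (ColA and PrimaryKey are likewise placeholders for the real column names). TABLE is a reserved keyword in T-SQL, which is why the unquoted word triggers Msg 156.

```sql
-- dbo.MyTable, ColA and PrimaryKey are hypothetical; substitute your own names.
SELECT *
FROM
(
    SELECT NTILE(5) OVER (PARTITION BY ColA ORDER BY PrimaryKey) AS rn, *
    FROM dbo.MyTable      -- a real table name instead of the keyword "Table"
) t
WHERE rn = 1;

-- If a table really were named Table, it would need delimiters: FROM [Table]
```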

siftekhar
Starting Member

USA
3 Posts

Posted - 07/22/2013 :  15:10:34
quote:
Originally posted by James K

He was just showing you an example because you didn't say what your table name was. Replace the word "Table" with the name of your table.




I get it now - thanks.

visakh16
Very Important crosS Applying yaK Herder

India
52309 Posts

Posted - 07/23/2013 :  01:00:29
You're welcome.
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC