Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Query: randomly select 20% rows in distinct groups

Author  Topic 

siftekhar
Starting Member

3 Posts

Posted - 2013-07-19 : 17:20:26
Hello,

I am trying to randomly select 20% rows in each category (distinct values).

Here is what I have,

A table with about 10million rows. On Column A (not primary key), there are about 50 distinct values, repeated. For example, value 1 appears 200 thousand times, value 2 appears 5 thousand times, value 3 appears 20 thousand times etc.

I want to select a 20% sample from each group (Column A). For example, for 200,000 value 1 rows in Column A, I will have 40 thousand rows randomly selected, for value 2, out of 5000 rows, I will have 1000 random rows, for value 3, I will have 4000 rows etc.

The output will be in a single table.

What would be the query syntax?

Thanks,

Siftekhar

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2013-07-20 : 08:23:49
You can use a logic like below

SELECT *
FROM
(
SELECT NTILE(5) OVER (PARTITION BY ColA ORDER BY PrimaryKey) as rn,*
from Table
)t
WHERE rn=1


this will give you 20 % of rows for each values of category in ColA

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2013-07-20 : 12:14:25
ORDER BY NEWID() will give you random rows.



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

siftekhar
Starting Member

3 Posts

Posted - 2013-07-22 : 14:15:27
quote:
Originally posted by visakh16

You can use a logic like below

SELECT *
FROM
(
SELECT NTILE(5) OVER (PARTITION BY ColA ORDER BY PrimaryKey) as rn,*
from Table
)t
WHERE rn=1


this will give you 20 % of rows for each values of category in ColA





This gives me an error
==
Msg 156, Level 15, State 1, Line 5
Incorrect syntax near the keyword 'Table'.
==
Go to Top of Page

James K
Master Smack Fu Yak Hacker

3873 Posts

Posted - 2013-07-22 : 14:38:13
quote:
Originally posted by siftekhar

quote:
Originally posted by visakh16

You can use a logic like below

SELECT *
FROM
(
SELECT NTILE(5) OVER (PARTITION BY ColA ORDER BY PrimaryKey) as rn,*
from Table
)t
WHERE rn=1


this will give you 20 % of rows for each values of category in ColA





This gives me an error
==
Msg 156, Level 15, State 1, Line 5
Incorrect syntax near the keyword 'Table'.
==


He was just showing you an example because you didn't say what your table name was. Replace the word "Table" with the name of your table.
Go to Top of Page

siftekhar
Starting Member

3 Posts

Posted - 2013-07-22 : 15:10:34
quote:
Originally posted by James K

He was just showing you an example because you didn't say what your table name was. Replace the word "Table" with the name of your table.




I get it now - thanks.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2013-07-23 : 01:00:29
welcome

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page
   

- Advertisement -