Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
siftekhar
Starting Member
3 Posts |
Posted - 2013-07-19 : 17:20:26
|
Hello, I am trying to randomly select 20% rows in each category (distinct values).Here is what I have,A table with about 10million rows. On Column A (not primary key), there are about 50 distinct values, repeated. For example, value 1 appears 200 thousand times, value 2 appears 5 thousand times, value 3 appears 20 thousand times etc. I want to select a 20% sample from each group (Column A). For example, for 200,000 value 1 rows in Column A, I will have 40 thousand rows randomly selected, for value 2, out of 5000 rows, I will have 1000 random rows, for value 3, I will have 4000 rows etc. The output will be in a single table. What would be the query syntax?Thanks,Siftekhar |
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2013-07-20 : 08:23:49
|
You can use a logic like belowSELECT *FROM(SELECT NTILE(5) OVER (PARTITION BY ColA ORDER BY PrimaryKey) as rn,*from Table)tWHERE rn=1 this will give you 20 % of rows for each values of category in ColA------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/https://www.facebook.com/VmBlogs |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2013-07-20 : 12:14:25
|
ORDER BY NEWID() will give you random rows. N 56°04'39.26"E 12°55'05.63" |
|
|
siftekhar
Starting Member
3 Posts |
Posted - 2013-07-22 : 14:15:27
|
quote: Originally posted by visakh16 You can use a logic like belowSELECT *FROM(SELECT NTILE(5) OVER (PARTITION BY ColA ORDER BY PrimaryKey) as rn,*from Table)tWHERE rn=1 this will give you 20 % of rows for each values of category in ColA
This gives me an error==Msg 156, Level 15, State 1, Line 5Incorrect syntax near the keyword 'Table'.== |
|
|
James K
Master Smack Fu Yak Hacker
3873 Posts |
Posted - 2013-07-22 : 14:38:13
|
quote: Originally posted by siftekhar
quote: Originally posted by visakh16 You can use a logic like belowSELECT *FROM(SELECT NTILE(5) OVER (PARTITION BY ColA ORDER BY PrimaryKey) as rn,*from Table)tWHERE rn=1 this will give you 20 % of rows for each values of category in ColA
This gives me an error==Msg 156, Level 15, State 1, Line 5Incorrect syntax near the keyword 'Table'.==
He was just showing you an example because you didn't say what your table name was. Replace the word "Table" with the name of your table. |
|
|
siftekhar
Starting Member
3 Posts |
Posted - 2013-07-22 : 15:10:34
|
quote: Originally posted by James KHe was just showing you an example because you didn't say what your table name was. Replace the word "Table" with the name of your table.
I get it now - thanks. |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2013-07-23 : 01:00:29
|
welcome------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/https://www.facebook.com/VmBlogs |
|
|
|
|
|
|
|