Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
aiken
Aged Yak Warrior
525 Posts |
Posted - 2003-05-25 : 01:18:58
|
I've got a table; like so:user data---- -----------123 x0 y765 z0 a875 b875 c0 d123 e ...I need to do something like SELECT COUNT(DISTINCT USER), except I need it to count each 0 as a distinct user. So, in that example, I'd be looking for a result of 6.I can definitely see how to do it with a temporary table, but I can't see how to do it with a single select. Any ideas?Thanks-b |
|
|
Merkin
Funky Drop Bear Fearing SQL Dude!
4970 Posts |
Posted - 2003-05-25 : 01:39:43
|
HiOK, This is a hack CREATE TABLE USERDATA ( USERID INT, DATA CHAR(10))GOINSERT INTO USERDATA VALUES (123, 'X')INSERT INTO USERDATA VALUES (0, 'Y')INSERT INTO USERDATA VALUES (756, 'Z')INSERT INTO USERDATA VALUES (0, 'A')INSERT INTO USERDATA VALUES (875, 'B')INSERT INTO USERDATA VALUES (875, 'C')INSERT INTO USERDATA VALUES (0, 'D')INSERT INTO USERDATA VALUES (123, 'E')GOSelect Count(DISTINCT CASE WHEN UserID = 0 Then Checksum(newID()) ELSE UserID END)FROMUserDataWhat do you think ?Damian |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2003-05-25 : 06:44:06
|
I kind of like Damian's hack, but it's a bit fragile: in 100 trials of 100000 CHECKSUM(NEWID()) values, the number of duplicates I got were (SQL Server 2000 sp mumble):dups count0 291 402 233 76 1 So that's only about 30% success on that size dataset.SELECT SUM(ct)FROM ( SELECT CASE WHEN userID = 0 THEN COUNT(*) ELSE 1 END AS ct FROM UserData GROUP BY userid ) AS T But I suppose technically, that should be COALESCE(SUM(ct), 0), if you care about the no-users case.Technical footnote: according to my calculations, you would expect about 31.2%. If d is the number of possible (and equally likely) number of outcomes of CHECKSUM(NEWID()) and n is the number of samples, then the probability of all samples being distinct is d!/((d-n)! * d^n), which approximates to e^(-n * (n-1) / (2*d)).Plugging in the values d = 2^32 and n = 100000, we get 0.3122.See: http://mathworld.wolfram.com/BirthdayProblem.htmlDisplacement activity? Surely not! Edited by - Arnold Fribble on 05/25/2003 07:50:58 |
 |
|
|
mono
Starting Member
36 Posts |
Posted - 2003-05-25 : 07:30:57
|
| Do your select distinct for non-zero rows (=3) and add the number of zero rows:select (select count(userid) from USERDATA where userid = 0) + (select count(distinct USERID) from USERDATA where userid <> 0)Less haste, more speed. |
 |
|
|
aiken
Aged Yak Warrior
525 Posts |
Posted - 2003-05-25 : 16:53:54
|
| I like mono's simple solution; that works. Though bonus points for cleverness on the newID() thing; that's still an interesting technique. For large samples, you could even factor that 31% thing back in and make it statistically accurate, if not exact.Cheers-b |
 |
|
|
Merkin
Funky Drop Bear Fearing SQL Dude!
4970 Posts |
Posted - 2003-05-26 : 09:32:58
|
I had a sneaking suspicion when I posted that Arnold would post somehting just like that Damian |
 |
|
|
|
|
|
|
|