Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Need clever idea

Author  Topic 

aiken
Aged Yak Warrior

525 Posts

Posted - 2003-05-25 : 01:18:58
I've got a table; like so:

user data
---- -----------
123 x
0 y
765 z
0 a
875 b
875 c
0 d
123 e


...I need to do something like SELECT COUNT(DISTINCT USER), except I need it to count each 0 as a distinct user. So, in that example, I'd be looking for a result of 6.

I can definitely see how to do it with a temporary table, but I can't see how to do it with a single select. Any ideas?

Thanks
-b

Merkin
Funky Drop Bear Fearing SQL Dude!

4970 Posts

Posted - 2003-05-25 : 01:39:43
Hi

OK, This is a hack



CREATE TABLE USERDATA (
USERID INT,
DATA CHAR(10)
)
GO
INSERT INTO USERDATA VALUES (123, 'X')
INSERT INTO USERDATA VALUES (0, 'Y')
INSERT INTO USERDATA VALUES (756, 'Z')
INSERT INTO USERDATA VALUES (0, 'A')
INSERT INTO USERDATA VALUES (875, 'B')
INSERT INTO USERDATA VALUES (875, 'C')
INSERT INTO USERDATA VALUES (0, 'D')
INSERT INTO USERDATA VALUES (123, 'E')

GO

Select
Count(DISTINCT CASE WHEN UserID = 0 Then Checksum(newID()) ELSE UserID END)
FROM
UserData



What do you think ?

Damian
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2003-05-25 : 06:44:06
I kind of like Damian's hack, but it's a bit fragile: in 100 trials of 100000 CHECKSUM(NEWID()) values, the number of duplicates I got were (SQL Server 2000 sp mumble):

dups count
0 29
1 40
2 23
3 7
6 1

 So that's only about 30% success on that size dataset.

SELECT SUM(ct)
FROM (
SELECT CASE WHEN userID = 0 THEN COUNT(*) ELSE 1 END AS ct
FROM UserData
GROUP BY userid
) AS T

 
But I suppose technically, that should be COALESCE(SUM(ct), 0), if you care about the no-users case.

Technical footnote: according to my calculations, you would expect about 31.2%. If d is the number of possible (and equally likely) number of outcomes of CHECKSUM(NEWID()) and n is the number of samples, then the probability of all samples being distinct is d!/((d-n)! * d^n), which approximates to e^(-n * (n-1) / (2*d)).
Plugging in the values d = 2^32 and n = 100000, we get 0.3122.
See: http://mathworld.wolfram.com/BirthdayProblem.html

Displacement activity? Surely not!


Edited by - Arnold Fribble on 05/25/2003 07:50:58
Go to Top of Page

mono
Starting Member

36 Posts

Posted - 2003-05-25 : 07:30:57
Do your select distinct for non-zero rows (=3) and add the number of zero rows:

select (select count(userid) from USERDATA where userid = 0) + (select count(distinct USERID) from USERDATA where userid <> 0)



Less haste, more speed.
Go to Top of Page

aiken
Aged Yak Warrior

525 Posts

Posted - 2003-05-25 : 16:53:54
I like mono's simple solution; that works. Though bonus points for cleverness on the newID() thing; that's still an interesting technique. For large samples, you could even factor that 31% thing back in and make it statistically accurate, if not exact.

Cheers
-b

Go to Top of Page

Merkin
Funky Drop Bear Fearing SQL Dude!

4970 Posts

Posted - 2003-05-26 : 09:32:58
I had a sneaking suspicion when I posted that Arnold would post somehting just like that

Damian
Go to Top of Page
   

- Advertisement -