Need clever idea - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

Transact-SQL (2000)

Need clever idea

Author

Topic

aiken
Aged Yak Warrior

525 Posts

Posted - 2003-05-25 : 01:18:58

I've got a table; like so:


user       data
----       -----------
123        x
0          y
765        z
0          a
875        b
875        c
0          d
123        e

...I need to do something like SELECT COUNT(DISTINCT USER), except I need it to count each 0 as a distinct user. So, in that example, I'd be looking for a result of 6.

I can definitely see how to do it with a temporary table, but I can't see how to do it with a single select. Any ideas?

Thanks
-b

Merkin
Funky Drop Bear Fearing SQL Dude!

4970 Posts

Posted - 2003-05-25 : 01:39:43

Hi

OK, This is a hack

CREATE TABLE USERDATA (
USERID INT,
DATA CHAR(10)
)
GO
INSERT INTO USERDATA VALUES (123, 'X')
INSERT INTO USERDATA VALUES (0, 'Y')
INSERT INTO USERDATA VALUES (756, 'Z')
INSERT INTO USERDATA VALUES (0, 'A')
INSERT INTO USERDATA VALUES (875, 'B')
INSERT INTO USERDATA VALUES (875, 'C')
INSERT INTO USERDATA VALUES (0, 'D')
INSERT INTO USERDATA VALUES (123, 'E')

GO

Select
Count(DISTINCT CASE WHEN UserID = 0 Then Checksum(newID()) ELSE UserID END)
FROM
UserData

What do you think ?

Damian

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2003-05-25 : 06:44:06

I kind of like Damian's hack, but it's a bit fragile: in 100 trials of 100000 CHECKSUM(NEWID()) values, the number of duplicates I got were (SQL Server 2000 sp mumble):


dups    count
0       29
1       40
2       23
3        7
6        1

So that's only about 30% success on that size dataset.


SELECT SUM(ct)
FROM (
    SELECT CASE WHEN userID = 0 THEN COUNT(*) ELSE 1 END AS ct
    FROM UserData
    GROUP BY userid
  ) AS T

But I suppose technically, that should be COALESCE(SUM(ct), 0), if you care about the no-users case.

Technical footnote: according to my calculations, you would expect about 31.2%. If d is the number of possible (and equally likely) number of outcomes of CHECKSUM(NEWID()) and n is the number of samples, then the probability of all samples being distinct is d!/((d-n)! * d^n), which approximates to e^(-n * (n-1) / (2*d)).
Plugging in the values d = 2^32 and n = 100000, we get 0.3122.
See: http://mathworld.wolfram.com/BirthdayProblem.html

Displacement activity? Surely not!

Edited by - Arnold Fribble on 05/25/2003 07:50:58

mono
Starting Member

36 Posts

Posted - 2003-05-25 : 07:30:57

Do your select distinct for non-zero rows (=3) and add the number of zero rows:

select (select count(userid) from USERDATA where userid = 0) + (select count(distinct USERID) from USERDATA where userid <> 0)

Less haste, more speed.

aiken
Aged Yak Warrior

525 Posts

Posted - 2003-05-25 : 16:53:54

I like mono's simple solution; that works. Though bonus points for cleverness on the newID() thing; that's still an interesting technique. For large samples, you could even factor that 31% thing back in and make it statistically accurate, if not exact.

Cheers
-b

Merkin
Funky Drop Bear Fearing SQL Dude!

4970 Posts

Posted - 2003-05-26 : 09:32:58

I had a sneaking suspicion when I posted that Arnold would post somehting just like that

Damian

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources