Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Just for fun Lotto SQL

Author  Topic 

scripter
Starting Member

49 Posts

Posted - 2006-12-03 : 01:13:23
The data table looks like this

date num1 num2 num3 num4 num5 num6
1/1/06 1 45 32 21 15 48

SQL script 1 should be able to do the following
show the Top 3 number pairs
another words if the pair 45 and 15 shows up 98 times and 32 and 12 shows up 3 times

then I want to have 45 and 15 show as one of the top 3 number pairs

I know this is a difficult one


So that being said see if I can re-explain it here
I have say two dates
1/1/2006 and 1/2/2006
on the first date I have six numbers being
1,24,45,35,47,14
and the second date I have
24,35,9,15,46,17

Now I can if it is needed to re do the data structure to reflect it like this
date num
1/1/2006 1
1/1/2006 24
1/1/2006 45
1/1/2006 35
1/1/2006 47
1/1/2006 14

1/2/2006 24
1/2/2006 35
1/2/2006 9
1/2/2006 15
1/2/2006 46
1/2/2006 17



Now somehow I am not sure how but I need to take the six numbers from each date and compare them with every other date.

so another words from date one take 1 and 24 and see if that combo matches other dates then go to 1 and 45 and do the same and so on.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-12-03 : 05:25:51
Use second table layout. Always try to normalize your data.
Now, use this code
-- prepare test data
declare @test table (dt datetime, num tinyint)

insert @test
select '1/1/2006', 1 union all
select '1/1/2006', 24 union all
select '1/1/2006', 45 union all
select '1/1/2006', 35 union all
select '1/1/2006', 47 union all
select '1/1/2006', 14 union all
select '1/2/2006', 24 union all
select '1/2/2006', 35 union all
select '1/2/2006', 9 union all
select '1/2/2006', 15 union all
select '1/2/2006', 46 union all
select '1/2/2006', 17

-- do the magic
select top 3 with ties
t1.num,
t2.num,
count(*) cnt
from @test t1
inner join @test t2 on t2.dt = t1.dt AND t2.num > t1.num
inner join @test t3 on t3.dt <> t1.dt and t3.num = t1.num
inner join @test t4 on t4.dt = t3.dt and t4.num = t2.num
group by t1.num,
t2.num
order by 3 desc,
1,
2


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

scripter
Starting Member

49 Posts

Posted - 2006-12-03 : 09:49:35
Ok I also posted this same thing in a oracle forum (I know PL SQL is different) I am using SQL server 2005 can someone tell me how to convert this so it works with SQL 2005

I think this person is on the right track with it


SELECT a.num1, b.num2, count(*)
FROM (SELECT rownum num1
FROM all_objects
WHERE rownum <= <<MAX VALUE>>) a,
(SELECT rownum num2
FROM all_objects
WHERE rownum <= <<MAX VALUE>>) b,
your_table c
WHERE a.num1 != b.num2
AND
(a.num1 = c.num1 OR a.num1 = c.num2 OR a.num1 = c.num3 OR a.num1 = c.num4 OR a.num1 = c.num5 OR a.num1 = c.num6)
AND (b.num2 = c.num1 OR b.num2 = c.num2 OR b.num2 = c.num3 OR b.num2 = c.num4 OR b.num2 = c.num5 OR b.num2 = c.num6) GROUP BY a.num1, b.num2
Go to Top of Page

scripter
Starting Member

49 Posts

Posted - 2006-12-03 : 09:55:33
Oh Peso I think I just realized what you were attempting to do and I am testing it out right now
Go to Top of Page

Jeff Moden
Aged Yak Warrior

652 Posts

Posted - 2006-12-03 : 10:02:40
Something's wrong... 24,35 should show 6 times, not 3 and 24,45 should show twice but does not show at all. There's a bunch more that should also show up...

-- prepare test data
declare @test table (dt datetime, num tinyint)

insert @test
select '1/1/2006', 1 union all
select '1/1/2006', 24 union all
select '1/1/2006', 45 union all
select '1/1/2006', 35 union all
select '1/1/2006', 47 union all
select '1/1/2006', 14 union all
select '1/2/2006', 24 union all
select '1/2/2006', 35 union all
select '1/2/2006', 9 union all
select '1/2/2006', 15 union all
select '1/2/2006', 46 union all
select '1/2/2006', 47 union all
select '1/3/2006', 1 union all
select '1/3/2006', 24 union all
select '1/3/2006', 45 union all
select '1/3/2006', 35 union all
select '1/3/2006', 46 union all
select '1/3/2006', 14
-- do the magic
select top 3 with ties
t1.num,
t2.num,
count(*) cnt
from @test t1
inner join @test t2 on t2.dt = t1.dt AND t2.num > t1.num
inner join @test t3 on t3.dt <> t1.dt and t3.num = t1.num
inner join @test t4 on t4.dt = t3.dt and t4.num = t2.num
group by t1.num,
t2.num
order by 3 desc,
1,
2

--Jeff Moden
Go to Top of Page

scripter
Starting Member

49 Posts

Posted - 2006-12-03 : 10:40:18
Ok we have some issues here.
first to note the issue I ran across with Peso's which by the way so far you are the only one in all of the forums I have posted this in to show that you are atleast on the right track with this.


anyhow when I ran Pesos script with the database I have which has 8040 records in it I get the following as a return

Num Num cnt
12 44 812
15 40 812
7 15 702

as you can see the number 15 has shown up twice in this which would defeat this purpose so is there any way to not have any duplicates in the return so in other words if 7 and 15 show up 702 times but the number 15 has already been associated to 40 can you get it to go with the next most common number pair?


Also when i ran
Jeff Moden
script which from what I can tell is the same one with additional data I get something else that is a bit disturbing

Num Num Cnt
24 35 6
1 14 2
1 24 2

Reason this is disturbing is well you just take a look at his data there are only 3 dates how could there possibly be 6 pairs of anything? if there is 3 dates the most amount of times 24 and 35 could be paired up is 3 times which if you look at his data you will see that they are paired up exactly 3 times. This make me believe that the script is not actually using the date as a comparison (not sure if I explained the last part about the date correctly if you feel confused about the last part then skip it and pay attention to how many times they are paired)
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-12-03 : 13:24:27
quote:
Originally posted by Jeff Moden

Something's wrong... 24,35 should show 3 times, not 6 and 24,45 should show twice but does not show at all. There's a bunch more that should also show up...

Well... The COUNT(*) is only for doing some basic calculation, not presenting the actual count. That was not a requirement in OP.
Exchange COUNT(*) with COUNT(distinct t3.dt) if that is an issue.

24 and 45 is not showing because the order by for the first and second column. Remove 1, 2 from the order by and you're set.

Full new code here.
-- prepare test data
declare @test table (dt datetime, num tinyint)

insert @test
select '1/1/2006', 1 union all
select '1/1/2006', 24 union all
select '1/1/2006', 45 union all
select '1/1/2006', 35 union all
select '1/1/2006', 47 union all
select '1/1/2006', 14 union all
select '1/2/2006', 24 union all
select '1/2/2006', 35 union all
select '1/2/2006', 9 union all
select '1/2/2006', 15 union all
select '1/2/2006', 46 union all
select '1/2/2006', 47 union all
select '1/3/2006', 1 union all
select '1/3/2006', 24 union all
select '1/3/2006', 45 union all
select '1/3/2006', 35 union all
select '1/3/2006', 46 union all
select '1/3/2006', 14

-- do the magic
select num1,
num2,
cnt
from (
select top 3 with ties
t1.num num1,
t2.num num2,
count(distinct t3.dt) cnt
from @test t1
inner join @test t2 on t2.dt = t1.dt AND t2.num > t1.num
inner join @test t3 on t3.dt <> t1.dt and t3.num = t1.num
inner join @test t4 on t4.dt = t3.dt and t4.num = t2.num
group by t1.num,
t2.num
order by 3 desc
)q
order by 3 desc,
1,
2

Peter Larsson
Helsingborg, Sweden
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-12-03 : 13:27:54
Or simply
select top 3	with ties
t1.num num1,
t2.num num2,
count(distinct t3.dt) cnt
from @test t1
inner join @test t2 on t2.dt = t1.dt AND t2.num > t1.num
inner join @test t3 on t3.dt <> t1.dt and t3.num = t1.num
inner join @test t4 on t4.dt = t3.dt and t4.num = t2.num
group by t1.num,
t2.num
order by 3 desc
if the order of num1 and num2 doesn't matter...

Peter Larsson
Helsingborg, Sweden
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-12-03 : 13:50:23
quote:
Originally posted by Peso

Or simply
select top 3	with ties
t1.num num1,
t2.num num2,
count(*) cnt
from @test t1
inner join @test t2 on t2.dt = t1.dt AND t2.num > t1.num
inner join @test t3 on t3.dt > t1.dt and t3.num = t1.num
inner join @test t4 on t4.dt = t3.dt and t4.num = t2.num
group by t1.num,
t2.num
order by 3 desc
if the order of num1 and num2 doesn't matter...

Peter Larsson
Helsingborg, Sweden

Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-12-03 : 16:42:33
I think we can speed it up...
The SQL1 is basically a rewrite of Peso's,
and yes he was right about the distinct, we need something to not double count permutations.

I nicked the reverse triangular # function from: http://search.cpan.org/~dpchrist/Math-TriangularNumbers-r0_03/lib/Math/TriangularNumbers.pm

-- prepare test data
-- generate 365 testseries
set nocount on

print 'generating 365*7 testseries...'

create table #test
(
dt datetime
,num tinyint
,primary key(dt,num)
)
go

declare @dt datetime; set @dt = '20061101'
declare @num tinyint; declare @cnt tinyint; set @cnt = 1

while 1=1
begin
if @cnt%8 <> 0
begin
set @num = 1+cast(45*rand() as tinyint)
insert #test
select @dt, @num
where not exists(select * from #test where dt = @dt and num = @num)
set @cnt = @cnt + @@rowcount
end
else
begin
set @cnt = 1
set @dt = @dt +1
end
if @dt = '20071101' break
end
go

-- SQL1
print 'SQL1'
declare @d datetime; set @d = getdate()

select top 15 with ties p1.n1
,p1.n2
,count(distinct p1.dt) as cnt
from (
select t1.dt ,t1.num ,t2.num
from #test t1 join #test t2 on t1.dt = t2.dt and t1.num < t2.num ) p1(dt,n1,n2)
join
(
select t1.dt ,t1.num ,t2.num
from #test t1 join #test t2 on t1.dt = t2.dt and t1.num < t2.num ) p2(dt,n1,n2)
on p1.n1 = p2.n1
and p1.n2 = p2.n2
and p1.dt <> p2.dt
group by
p1.n1
,p1.n2
order by
cnt desc
,p1.n1
,p1.n2

select datediff(millisecond,@d,getdate()) as execMillisecondsSQL1
GO

--SQL2
print 'SQL2'

declare @d datetime; set @d = getdate()

select top 15 with ties
common.n1
,common.n2
,cast((1 + sqrt(1 + 8*(common.cnt-1))) / 2 as int) as cnt
from
(
select p1.n1
,p1.n2
,count(*)
from (
select t1.dt ,t1.num ,t2.num
from #test t1 join #test t2 on t1.dt = t2.dt and t1.num < t2.num ) p1(dt,n1,n2)
left join
(
select t1.dt ,t1.num ,t2.num
from #test t1 join #test t2 on t1.dt = t2.dt and t1.num < t2.num ) p2(dt,n1,n2)
on p1.n1 = p2.n1
and p1.n2 = p2.n2
and p1.dt < p2.dt
group by
p1.n1
,p1.n2
) as common(n1,n2,cnt)
order by
common.cnt desc
,common.n1
,common.n2

select datediff(millisecond,@d,getdate()) as execMillisecondsSQL2


-- this is some of the testdata
--select top 20 * from #test

-- clean up
drop table #test

/*******************************************************************************************
generating 365*7 testseries...
SQL1
n1 n2 cnt
---- ---- -----------
6 43 17
8 10 17
10 44 17
13 22 16
1 13 15
4 9 15
8 28 15
10 35 15
19 34 15
23 44 15
25 43 15
32 44 15
43 45 15
1 31 14
1 36 14

execMillisecondsSQL1
--------------------
2330

SQL2
n1 n2 cnt
---- ---- -----------
6 43 17
8 10 17
10 44 17
13 22 16
1 13 15
4 9 15
8 28 15
10 35 15
19 34 15
23 44 15
25 43 15
32 44 15
43 45 15
1 31 14
1 36 14

execMillisecondsSQL2
--------------------
110
*********************************************************************************************/



rockmoose
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-12-04 : 04:06:07
Good speed! But still suffers from same TOP 15 shortcomings as I did.
select	
common.n1
,common.n2
,cast((1 + sqrt(1 + 8*(common.cnt-1))) / 2 as int) as cnt
from
(
select top 15 with ties p1.n1
,p1.n2
,count(*)
from (
select t1.dt ,t1.num ,t2.num
from @test t1 join @test t2 on t1.dt = t2.dt and t1.num < t2.num ) p1(dt,n1,n2)
left join
(
select t1.dt ,t1.num ,t2.num
from @test t1 join @test t2 on t1.dt = t2.dt and t1.num < t2.num ) p2(dt,n1,n2)
on p1.n1 = p2.n1
and p1.n2 = p2.n2
and p1.dt < p2.dt
group by
p1.n1
,p1.n2
order by
3 desc
) as common(n1,n2,cnt)
order by
common.cnt desc
,common.n1
,common.n2

Peter Larsson
Helsingborg, Sweden
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-12-04 : 09:26:26
A matter of syntax really, for the TOP X problem.

The interesting thing I thought was the inequality comparison on the date:
With:
p1.dt < p2.dt or p1.dt <> p2.dt for that matter, then the ones with count = 1 are filtered away.

But with p1.dt <= p2.dt it works, but it was not as efficient as:
p1.dt < p2.dt and use a left join!

The join type matters when comparing larger amounts of data like this, merge seems the best and can be hinted,
as sql server seemed to prefer the hash join in the example I set up. (but with left join it used merge).

rockmoose
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-12-04 : 09:36:38
On a faster server (SQL2005) than the one I have at home:

PS.
On SQL2000 this same calculation is about 4 times as slow!

generating 6*365*7 testseries...
SQL1
n1 n2 cnt
---- ---- -----------
16 18 69
18 33 68
10 20 66
18 20 66
9 18 65
10 26 65
13 25 65
7 19 64
9 30 64
20 30 64
2 18 62
6 28 62
18 25 62
24 42 62
26 30 62

execMillisecondsSQL1
--------------------
13266

SQL2
n1 n2 cnt
---- ---- -----------
16 18 69
18 33 68
10 20 66
18 20 66
9 18 65
10 26 65
13 25 65
7 19 64
9 30 64
20 30 64
2 18 62
6 28 62
18 25 62
24 42 62
26 30 62

execMillisecondsSQL2
--------------------
343
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-12-04 : 10:04:32
I agree. All pair of numbers has at least 1 count. No need to calculate them. I am trying to understand why SQL2 is faster than SQL1, since there are 4 table scans in each.


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-12-04 : 12:24:15
SQL1 compares all <> which means (n-1)*n comparisons and a subsequently higher amount of rows scanned.
SQL2 compares all < which means (n-1)*n/2 comparisons (minus a distinct aggregate)


rockmoose
Go to Top of Page

scripter
Starting Member

49 Posts

Posted - 2006-12-04 : 14:28:44
Ok I was busy and was not able to test out the last few postings to check how it comes out on my end which I am about to do here shortly. I do on the other hand have a couple questions regarding this,,,,, Ok actually one question
What is SQL1 and SQL2?
are you referring to the differences between 2000 and 2005 server?
or is it an actual difference in the language and can be used on both versions of the server?

as you can tell by that question I know just enough to be dangerous. My back ground has been mainly in Classic ASP but have recently discovered that many companies out there do not want to spend money on a Web Developer as well as a DBA so they are looking for a combination of the two. Hence why I am coming up with these little projects for myself. Projects have always been my easiest way of learning.
Go to Top of Page

scripter
Starting Member

49 Posts

Posted - 2006-12-04 : 14:35:04
Ok so I just answered a part of that question by running the script above in SQL 2005 Express. SQL1 and SQL2 can both be run on it so can both be run in 2000?
Go to Top of Page

scripter
Starting Member

49 Posts

Posted - 2006-12-04 : 14:49:17
By using this script I still get duplicates from the actual database. not sure if I mentioned it in here already but the database is 8040 records in length


print 'SQL1'
declare @d datetime; set @d = getdate()

select top 3 with ties p1.n1
,p1.n2
,count(distinct p1.dt) as cnt
from (
select t1.dt ,t1.num ,t2.num
from test t1 join test t2 on t1.dt = t2.dt and t1.num < t2.num ) p1(dt,n1,n2)
join
(
select t1.dt ,t1.num ,t2.num
from test t1 join test t2 on t1.dt = t2.dt and t1.num < t2.num ) p2(dt,n1,n2)
on p1.n1 = p2.n1
and p1.n2 = p2.n2
and p1.dt <> p2.dt
group by
p1.n1
,p1.n2
order by
cnt desc
,p1.n1
,p1.n2

select datediff(millisecond,@d,getdate()) as execMillisecondsSQL1
GO

--SQL2
print 'SQL2'

declare @d datetime; set @d = getdate()

select top 3 with ties
common.n1
,common.n2
,cast((1 + sqrt(1 + 8*(common.cnt-1))) / 2 as int) as cnt
from
(
select p1.n1
,p1.n2
,count(*)
from (
select t1.dt ,t1.num ,t2.num
from test t1 join test t2 on t1.dt = t2.dt and t1.num < t2.num ) p1(dt,n1,n2)
left join
(
select t1.dt ,t1.num ,t2.num
from test t1 join test t2 on t1.dt = t2.dt and t1.num < t2.num ) p2(dt,n1,n2)
on p1.n1 = p2.n1
and p1.n2 = p2.n2
and p1.dt < p2.dt
group by
p1.n1
,p1.n2
) as common(n1,n2,cnt)
order by
common.cnt desc
,common.n1
,common.n2

select datediff(millisecond,@d,getdate()) as execMillisecondsSQL2


Oh and the time difference between SQL1 and SQL2 are 2826 for SQL1 and 266 for SQL2 obvious big difference


the returned records look like this

n1 n2 cnt
12 44 29
15 40 29
7 15 27

as you can see the number 15 has shown up twice in here is there anyway of preventing a number from showing twice?
Go to Top of Page

scripter
Starting Member

49 Posts

Posted - 2006-12-04 : 15:01:22
oh and I went ahead and looked up the SQL1 SQL2 question and here is a bit of what I found out.
I am a "Moron" ok for real though reason as to why I was not aware besides not being much of a Database junkie is due to the how long it has been since SQL1 has been out of standard

Now although I have learned majority of my knowledge through other people and well really have no idea how to actually spot the difference between the two versions. do you by any chance have any good sites or what not that gives good direction on that. I of course would perfer all my scripts to run as fast as possible.


P.S. I wasnt trying to get off the main topic on this one but you brought up some good info that I couldnt let go away with out questioning
Go to Top of Page

scripter
Starting Member

49 Posts

Posted - 2006-12-04 : 15:01:58
http://www2.yk.psu.edu/~lxn/IST_210/sql1_versus_sql2.html

Oh yeah and here is the site that I found that gave me a brief knowledge of the difference
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-12-04 : 16:31:39
About SQL1 and SQL2 I was just referring to 2 alternative queries,
and I was a bit intrigued by the performance differences, that's why I posted them both.

They both work on MS sql2000 or MS sql2005.

About your question of removing the duplicate number (15) in:
n1 n2 cnt
12 44 29
15 40 29
7 15 27

It's kind of complicated, and also I think it is not very clear.

Suppose the result looks like this:
n1 n2 cnt
12 44 29
15 40 27
7 15 27


Now, which row with 15 should be removed?

In a way I think you are creating a badly defined requirement.
First we are interested in the most common pairs.
Then only in the pairs that don't have numbers in common, so how do we rank the worth of the numbers in one pair more than the other?


rockmoose
Go to Top of Page
    Next Page

- Advertisement -