Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
scripter
Starting Member
49 Posts |
Posted - 2006-12-03 : 01:13:23
|
The data table looks like thisdate num1 num2 num3 num4 num5 num61/1/06 1 45 32 21 15 48SQL script 1 should be able to do the followingshow the Top 3 number pairsanother words if the pair 45 and 15 shows up 98 times and 32 and 12 shows up 3 times then I want to have 45 and 15 show as one of the top 3 number pairs I know this is a difficult oneSo that being said see if I can re-explain it hereI have say two dates 1/1/2006 and 1/2/2006on the first date I have six numbers being1,24,45,35,47,14and the second date I have24,35,9,15,46,17Now I can if it is needed to re do the data structure to reflect it like thisdate num1/1/2006 11/1/2006 241/1/2006 451/1/2006 351/1/2006 471/1/2006 141/2/2006 241/2/2006 351/2/2006 91/2/2006 151/2/2006 461/2/2006 17Now somehow I am not sure how but I need to take the six numbers from each date and compare them with every other date.so another words from date one take 1 and 24 and see if that combo matches other dates then go to 1 and 45 and do the same and so on. |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-12-03 : 05:25:51
|
Use second table layout. Always try to normalize your data.Now, use this code-- prepare test datadeclare @test table (dt datetime, num tinyint)insert @testselect '1/1/2006', 1 union allselect '1/1/2006', 24 union allselect '1/1/2006', 45 union allselect '1/1/2006', 35 union allselect '1/1/2006', 47 union allselect '1/1/2006', 14 union allselect '1/2/2006', 24 union allselect '1/2/2006', 35 union allselect '1/2/2006', 9 union allselect '1/2/2006', 15 union allselect '1/2/2006', 46 union allselect '1/2/2006', 17-- do the magicselect top 3 with ties t1.num, t2.num, count(*) cntfrom @test t1inner join @test t2 on t2.dt = t1.dt AND t2.num > t1.numinner join @test t3 on t3.dt <> t1.dt and t3.num = t1.numinner join @test t4 on t4.dt = t3.dt and t4.num = t2.numgroup by t1.num, t2.numorder by 3 desc, 1, 2 Peter LarssonHelsingborg, Sweden |
|
|
scripter
Starting Member
49 Posts |
Posted - 2006-12-03 : 09:49:35
|
Ok I also posted this same thing in a oracle forum (I know PL SQL is different) I am using SQL server 2005 can someone tell me how to convert this so it works with SQL 2005 I think this person is on the right track with itSELECT a.num1, b.num2, count(*) FROM (SELECT rownum num1 FROM all_objects WHERE rownum <= <<MAX VALUE>>) a, (SELECT rownum num2 FROM all_objects WHERE rownum <= <<MAX VALUE>>) b, your_table c WHERE a.num1 != b.num2 AND (a.num1 = c.num1 OR a.num1 = c.num2 OR a.num1 = c.num3 OR a.num1 = c.num4 OR a.num1 = c.num5 OR a.num1 = c.num6) AND (b.num2 = c.num1 OR b.num2 = c.num2 OR b.num2 = c.num3 OR b.num2 = c.num4 OR b.num2 = c.num5 OR b.num2 = c.num6) GROUP BY a.num1, b.num2 |
|
|
scripter
Starting Member
49 Posts |
Posted - 2006-12-03 : 09:55:33
|
Oh Peso I think I just realized what you were attempting to do and I am testing it out right now |
|
|
Jeff Moden
Aged Yak Warrior
652 Posts |
Posted - 2006-12-03 : 10:02:40
|
Something's wrong... 24,35 should show 6 times, not 3 and 24,45 should show twice but does not show at all. There's a bunch more that should also show up...-- prepare test datadeclare @test table (dt datetime, num tinyint)insert @testselect '1/1/2006', 1 union allselect '1/1/2006', 24 union allselect '1/1/2006', 45 union allselect '1/1/2006', 35 union allselect '1/1/2006', 47 union allselect '1/1/2006', 14 union allselect '1/2/2006', 24 union allselect '1/2/2006', 35 union allselect '1/2/2006', 9 union allselect '1/2/2006', 15 union allselect '1/2/2006', 46 union allselect '1/2/2006', 47 union allselect '1/3/2006', 1 union allselect '1/3/2006', 24 union allselect '1/3/2006', 45 union allselect '1/3/2006', 35 union allselect '1/3/2006', 46 union allselect '1/3/2006', 14-- do the magicselect top 3 with ties t1.num, t2.num, count(*) cntfrom @test t1inner join @test t2 on t2.dt = t1.dt AND t2.num > t1.numinner join @test t3 on t3.dt <> t1.dt and t3.num = t1.numinner join @test t4 on t4.dt = t3.dt and t4.num = t2.numgroup by t1.num, t2.numorder by 3 desc, 1, 2--Jeff Moden |
|
|
scripter
Starting Member
49 Posts |
Posted - 2006-12-03 : 10:40:18
|
Ok we have some issues here.first to note the issue I ran across with Peso's which by the way so far you are the only one in all of the forums I have posted this in to show that you are atleast on the right track with this.anyhow when I ran Pesos script with the database I have which has 8040 records in it I get the following as a returnNum Num cnt 12 44 81215 40 8127 15 702as you can see the number 15 has shown up twice in this which would defeat this purpose so is there any way to not have any duplicates in the return so in other words if 7 and 15 show up 702 times but the number 15 has already been associated to 40 can you get it to go with the next most common number pair?Also when i ran Jeff Moden script which from what I can tell is the same one with additional data I get something else that is a bit disturbingNum Num Cnt24 35 61 14 21 24 2Reason this is disturbing is well you just take a look at his data there are only 3 dates how could there possibly be 6 pairs of anything? if there is 3 dates the most amount of times 24 and 35 could be paired up is 3 times which if you look at his data you will see that they are paired up exactly 3 times. This make me believe that the script is not actually using the date as a comparison (not sure if I explained the last part about the date correctly if you feel confused about the last part then skip it and pay attention to how many times they are paired) |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-12-03 : 13:24:27
|
quote: Originally posted by Jeff Moden Something's wrong... 24,35 should show 3 times, not 6 and 24,45 should show twice but does not show at all. There's a bunch more that should also show up...
Well... The COUNT(*) is only for doing some basic calculation, not presenting the actual count. That was not a requirement in OP.Exchange COUNT(*) with COUNT(distinct t3.dt) if that is an issue.24 and 45 is not showing because the order by for the first and second column. Remove 1, 2 from the order by and you're set.Full new code here.-- prepare test datadeclare @test table (dt datetime, num tinyint)insert @testselect '1/1/2006', 1 union allselect '1/1/2006', 24 union allselect '1/1/2006', 45 union allselect '1/1/2006', 35 union allselect '1/1/2006', 47 union allselect '1/1/2006', 14 union allselect '1/2/2006', 24 union allselect '1/2/2006', 35 union allselect '1/2/2006', 9 union allselect '1/2/2006', 15 union allselect '1/2/2006', 46 union allselect '1/2/2006', 47 union allselect '1/3/2006', 1 union allselect '1/3/2006', 24 union allselect '1/3/2006', 45 union allselect '1/3/2006', 35 union allselect '1/3/2006', 46 union allselect '1/3/2006', 14-- do the magicselect num1, num2, cntfrom ( select top 3 with ties t1.num num1, t2.num num2, count(distinct t3.dt) cnt from @test t1 inner join @test t2 on t2.dt = t1.dt AND t2.num > t1.num inner join @test t3 on t3.dt <> t1.dt and t3.num = t1.num inner join @test t4 on t4.dt = t3.dt and t4.num = t2.num group by t1.num, t2.num order by 3 desc )q order by 3 desc, 1, 2 Peter LarssonHelsingborg, Sweden |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-12-03 : 13:27:54
|
Or simplyselect top 3 with ties t1.num num1, t2.num num2, count(distinct t3.dt) cntfrom @test t1inner join @test t2 on t2.dt = t1.dt AND t2.num > t1.numinner join @test t3 on t3.dt <> t1.dt and t3.num = t1.numinner join @test t4 on t4.dt = t3.dt and t4.num = t2.numgroup by t1.num, t2.numorder by 3 desc if the order of num1 and num2 doesn't matter...Peter LarssonHelsingborg, Sweden |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2006-12-03 : 13:50:23
|
quote: Originally posted by Peso Or simplyselect top 3 with ties t1.num num1, t2.num num2, count(*) cntfrom @test t1inner join @test t2 on t2.dt = t1.dt AND t2.num > t1.numinner join @test t3 on t3.dt > t1.dt and t3.num = t1.numinner join @test t4 on t4.dt = t3.dt and t4.num = t2.numgroup by t1.num, t2.numorder by 3 desc if the order of num1 and num2 doesn't matter...Peter LarssonHelsingborg, Sweden
|
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2006-12-03 : 16:42:33
|
I think we can speed it up...The SQL1 is basically a rewrite of Peso's,and yes he was right about the distinct, we need something to not double count permutations.I nicked the reverse triangular # function from: http://search.cpan.org/~dpchrist/Math-TriangularNumbers-r0_03/lib/Math/TriangularNumbers.pm-- prepare test data-- generate 365 testseriesset nocount onprint 'generating 365*7 testseries...'create table #test( dt datetime ,num tinyint ,primary key(dt,num))godeclare @dt datetime; set @dt = '20061101'declare @num tinyint; declare @cnt tinyint; set @cnt = 1while 1=1begin if @cnt%8 <> 0 begin set @num = 1+cast(45*rand() as tinyint) insert #test select @dt, @num where not exists(select * from #test where dt = @dt and num = @num) set @cnt = @cnt + @@rowcount end else begin set @cnt = 1 set @dt = @dt +1 end if @dt = '20071101' breakendgo-- SQL1print 'SQL1'declare @d datetime; set @d = getdate()select top 15 with ties p1.n1 ,p1.n2 ,count(distinct p1.dt) as cntfrom ( select t1.dt ,t1.num ,t2.num from #test t1 join #test t2 on t1.dt = t2.dt and t1.num < t2.num ) p1(dt,n1,n2) join ( select t1.dt ,t1.num ,t2.num from #test t1 join #test t2 on t1.dt = t2.dt and t1.num < t2.num ) p2(dt,n1,n2) on p1.n1 = p2.n1 and p1.n2 = p2.n2 and p1.dt <> p2.dtgroup by p1.n1 ,p1.n2order by cnt desc ,p1.n1 ,p1.n2select datediff(millisecond,@d,getdate()) as execMillisecondsSQL1GO--SQL2print 'SQL2'declare @d datetime; set @d = getdate()select top 15 with ties common.n1 ,common.n2 ,cast((1 + sqrt(1 + 8*(common.cnt-1))) / 2 as int) as cntfrom( select p1.n1 ,p1.n2 ,count(*) from ( select t1.dt ,t1.num ,t2.num from #test t1 join #test t2 on t1.dt = t2.dt and t1.num < t2.num ) p1(dt,n1,n2) left join ( select t1.dt ,t1.num ,t2.num from #test t1 join #test t2 on t1.dt = t2.dt and t1.num < t2.num ) p2(dt,n1,n2) on p1.n1 = p2.n1 and p1.n2 = p2.n2 and p1.dt < p2.dt group by p1.n1 ,p1.n2) as common(n1,n2,cnt)order by common.cnt desc ,common.n1 ,common.n2select datediff(millisecond,@d,getdate()) as execMillisecondsSQL2-- this is some of the testdata--select top 20 * from #test-- clean updrop table #test/*******************************************************************************************generating 365*7 testseries...SQL1n1 n2 cnt ---- ---- ----------- 6 43 178 10 1710 44 1713 22 161 13 154 9 158 28 1510 35 1519 34 1523 44 1525 43 1532 44 1543 45 151 31 141 36 14execMillisecondsSQL1 -------------------- 2330SQL2n1 n2 cnt ---- ---- ----------- 6 43 178 10 1710 44 1713 22 161 13 154 9 158 28 1510 35 1519 34 1523 44 1525 43 1532 44 1543 45 151 31 141 36 14execMillisecondsSQL2 -------------------- 110*********************************************************************************************/ rockmoose |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-12-04 : 04:06:07
|
Good speed! But still suffers from same TOP 15 shortcomings as I did.select common.n1 ,common.n2 ,cast((1 + sqrt(1 + 8*(common.cnt-1))) / 2 as int) as cntfrom( select top 15 with ties p1.n1 ,p1.n2 ,count(*) from ( select t1.dt ,t1.num ,t2.num from @test t1 join @test t2 on t1.dt = t2.dt and t1.num < t2.num ) p1(dt,n1,n2) left join ( select t1.dt ,t1.num ,t2.num from @test t1 join @test t2 on t1.dt = t2.dt and t1.num < t2.num ) p2(dt,n1,n2) on p1.n1 = p2.n1 and p1.n2 = p2.n2 and p1.dt < p2.dt group by p1.n1 ,p1.n2order by 3 desc) as common(n1,n2,cnt)order by common.cnt desc ,common.n1 ,common.n2 Peter LarssonHelsingborg, Sweden |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2006-12-04 : 09:26:26
|
A matter of syntax really, for the TOP X problem.The interesting thing I thought was the inequality comparison on the date:With:p1.dt < p2.dt or p1.dt <> p2.dt for that matter, then the ones with count = 1 are filtered away.But with p1.dt <= p2.dt it works, but it was not as efficient as:p1.dt < p2.dt and use a left join!The join type matters when comparing larger amounts of data like this, merge seems the best and can be hinted,as sql server seemed to prefer the hash join in the example I set up. (but with left join it used merge).rockmoose |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2006-12-04 : 09:36:38
|
On a faster server (SQL2005) than the one I have at home:PS.On SQL2000 this same calculation is about 4 times as slow!generating 6*365*7 testseries...SQL1n1 n2 cnt---- ---- -----------16 18 6918 33 6810 20 6618 20 669 18 6510 26 6513 25 657 19 649 30 6420 30 642 18 626 28 6218 25 6224 42 6226 30 62execMillisecondsSQL1--------------------13266SQL2n1 n2 cnt---- ---- -----------16 18 6918 33 6810 20 6618 20 669 18 6510 26 6513 25 657 19 649 30 6420 30 642 18 626 28 6218 25 6224 42 6226 30 62execMillisecondsSQL2--------------------343 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-12-04 : 10:04:32
|
I agree. All pair of numbers has at least 1 count. No need to calculate them. I am trying to understand why SQL2 is faster than SQL1, since there are 4 table scans in each.Peter LarssonHelsingborg, Sweden |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2006-12-04 : 12:24:15
|
SQL1 compares all <> which means (n-1)*n comparisons and a subsequently higher amount of rows scanned.SQL2 compares all < which means (n-1)*n/2 comparisons (minus a distinct aggregate)rockmoose |
|
|
scripter
Starting Member
49 Posts |
Posted - 2006-12-04 : 14:28:44
|
Ok I was busy and was not able to test out the last few postings to check how it comes out on my end which I am about to do here shortly. I do on the other hand have a couple questions regarding this,,,,, Ok actually one question What is SQL1 and SQL2?are you referring to the differences between 2000 and 2005 server?or is it an actual difference in the language and can be used on both versions of the server?as you can tell by that question I know just enough to be dangerous. My back ground has been mainly in Classic ASP but have recently discovered that many companies out there do not want to spend money on a Web Developer as well as a DBA so they are looking for a combination of the two. Hence why I am coming up with these little projects for myself. Projects have always been my easiest way of learning. |
|
|
scripter
Starting Member
49 Posts |
Posted - 2006-12-04 : 14:35:04
|
Ok so I just answered a part of that question by running the script above in SQL 2005 Express. SQL1 and SQL2 can both be run on it so can both be run in 2000? |
|
|
scripter
Starting Member
49 Posts |
Posted - 2006-12-04 : 14:49:17
|
By using this script I still get duplicates from the actual database. not sure if I mentioned it in here already but the database is 8040 records in length print 'SQL1'declare @d datetime; set @d = getdate()select top 3 with ties p1.n1 ,p1.n2 ,count(distinct p1.dt) as cntfrom ( select t1.dt ,t1.num ,t2.num from test t1 join test t2 on t1.dt = t2.dt and t1.num < t2.num ) p1(dt,n1,n2) join ( select t1.dt ,t1.num ,t2.num from test t1 join test t2 on t1.dt = t2.dt and t1.num < t2.num ) p2(dt,n1,n2) on p1.n1 = p2.n1 and p1.n2 = p2.n2 and p1.dt <> p2.dtgroup by p1.n1 ,p1.n2order by cnt desc ,p1.n1 ,p1.n2select datediff(millisecond,@d,getdate()) as execMillisecondsSQL1GO--SQL2print 'SQL2'declare @d datetime; set @d = getdate()select top 3 with ties common.n1 ,common.n2 ,cast((1 + sqrt(1 + 8*(common.cnt-1))) / 2 as int) as cntfrom( select p1.n1 ,p1.n2 ,count(*) from ( select t1.dt ,t1.num ,t2.num from test t1 join test t2 on t1.dt = t2.dt and t1.num < t2.num ) p1(dt,n1,n2) left join ( select t1.dt ,t1.num ,t2.num from test t1 join test t2 on t1.dt = t2.dt and t1.num < t2.num ) p2(dt,n1,n2) on p1.n1 = p2.n1 and p1.n2 = p2.n2 and p1.dt < p2.dt group by p1.n1 ,p1.n2) as common(n1,n2,cnt)order by common.cnt desc ,common.n1 ,common.n2select datediff(millisecond,@d,getdate()) as execMillisecondsSQL2Oh and the time difference between SQL1 and SQL2 are 2826 for SQL1 and 266 for SQL2 obvious big differencethe returned records look like thisn1 n2 cnt12 44 2915 40 297 15 27as you can see the number 15 has shown up twice in here is there anyway of preventing a number from showing twice? |
|
|
scripter
Starting Member
49 Posts |
Posted - 2006-12-04 : 15:01:22
|
oh and I went ahead and looked up the SQL1 SQL2 question and here is a bit of what I found out. I am a "Moron" ok for real though reason as to why I was not aware besides not being much of a Database junkie is due to the how long it has been since SQL1 has been out of standardNow although I have learned majority of my knowledge through other people and well really have no idea how to actually spot the difference between the two versions. do you by any chance have any good sites or what not that gives good direction on that. I of course would perfer all my scripts to run as fast as possible.P.S. I wasnt trying to get off the main topic on this one but you brought up some good info that I couldnt let go away with out questioning |
|
|
scripter
Starting Member
49 Posts |
Posted - 2006-12-04 : 15:01:58
|
http://www2.yk.psu.edu/~lxn/IST_210/sql1_versus_sql2.htmlOh yeah and here is the site that I found that gave me a brief knowledge of the difference |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2006-12-04 : 16:31:39
|
About SQL1 and SQL2 I was just referring to 2 alternative queries,and I was a bit intrigued by the performance differences, that's why I posted them both.They both work on MS sql2000 or MS sql2005.About your question of removing the duplicate number (15) in:n1 n2 cnt12 44 2915 40 297 15 27 It's kind of complicated, and also I think it is not very clear.Suppose the result looks like this:n1 n2 cnt12 44 2915 40 277 15 27 Now, which row with 15 should be removed?In a way I think you are creating a badly defined requirement.First we are interested in the most common pairs.Then only in the pairs that don't have numbers in common, so how do we rank the worth of the numbers in one pair more than the other?rockmoose |
|
|
Next Page
|
|
|
|
|