Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
rme8494
Yak Posting Veteran
98 Posts |
Posted - 2002-11-07 : 21:21:40
|
Hi Everyone, Here is the situation, I have two tables, a report and reportcat table.ReportCat TableCat_IDCatNamePostdateReport TableReport_IDCat_IDFileLocationPostDateI'm doing an inner join on the Cat_ID on a query to output them on a web page. This query works great to out put ALL the reports in the database, however I would like to do something different. I would like the user to be able select how many reports they would like to see output in each category. So say I have 20 reports total, and 4 categories. Each category has 5 reports in it. How do I only return only 2 reports from each category? If I do a TOP 8 in the query it wouldn't give me 2 from each category just 8 random records. Make sense? Any ideas?RyanRyan EverhartSBC Ameritech |
|
simondeutsch
Aged Yak Warrior
547 Posts |
Posted - 2002-11-07 : 21:44:27
|
Could definitely do it with UNION. But there should be a better way.Otherwise:SELECT TOP n * FROM ReportCat INNER JOIN Report ON CAt_Id = cat_id where catname = 'a'UNIONSELECT TOP n * FROM ReportCat INNER JOIN Report ON CAt_Id = cat_id where catname = 'b'etc.Sarah Berger MCSD |
|
|
rme8494
Yak Posting Veteran
98 Posts |
Posted - 2002-11-07 : 22:07:57
|
Sarah, Thanks for your reply. I am looking for something in single query though. I do think the union is feasable as the number of categories will change. Any other ideas?ThanksRyanRyan EverhartSBC Ameritech |
|
|
oitsubob
Yak Posting Veteran
70 Posts |
Posted - 2002-11-07 : 22:20:08
|
Hi Everyone...Ryan and I work together at SBC (a thousand miles apart) and have been bouncing these ideas off one another. Essentially, we're both trying to accomplish the same basic goal but in slightly different applications.Just to give you an idea of what I'm trying to do, here's the text of a post I submitted to a Cold Fusion user group on Macromedia.com:Begin Post=============What I'm trying to do is build a single query that will join an "employees" table and "department" table and return the top X senior employees from each department where X is a value in the limit column of the "department" table.For the sake of this example, let's say the "employee" table has 100 records and 4 columns: First Name, Last Name, Department ID and Hire Date while the "department" table has 5 records 3 columns: Department ID, Department Name and Limit. To make it simple, lets also just assume that the value of the limit column is '2' employee's for each department. When all's said and done, the result set should contain 10 records (2 people for each of the 5 departments).Note: In actuality, we have tens of thousands of employees and several hundred departments. For that reason, it's not feasible or good practice to write a query for each individual department.In all honesty, I don't even know where to start on this. I'm hoping that someone in the community has come across a similar situation and can provide me with some much needed direction.=============End PostIf anyone has any suggestions, both Ryan and I would appreciate it.Thanks in Advance,Bob WestonSBC Pacific Bell |
|
|
simondeutsch
Aged Yak Warrior
547 Posts |
Posted - 2002-11-07 : 22:34:47
|
Check this out:http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=21334see Ilya Zaltsman's post. The situation seems similar to yours, although you will have to change the hardcoded TOP operator, maybe by using dynamic SQL, because otherwise I'm not sure how you can select TOP n of an unknown number. Dynamic sql will slow you down somewhat, though.Sarah Berger MCSD |
|
|
hoagm
Starting Member
1 Post |
Posted - 2002-11-07 : 23:45:20
|
This should work. I'm sure it's longer than you were looking for but it will take into consideration table changes to the Report table as well as the variable number of categories. I've commented liberally... let me know if you have questions.--Create #temp_reports then clear it, this makes sure our temp table is always mirrored after our real tableSELECT TOP 1 * INTO #temp_reports FROM ReportDELETE #temp_reports--variable to hold our current category IDDECLARE @Cat_ID Int--cursor variable so we can step through each cat_id and perform actionsDECLARE Category_Cursor CURSOR FOR SELECT Cat_ID FROM ReportCat--open the cursor and prime itOPEN Category_CursorFETCH NEXT FROM Category_Cursor INTO @Cat_ID--as long as we've not reached the end of the cat_id's, we loopWHILE @@FETCH_STATUS = 0 BEGIN--insert top n reports from cat_id into temp INSERT #temp_reports SELECT TOP n * FROM Report WHERE Cat_ID = @Cat_ID--fetch the next cat_id FETCH NEXT FROM Category_Cursor INTO @Cat_ID END--close your cursor and clear it from memoryCLOSE Category_CursorDEALLOCATE Category_Cursor--select your resultsSELECT * FROM #temp_reportsEdited by - hoagm on 11/07/2002 23:49:44 |
|
|
r937
Posting Yak Master
112 Posts |
Posted - 2002-11-08 : 00:17:08
|
caution: not testedselect D.DepartmentName , E.FirstName , E.LastName , E.HireDate from department Dinner join employee E on D.ID = E.DepartmentID where D.Limit > ( select count(*) from employee where DepartmentID = E.DepartmentID and HireDate > E.HireDate ) order by D.DepartmentName , E.HireDate descending the limit will be exceeded in departments where there are ties across the last placerudy |
|
|
mohamedyousuff@yahoo.com
Starting Member
24 Posts |
Posted - 2002-11-08 : 01:11:28
|
Try this queryselect * from reportcat rc inner join report r on rc.cat_id = r.cat_id where report_id in ( select top 2 report_id from report where cat_id = rc.cat_id ) The solution solves it in a single query as you expected. As u see above, it does not involve any temporary table but it is a correlated query. I have tested it and is working fine. I am thinking of alternatives without correlated query (performance reasons) and without temporary table and with a single query.quote: Hi Everyone, Here is the situation, I have two tables, a report and reportcat table.ReportCat TableCat_IDCatNamePostdateReport TableReport_IDCat_IDFileLocationPostDateI'm doing an inner join on the Cat_ID on a query to output them on a web page. This query works great to out put ALL the reports in the database, however I would like to do something different. I would like the user to be able select how many reports they would like to see output in each category. So say I have 20 reports total, and 4 categories. Each category has 5 reports in it. How do I only return only 2 reports from each category? If I do a TOP 8 in the query it wouldn't give me 2 from each category just 8 random records. Make sense? Any ideas?RyanRyan EverhartSBC Ameritech
|
|
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-11-08 : 07:49:50
|
This question of top N of a group has come up a few times lately and I have spent some time looking at the various suggestions. I will outline what I have found here.First some sample data to work with ...create table #temp ( someid int, somevc varchar(10) )insert into #tempselect n2.n, left( char(round((rand(convert(binary(4),newid())) * 25),0)+97) + char(round((rand(convert(binary(4),newid())) * 25),0)+97) + char(round((rand(convert(binary(4),newid())) * 25),0)+97) + char(round((rand(convert(binary(4),newid())) * 25),0)+97) + char(round((rand(convert(binary(4),newid())) * 25),0)+97) + char(round((rand(convert(binary(4),newid())) * 25),0)+97) + char(round((rand(convert(binary(4),newid())) * 25),0)+97) + char(round((rand(convert(binary(4),newid())) * 25),0)+97) + char(round((rand(convert(binary(4),newid())) * 25),0)+97) + char(round((rand(convert(binary(4),newid())) * 25),0)+97), floor(rand(convert(binary(4),newid())) * 10)+1)from (select n from toolbox.dbo.numbers where n <= 500) n1 cross join toolbox.dbo.numbers n2where n2.n < 50 This is a two column table. The first column is an int which will represent the groupings and the seconds is a varchar which is the values we want to TOP. The sample data consists of 500 values in each group and 50 groups. (toolbox.dbo.numbers is simply a tally table).The first solution is one I posted a ways back. I don't remember where it came from, but I do remember it was something setbasedisthetruepath and I were working one.select someid, somevcfrom #temp twhere exists ( select 1 from #temp where someid = t.someid and somevc = t.somevc group by someid having ( select count(*) from #temp where someid = t.someid and somevc < t.somevc ) < 3 )order by someid, somevc This query will return the top three values for each group and duplicates will be reported. Meaning, if there are two rows of 1,a then both will be returned. This query presents a subtree cost of 11.5 (very poor) and runs on my laptop with the above data set in around 3.6 seconds.The next is derived from Ilya and the post mentioned above.select someid, somevcfrom #temp twhere somevc in ( select top 3 somevc from #temp where t.someid = someid order by somevc )order by someid, somevc This query returns the exact same rowset as the previous method. Now, while this query looks much simpler, it presents a subtree cost of 467 and takes around 19.6 seconds under the same conditions as above.The final method comes from Arnold Fribble from a post I can't seem to find at the moment. A post dealing with bottom 2 of a group.select someid, somevcfrom ( select someid, case when col = 0 then somevc0 when col = 1 then somevc1 else somevc2 end somevc from ( select t0.someid, min(t0.somevc) as somevc0, min(t1.somevc) as somevc1, min(t2.somevc) as somevc2 from ( select someid, min(somevc) as somevc from #temp group by someid ) as t0 left join #temp t1 on t0.someid = t1.someid and t0.somevc < t1.somevc left join #temp t2 on t1.someid = t2.someid and t1.somevc < t2.somevc group by t0.someid ) as a cross join ( select 0 as col union all select 1 union all select 2) as n ) as awhere somevc is not nullorder by someid, somevc Now Frib's piece here is a bit different than the two above in that duplicates are not shown. So if there are two rows in the sample data of 1,a only one of them will appear in the final rowset. While this query gives more reasonable 16.6 subtree cost, it took about 38.4 seconds to run.So from that, it would appear that my original method is the way to go. Now, that being said, with a smaller data set such as ..select 1,'a'union select 1,'b'union select 1,'c'union select 1,'d'union select 1,'f'union select 1,'g'union select 1,'h'union select 1,'i'union select 2,'a'union select 2,'b'union select 2,'c'union select 2,'d'union select 2,'e'union select 2,'f'union select 2,'g'union select 2,'h'union select 2,'i'union select 2,'j' ..all three perform resonably well.Jay White{0}Edited by - Page47 on 11/08/2002 07:57:27 |
|
|
r937
Posting Yak Master
112 Posts |
Posted - 2002-11-08 : 09:26:18
|
jay, which, if any, of your methods will return the top X of each department, where the department itself specifies how many X to list?not to be obstinate, but i believe that was the original problemrudy |
|
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-11-08 : 09:44:57
|
quote: not to be obstinate, but i believe that was the original problem
Rudy, I was looking at the approach rather than the exact solution. Certainly, you can understand the value of discussing different approaches that might be adapted vs. giving cut and paste code on a forum such as this.The determining TOP X is just a variation on the theme, right? If you can do TOP 3, you can do TOP X. You do a good job of adapting the second approach (Ilya's) I present to this specific problem. For my first approach (the one that I think performs best), it is just as easy to adapt by coding the having clause to be less than a column rather than '3'. Now Frib's approach is a bit more X specific but I think you can do the same thing by limiting the cross joined table 'a' ... I'd have to think about that a bit more.Jay White{0} |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-11-08 : 11:54:19
|
Sorry Jay, you've flattened that rather too much.It should look like this:SELECT someid, somevcFROM ( SELECT someid, CASE col WHEN 0 THEN somevc0 WHEN 1 THEN somevc1 ELSE somevc2 END somevc FROM ( SELECT T1.someid, MIN(somevc0) AS somevc0, MIN(T1.somevc1) AS somevc1, MIN(somevc) AS somevc2 FROM ( SELECT T0.someid, MIN(somevc0) AS somevc0, MIN(somevc) somevc1 FROM ( SELECT someid, MIN(somevc) AS somevc0 FROM xtemp GROUP BY someid ) AS T0 LEFT JOIN xtemp AS T1 ON T0.someid = T1.someid AND T0.somevc0 < T1.somevc GROUP BY T0.someid ) AS T1 LEFT JOIN xtemp AS T2 ON T1.someid = T2.someid AND T1.somevc1 < T2.somevc GROUP BY T1.someid ) T2 CROSS JOIN (SELECT 0 AS col UNION ALL SELECT 1 UNION ALL SELECT 2) n ) aWHERE somevc IS NOT NULLORDER BY someid, somevc But I agree that you have to be really sure that you want 3 values to use this!Edited by - Arnold Fribble on 11/08/2002 12:08:22 |
|
|
ConfusedOfLife
Starting Member
35 Posts |
Posted - 2002-11-08 : 11:58:27
|
Suppose this is our table :someid somevc ----------- ---------- 1 Oopse1 Oopse1 Oopse1 Oopse1 Oopse2 Wow! You said that dublicates will be reported, so, I tried your code ( the first one and the second ) on this table and for the first row ( 1,Oopse) it returns all of them ( 5 rows ), well, it should, but what if we wanted to have only ( and only! ) 3 rows for each group in the results? ( i.e. I do not get 5 rows for this oopse! ) |
|
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-11-08 : 12:02:50
|
Holy crap Frib, that is scary fast!!! I'm glad you chimed in ...quote: You said that dublicates will be reported, so, I tried your code ( the first one and the second ) on this table and for the first row ( 1,Oopse) it returns all of them ( 5 rows ), well, it should, but what if we wanted to have only ( and only! ) 3 rows for each group in the results? ( i.e. I do not get 5 rows for this oopse! )
Then use Fribbles method or modify one of the others so it meets your needs ...Jay White{0}Edited by - Page47 on 11/08/2002 12:08:26 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-11-08 : 12:11:51
|
Sigh. I had to modify it again: it was selecting the wrong someid values -- doesn't matter unless the data doesn't have enough to fill all 3... |
|
|
MuffinMan
Posting Yak Master
107 Posts |
Posted - 2002-11-08 : 12:25:34
|
Ok, how about this? It reports ties, but might not be completely true to the TOP X criteria (see the output below for the TOP 3 from each grouping).set nocount oncreate table #sale (Company varchar(20) , Division varchar(20) , sale int)insert #sale select 'Allstar Shipping' , 'Marketing' , 9000insert #sale select 'Allstar Shipping' , 'IT' , 12000insert #sale select 'Allstar Shipping' , 'Operations' , 9000insert #sale select 'Allstar Shipping' , 'Engineering' , 9000insert #sale select 'Allstar Shipping' , 'Sales' , 7000insert #sale select 'Bombay Allied' , 'North' , 140000insert #sale select 'Bombay Allied' , 'South' , 125000insert #sale select 'Bombay Allied' , 'East' , 86000insert #sale select 'Bombay Allied' , 'West' , 168000insert #sale select 'Cat Walkers, Ltd.' , 'Persian' , 18000insert #sale select 'Cat Walkers, Ltd.' , 'Domestic' , 15000insert #sale select 'Cat Walkers, Ltd.' , 'Fluffy' , 22500declare @RecsPerGroup int , @msg varchar(100)select @RecsPerGroup = 2select @msg = '** Show TOP ' + convert(varchar(10), @RecsPerGroup) + ' records per grouping **'print @msgselect * from ( select top 100 percent * , Record = ( select count(*) + 1 as Cnt from #sale s2 where s2.Company = s.Company and s2.sale > s.sale ) from #sale s order by Company, Sale Desc , Division) awhere record <= @RecsPerGroupprint ''select @RecsPerGroup = 3select @msg = '** Show TOP ' + convert(varchar(10), @RecsPerGroup) + ' records per grouping **'print @msgselect * from ( select top 100 percent * , Record = ( select count(*) + 1 as Cnt from #sale s2 where s2.Company = s.Company and s2.sale > s.sale ) from #sale s order by Company, Sale Desc , Division) awhere record <= @RecsPerGroupprint ''select @RecsPerGroup = 3select @msg = '** Show TOP ' + convert(varchar(10), @RecsPerGroup) + ' records per grouping USING DISTINCT **'print @msgselect * from ( select top 100 percent * , Record = ( select count(Distinct Sale) + 1 as Cnt from #sale s2 where s2.Company = s.Company and s2.sale > s.sale ) from #sale s order by Company, Sale Desc , Division) awhere record <= @RecsPerGroupdrop table #sale/**** Here are the results ***/** Show TOP 2 records per grouping **Company Division sale Record -------------------- -------------------- ----------- ----------- Allstar Shipping IT 12000 1Allstar Shipping Engineering 9000 2Allstar Shipping Marketing 9000 2Allstar Shipping Operations 9000 2Bombay Allied West 168000 1Bombay Allied North 140000 2Cat Walkers, Ltd. Fluffy 22500 1Cat Walkers, Ltd. Persian 18000 2 ** Show TOP 3 records per grouping **Company Division sale Record -------------------- -------------------- ----------- ----------- Allstar Shipping IT 12000 1Allstar Shipping Engineering 9000 2 ==Allstar Shipping Marketing 9000 2 ==> Are We getting the top 3 here?Allstar Shipping Operations 9000 2 ==Bombay Allied West 168000 1Bombay Allied North 140000 2Bombay Allied South 125000 3Cat Walkers, Ltd. Fluffy 22500 1Cat Walkers, Ltd. Persian 18000 2Cat Walkers, Ltd. Domestic 15000 3** Show TOP 3 records per grouping USING DISTINCT **Company Division sale Record -------------------- -------------------- ----------- ----------- Allstar Shipping IT 12000 1Allstar Shipping Engineering 9000 2Allstar Shipping Marketing 9000 2Allstar Shipping Operations 9000 2Allstar Shipping Sales 7000 3 == Now we get 3 distinct values!Bombay Allied West 168000 1Bombay Allied North 140000 2Bombay Allied South 125000 3Cat Walkers, Ltd. Fluffy 22500 1Cat Walkers, Ltd. Persian 18000 2Cat Walkers, Ltd. Domestic 15000 3 Edited by - muffinman on 11/08/2002 12:40:11 |
|
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-11-08 : 12:40:23
|
Nice work Muffin ...select someid, somevcfrom ( select top 100 percent *, (select count(*) + 1 as Cnt from #temp t2 where t2.someid = t.someid and t2.somevc < t.somevc ) as Record from #temp t order by someid, somevc desc ) as awhere record <= 3order by someid, somevc ... a nice 6.16 subtree cost and running just over 3 seconds ...But Frib's re-write is still the champ (by a huge margin) running in 300ms and only a 1.06 cost.Jay White{0} |
|
|
JustinBigelow
SQL Gigolo
1157 Posts |
Posted - 2002-11-08 : 14:11:55
|
quote: But Frib's re-write is still the champ (by a huge margin) running in 300ms and only a 1.06 cost.
Which just goes to show you, always be careful when you quibble with Fribble. JustinHave you hugged your SQL Server today? |
|
|
Lavos
Posting Yak Master
200 Posts |
Posted - 2002-11-09 : 18:57:12
|
Of course, there's always another solution. I hesitate to give it, because it really depends on How the tables are used. Denormalize the table to include your ranking information, or create a rankings table. If rankings are calculated only periodically, and the listing generated often, then that would be the fastest solution over time.I'm finding this very interesting so far. I was working on my own solution for the general case, but I stopped when I couldn't wrap my head around Fribble's query :)Oh yeah, I did have a procedure that dynamically wrote a lot of UNION ALL queries which was very fast, but it quickly exceeds the varchar(8000) datatype when there are a lot of categories/departments.----------------------"O Theos mou! Echo ten labrida en te mou kephale!" |
|
|
r937
Posting Yak Master
112 Posts |
Posted - 2002-11-09 : 19:40:55
|
fribble: "But I agree that you have to be really sure that you want 3 values to use this!"i hate to keep harping on this, and i'd love to hear what the SBC Ameritech guys have to say about this, but each department was to list a different number of top rows -- department A will want the top 2 listed, department B will want the top 6 listed, and so onany luck adapting your 1st query yet, jay?rudy |
|
|
Lavos
Posting Yak Master
200 Posts |
Posted - 2002-11-09 : 23:19:52
|
Well Rudy, I think Jay said that he was more interested in attacking the general approach, opposed to coming up with a complete solution.This query would work, but I don't know how it performs, but if updates to the Employee and Department tables are rare and the query is ran often, then the denormalized approach will still outperform it overall.SELECT E.*FROM Departments D INNER JOIN (SELECT A.EmpID, A.DepID, COUNT(*) AS Rank FROM Employees A INNER JOIN Employees B ON (A.DepID = B.DepID AND A.EmpID <= B.EmpID) GROUP BY A.EmpID, A.DepID ) ER ON (ER.DepID = D.DepID AND ER.Rank <= D.RankLimit) INNER JOIN Employees E ( ON ER.EmpID = E.EmpID) ----------------------"O Theos mou! Echo ten labrida en te mou kephale!" |
|
|
Next Page
|
|
|
|
|