Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 One Query??

Author  Topic 

rme8494
Yak Posting Veteran

98 Posts

Posted - 2002-11-07 : 21:21:40
Hi Everyone,
Here is the situation, I have two tables, a report and reportcat table.

ReportCat Table
Cat_ID
CatName
Postdate

Report Table
Report_ID
Cat_ID
FileLocation
PostDate

I'm doing an inner join on the Cat_ID on a query to output them on a web page. This query works great to out put ALL the reports in the database, however I would like to do something different. I would like the user to be able select how many reports they would like to see output in each category. So say I have 20 reports total, and 4 categories. Each category has 5 reports in it. How do I only return only 2 reports from each category? If I do a TOP 8 in the query it wouldn't give me 2 from each category just 8 random records. Make sense? Any ideas?

Ryan

Ryan Everhart
SBC Ameritech

simondeutsch
Aged Yak Warrior

547 Posts

Posted - 2002-11-07 : 21:44:27
Could definitely do it with UNION. But there should be a better way.
Otherwise:
SELECT TOP n * FROM ReportCat INNER JOIN Report ON CAt_Id = cat_id where catname = 'a'
UNION
SELECT TOP n * FROM ReportCat INNER JOIN Report ON CAt_Id = cat_id where catname = 'b'
etc.

Sarah Berger MCSD
Go to Top of Page

rme8494
Yak Posting Veteran

98 Posts

Posted - 2002-11-07 : 22:07:57
Sarah,
Thanks for your reply. I am looking for something in single query though. I do think the union is feasable as the number of categories will change. Any other ideas?

Thanks
Ryan


Ryan Everhart
SBC Ameritech
Go to Top of Page

oitsubob
Yak Posting Veteran

70 Posts

Posted - 2002-11-07 : 22:20:08
Hi Everyone...
Ryan and I work together at SBC (a thousand miles apart) and have been bouncing these ideas off one another. Essentially, we're both trying to accomplish the same basic goal but in slightly different applications.

Just to give you an idea of what I'm trying to do, here's the text of a post I submitted to a Cold Fusion user group on Macromedia.com:

Begin Post
=============
What I'm trying to do is build a single query that will join an "employees" table and "department" table and return the top X senior employees from each department where X is a value in the limit column of the "department" table.

For the sake of this example, let's say the "employee" table has 100 records and 4 columns: First Name, Last Name, Department ID and Hire Date while the "department" table has 5 records 3 columns: Department ID, Department Name and Limit.

To make it simple, lets also just assume that the value of the limit column is '2' employee's for each department. When all's said and done, the result set should contain 10 records (2 people for each of the 5 departments).

Note: In actuality, we have tens of thousands of employees and several hundred departments. For that reason, it's not feasible or good practice to write a query for each individual department.

In all honesty, I don't even know where to start on this. I'm hoping that someone in the community has come across a similar situation and can provide me with some much needed direction.

=============
End Post

If anyone has any suggestions, both Ryan and I would appreciate it.

Thanks in Advance,

Bob Weston
SBC Pacific Bell

Go to Top of Page

simondeutsch
Aged Yak Warrior

547 Posts

Posted - 2002-11-07 : 22:34:47
Check this out:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=21334
see Ilya Zaltsman's post. The situation seems similar to yours, although you will have to change the hardcoded TOP operator, maybe by using dynamic SQL, because otherwise I'm not sure how you can select TOP n of an unknown number. Dynamic sql will slow you down somewhat, though.

Sarah Berger MCSD
Go to Top of Page

hoagm
Starting Member

1 Post

Posted - 2002-11-07 : 23:45:20
This should work. I'm sure it's longer than you were looking for but it will take into consideration table changes to the Report table as well as the variable number of categories. I've commented liberally... let me know if you have questions.



--Create #temp_reports then clear it, this makes sure our temp table is always mirrored after our real table
SELECT TOP 1 *
INTO #temp_reports
FROM Report

DELETE #temp_reports


--variable to hold our current category ID
DECLARE @Cat_ID Int

--cursor variable so we can step through each cat_id and perform actions
DECLARE Category_Cursor CURSOR FOR
SELECT Cat_ID FROM ReportCat

--open the cursor and prime it
OPEN Category_Cursor
FETCH NEXT FROM Category_Cursor
INTO @Cat_ID

--as long as we've not reached the end of the cat_id's, we loop
WHILE @@FETCH_STATUS = 0
BEGIN

--insert top n reports from cat_id into temp
INSERT #temp_reports
SELECT TOP n *
FROM Report
WHERE Cat_ID = @Cat_ID

--fetch the next cat_id
FETCH NEXT FROM Category_Cursor
INTO @Cat_ID
END

--close your cursor and clear it from memory
CLOSE Category_Cursor
DEALLOCATE Category_Cursor

--select your results
SELECT * FROM #temp_reports

Edited by - hoagm on 11/07/2002 23:49:44
Go to Top of Page

r937
Posting Yak Master

112 Posts

Posted - 2002-11-08 : 00:17:08
caution: not tested

select D.DepartmentName
, E.FirstName
, E.LastName
, E.HireDate
from department D
inner
join employee E
on D.ID = E.DepartmentID
where D.Limit >
( select count(*)
from employee
where DepartmentID = E.DepartmentID
and HireDate > E.HireDate )
order
by D.DepartmentName
, E.HireDate descending


the limit will be exceeded in departments where there are ties across the last place

rudy
Go to Top of Page

mohamedyousuff@yahoo.com
Starting Member

24 Posts

Posted - 2002-11-08 : 01:11:28


Try this query

select * from reportcat rc inner join report r on rc.cat_id = r.cat_id where
report_id in ( select top 2 report_id from report where cat_id = rc.cat_id )


The solution solves it in a single query as you expected. As u see above, it does not involve any temporary table but it is a correlated query. I have tested it and is working fine. I am thinking of alternatives without correlated query (performance reasons) and without temporary table and with a single query.




quote:

Hi Everyone,
Here is the situation, I have two tables, a report and reportcat table.

ReportCat Table
Cat_ID
CatName
Postdate

Report Table
Report_ID
Cat_ID
FileLocation
PostDate

I'm doing an inner join on the Cat_ID on a query to output them on a web page. This query works great to out put ALL the reports in the database, however I would like to do something different. I would like the user to be able select how many reports they would like to see output in each category. So say I have 20 reports total, and 4 categories. Each category has 5 reports in it. How do I only return only 2 reports from each category? If I do a TOP 8 in the query it wouldn't give me 2 from each category just 8 random records. Make sense? Any ideas?

Ryan

Ryan Everhart
SBC Ameritech



Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-11-08 : 07:49:50
This question of top N of a group has come up a few times lately and I have spent some time looking at the various suggestions. I will outline what I have found here.

First some sample data to work with ...
create table #temp (
someid int,
somevc varchar(10) )

insert into #temp
select
n2.n,
left(
char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
char(round((rand(convert(binary(4),newid())) * 25),0)+97),
floor(rand(convert(binary(4),newid())) * 10)+1)
from
(select n from toolbox.dbo.numbers where n <= 500) n1
cross join toolbox.dbo.numbers n2
where
n2.n < 50

 
This is a two column table. The first column is an int which will represent the groupings and the seconds is a varchar which is the values we want to TOP. The sample data consists of 500 values in each group and 50 groups. (toolbox.dbo.numbers is simply a tally table).

The first solution is one I posted a ways back. I don't remember where it came from, but I do remember it was something setbasedisthetruepath and I were working one.


select
someid,
somevc
from
#temp t
where
exists (
select 1
from
#temp
where
someid = t.someid and
somevc = t.somevc
group by
someid
having (
select
count(*)
from
#temp
where
someid = t.someid and
somevc < t.somevc ) < 3 )
order by someid, somevc

 
This query will return the top three values for each group and duplicates will be reported. Meaning, if there are two rows of 1,a then both will be returned. This query presents a subtree cost of 11.5 (very poor) and runs on my laptop with the above data set in around 3.6 seconds.

The next is derived from Ilya and the post mentioned above.


select
someid,
somevc
from
#temp t
where
somevc in (
select top 3
somevc
from
#temp
where
t.someid = someid
order by somevc )
order by someid, somevc

 
This query returns the exact same rowset as the previous method. Now, while this query looks much simpler, it presents a subtree cost of 467 and takes around 19.6 seconds under the same conditions as above.

The final method comes from Arnold Fribble from a post I can't seem to find at the moment. A post dealing with bottom 2 of a group.

select
someid,
somevc
from (
select
someid,
case
when col = 0 then somevc0
when col = 1 then somevc1
else somevc2
end somevc
from (
select
t0.someid,
min(t0.somevc) as somevc0,
min(t1.somevc) as somevc1,
min(t2.somevc) as somevc2
from (
select
someid,
min(somevc) as somevc
from
#temp
group by
someid ) as t0
left join #temp t1
on t0.someid = t1.someid and
t0.somevc < t1.somevc
left join #temp t2
on t1.someid = t2.someid and
t1.somevc < t2.somevc
group by
t0.someid ) as a
cross join (
select 0 as col
union all
select 1
union all
select 2) as n ) as a
where
somevc is not null
order by someid, somevc

 
Now Frib's piece here is a bit different than the two above in that duplicates are not shown. So if there are two rows in the sample data of 1,a only one of them will appear in the final rowset. While this query gives more reasonable 16.6 subtree cost, it took about 38.4 seconds to run.

So from that, it would appear that my original method is the way to go. Now, that being said, with a smaller data set such as ..

select 1,'a'
union select 1,'b'
union select 1,'c'
union select 1,'d'
union select 1,'f'
union select 1,'g'
union select 1,'h'
union select 1,'i'
union select 2,'a'
union select 2,'b'
union select 2,'c'
union select 2,'d'
union select 2,'e'
union select 2,'f'
union select 2,'g'
union select 2,'h'
union select 2,'i'
union select 2,'j'

 
..all three perform resonably well.

Jay White
{0}

Edited by - Page47 on 11/08/2002 07:57:27
Go to Top of Page

r937
Posting Yak Master

112 Posts

Posted - 2002-11-08 : 09:26:18
jay, which, if any, of your methods will return the top X of each department, where the department itself specifies how many X to list?

not to be obstinate, but i believe that was the original problem

rudy
Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-11-08 : 09:44:57
quote:
not to be obstinate, but i believe that was the original problem


Rudy, I was looking at the approach rather than the exact solution. Certainly, you can understand the value of discussing different approaches that might be adapted vs. giving cut and paste code on a forum such as this.

The determining TOP X is just a variation on the theme, right? If you can do TOP 3, you can do TOP X. You do a good job of adapting the second approach (Ilya's) I present to this specific problem. For my first approach (the one that I think performs best), it is just as easy to adapt by coding the having clause to be less than a column rather than '3'. Now Frib's approach is a bit more X specific but I think you can do the same thing by limiting the cross joined table 'a' ... I'd have to think about that a bit more.

Jay White
{0}
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-11-08 : 11:54:19
Sorry Jay, you've flattened that rather too much.
It should look like this:

SELECT someid, somevc
FROM (
SELECT someid,
CASE col WHEN 0 THEN somevc0 WHEN 1 THEN somevc1 ELSE somevc2 END somevc
FROM (
SELECT T1.someid, MIN(somevc0) AS somevc0,
MIN(T1.somevc1) AS somevc1, MIN(somevc) AS somevc2
FROM (
SELECT T0.someid, MIN(somevc0) AS somevc0, MIN(somevc) somevc1
FROM (
SELECT someid, MIN(somevc) AS somevc0
FROM xtemp
GROUP BY someid
) AS T0
LEFT JOIN xtemp AS T1 ON T0.someid = T1.someid AND T0.somevc0 < T1.somevc
GROUP BY T0.someid
) AS T1
LEFT JOIN xtemp AS T2 ON T1.someid = T2.someid AND T1.somevc1 < T2.somevc
GROUP BY T1.someid
) T2
CROSS JOIN (SELECT 0 AS col UNION ALL SELECT 1 UNION ALL SELECT 2) n
) a
WHERE somevc IS NOT NULL
ORDER BY someid, somevc

 
But I agree that you have to be really sure that you want 3 values to use this!


Edited by - Arnold Fribble on 11/08/2002 12:08:22
Go to Top of Page

ConfusedOfLife
Starting Member

35 Posts

Posted - 2002-11-08 : 11:58:27
Suppose this is our table :


someid somevc
----------- ----------
1 Oopse
1 Oopse
1 Oopse
1 Oopse
1 Oopse
2 Wow!


You said that dublicates will be reported, so, I tried your code ( the first one and the second ) on this table and for the first row ( 1,Oopse) it returns all of them ( 5 rows ), well, it should, but what if we wanted to have only ( and only! ) 3 rows for each group in the results? ( i.e. I do not get 5 rows for this oopse! )

Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-11-08 : 12:02:50
Holy crap Frib, that is scary fast!!! I'm glad you chimed in ...

quote:
You said that dublicates will be reported, so, I tried your code ( the first one and the second ) on this table and for the first row ( 1,Oopse) it returns all of them ( 5 rows ), well, it should, but what if we wanted to have only ( and only! ) 3 rows for each group in the results? ( i.e. I do not get 5 rows for this oopse! )


Then use Fribbles method or modify one of the others so it meets your needs ...

Jay White
{0}

Edited by - Page47 on 11/08/2002 12:08:26
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-11-08 : 12:11:51
Sigh. I had to modify it again: it was selecting the wrong someid values -- doesn't matter unless the data doesn't have enough to fill all 3...


Go to Top of Page

MuffinMan
Posting Yak Master

107 Posts

Posted - 2002-11-08 : 12:25:34
Ok, how about this? It reports ties, but might not be completely true to the TOP X criteria (see the output below for the TOP 3 from each grouping).


set nocount on

create table #sale (Company varchar(20) , Division varchar(20) , sale int)

insert #sale select 'Allstar Shipping' , 'Marketing' , 9000
insert #sale select 'Allstar Shipping' , 'IT' , 12000
insert #sale select 'Allstar Shipping' , 'Operations' , 9000
insert #sale select 'Allstar Shipping' , 'Engineering' , 9000
insert #sale select 'Allstar Shipping' , 'Sales' , 7000

insert #sale select 'Bombay Allied' , 'North' , 140000
insert #sale select 'Bombay Allied' , 'South' , 125000
insert #sale select 'Bombay Allied' , 'East' , 86000
insert #sale select 'Bombay Allied' , 'West' , 168000

insert #sale select 'Cat Walkers, Ltd.' , 'Persian' , 18000
insert #sale select 'Cat Walkers, Ltd.' , 'Domestic' , 15000
insert #sale select 'Cat Walkers, Ltd.' , 'Fluffy' , 22500


declare @RecsPerGroup int , @msg varchar(100)
select @RecsPerGroup = 2

select @msg = '** Show TOP ' + convert(varchar(10), @RecsPerGroup) + ' records per grouping **'
print @msg

select *
from (
select top 100 percent *
, Record = (
select count(*) + 1 as Cnt
from #sale s2
where s2.Company = s.Company and s2.sale > s.sale
)
from #sale s
order by Company, Sale Desc , Division
) a
where record <= @RecsPerGroup


print ''

select @RecsPerGroup = 3

select @msg = '** Show TOP ' + convert(varchar(10), @RecsPerGroup) + ' records per grouping **'
print @msg

select *
from (
select top 100 percent *
, Record = (
select count(*) + 1 as Cnt
from #sale s2
where s2.Company = s.Company and s2.sale > s.sale
)
from #sale s
order by Company, Sale Desc , Division
) a
where record <= @RecsPerGroup


print ''

select @RecsPerGroup = 3

select @msg = '** Show TOP ' + convert(varchar(10), @RecsPerGroup) + ' records per grouping USING DISTINCT **'
print @msg

select *
from (
select top 100 percent *
, Record = (
select count(Distinct Sale) + 1 as Cnt
from #sale s2
where s2.Company = s.Company and s2.sale > s.sale
)
from #sale s
order by Company, Sale Desc , Division
) a
where record <= @RecsPerGroup


drop table #sale

/**** Here are the results ***/
** Show TOP 2 records per grouping **
Company Division sale Record
-------------------- -------------------- ----------- -----------
Allstar Shipping IT 12000 1
Allstar Shipping Engineering 9000 2
Allstar Shipping Marketing 9000 2
Allstar Shipping Operations 9000 2
Bombay Allied West 168000 1
Bombay Allied North 140000 2
Cat Walkers, Ltd. Fluffy 22500 1
Cat Walkers, Ltd. Persian 18000 2


** Show TOP 3 records per grouping **
Company Division sale Record
-------------------- -------------------- ----------- -----------
Allstar Shipping IT 12000 1
Allstar Shipping Engineering 9000 2 ==
Allstar Shipping Marketing 9000 2 ==> Are We getting the top 3 here?
Allstar Shipping Operations 9000 2 ==
Bombay Allied West 168000 1
Bombay Allied North 140000 2
Bombay Allied South 125000 3
Cat Walkers, Ltd. Fluffy 22500 1
Cat Walkers, Ltd. Persian 18000 2
Cat Walkers, Ltd. Domestic 15000 3

** Show TOP 3 records per grouping USING DISTINCT **
Company Division sale Record
-------------------- -------------------- ----------- -----------
Allstar Shipping IT 12000 1
Allstar Shipping Engineering 9000 2
Allstar Shipping Marketing 9000 2
Allstar Shipping Operations 9000 2
Allstar Shipping Sales 7000 3 == Now we get 3 distinct values!
Bombay Allied West 168000 1
Bombay Allied North 140000 2
Bombay Allied South 125000 3
Cat Walkers, Ltd. Fluffy 22500 1
Cat Walkers, Ltd. Persian 18000 2
Cat Walkers, Ltd. Domestic 15000 3



Edited by - muffinman on 11/08/2002 12:40:11
Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-11-08 : 12:40:23
Nice work Muffin ...

select
someid,
somevc
from (
select top 100 percent
*,
(select
count(*) + 1 as Cnt
from
#temp t2
where
t2.someid = t.someid and
t2.somevc < t.somevc ) as Record
from
#temp t
order by
someid,
somevc desc ) as a
where
record <= 3
order by
someid,
somevc

 
... a nice 6.16 subtree cost and running just over 3 seconds ...

But Frib's re-write is still the champ (by a huge margin) running in 300ms and only a 1.06 cost.

Jay White
{0}
Go to Top of Page

JustinBigelow
SQL Gigolo

1157 Posts

Posted - 2002-11-08 : 14:11:55
quote:

But Frib's re-write is still the champ (by a huge margin) running in 300ms and only a 1.06 cost.



Which just goes to show you, always be careful when you quibble with Fribble.

Justin

Have you hugged your SQL Server today?
Go to Top of Page

Lavos
Posting Yak Master

200 Posts

Posted - 2002-11-09 : 18:57:12
Of course, there's always another solution. I hesitate to give it, because it really depends on How the tables are used. Denormalize the table to include your ranking information, or create a rankings table. If rankings are calculated only periodically, and the listing generated often, then that would be the fastest solution over time.

I'm finding this very interesting so far. I was working on my own solution for the general case, but I stopped when I couldn't wrap my head around Fribble's query :)

Oh yeah, I did have a procedure that dynamically wrote a lot of UNION ALL queries which was very fast, but it quickly exceeds the varchar(8000) datatype when there are a lot of categories/departments.

----------------------
"O Theos mou! Echo ten labrida en te mou kephale!"
Go to Top of Page

r937
Posting Yak Master

112 Posts

Posted - 2002-11-09 : 19:40:55
fribble: "But I agree that you have to be really sure that you want 3 values to use this!"

i hate to keep harping on this, and i'd love to hear what the SBC Ameritech guys have to say about this, but each department was to list a different number of top rows -- department A will want the top 2 listed, department B will want the top 6 listed, and so on

any luck adapting your 1st query yet, jay?

rudy
Go to Top of Page

Lavos
Posting Yak Master

200 Posts

Posted - 2002-11-09 : 23:19:52
Well Rudy, I think Jay said that he was more interested in attacking the general approach, opposed to coming up with a complete solution.

This query would work, but I don't know how it performs, but if updates to the Employee and Department tables are rare and the query is ran often, then the denormalized approach will still outperform it overall.


SELECT E.*
FROM Departments D INNER JOIN
(SELECT A.EmpID, A.DepID, COUNT(*) AS Rank
FROM Employees A INNER JOIN
Employees B ON (A.DepID = B.DepID AND A.EmpID <= B.EmpID)
GROUP BY A.EmpID, A.DepID ) ER ON
(ER.DepID = D.DepID AND ER.Rank <= D.RankLimit)
INNER JOIN Employees E ( ON ER.EmpID = E.EmpID)


----------------------
"O Theos mou! Echo ten labrida en te mou kephale!"
Go to Top of Page
    Next Page

- Advertisement -