One Query?? - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

One Query??

Author

Topic

rme8494
Yak Posting Veteran

98 Posts

Posted - 2002-11-07 : 21:21:40

Hi Everyone,
Here is the situation, I have two tables, a report and reportcat table.

ReportCat Table
Cat_ID
CatName
Postdate

Report Table
Report_ID
Cat_ID
FileLocation
PostDate

I'm doing an inner join on the Cat_ID on a query to output them on a web page. This query works great to out put ALL the reports in the database, however I would like to do something different. I would like the user to be able select how many reports they would like to see output in each category. So say I have 20 reports total, and 4 categories. Each category has 5 reports in it. How do I only return only 2 reports from each category? If I do a TOP 8 in the query it wouldn't give me 2 from each category just 8 random records. Make sense? Any ideas?

Ryan

Ryan Everhart
SBC Ameritech

simondeutsch
Aged Yak Warrior

547 Posts

Posted - 2002-11-07 : 21:44:27

Could definitely do it with UNION. But there should be a better way.
Otherwise:
SELECT TOP n * FROM ReportCat INNER JOIN Report ON CAt_Id = cat_id where catname = 'a'
UNION
SELECT TOP n * FROM ReportCat INNER JOIN Report ON CAt_Id = cat_id where catname = 'b'
etc.

Sarah Berger MCSD

rme8494
Yak Posting Veteran

98 Posts

Posted - 2002-11-07 : 22:07:57

Sarah,
Thanks for your reply. I am looking for something in single query though. I do think the union is feasable as the number of categories will change. Any other ideas?

Thanks
Ryan

Ryan Everhart
SBC Ameritech

oitsubob
Yak Posting Veteran

70 Posts

Posted - 2002-11-07 : 22:20:08

Hi Everyone...
Ryan and I work together at SBC (a thousand miles apart) and have been bouncing these ideas off one another. Essentially, we're both trying to accomplish the same basic goal but in slightly different applications.

Just to give you an idea of what I'm trying to do, here's the text of a post I submitted to a Cold Fusion user group on Macromedia.com:

Begin Post
=============
What I'm trying to do is build a single query that will join an "employees" table and "department" table and return the top X senior employees from each department where X is a value in the limit column of the "department" table.

For the sake of this example, let's say the "employee" table has 100 records and 4 columns: First Name, Last Name, Department ID and Hire Date while the "department" table has 5 records 3 columns: Department ID, Department Name and Limit.

To make it simple, lets also just assume that the value of the limit column is '2' employee's for each department. When all's said and done, the result set should contain 10 records (2 people for each of the 5 departments).

Note: In actuality, we have tens of thousands of employees and several hundred departments. For that reason, it's not feasible or good practice to write a query for each individual department.

In all honesty, I don't even know where to start on this. I'm hoping that someone in the community has come across a similar situation and can provide me with some much needed direction.

=============
End Post

If anyone has any suggestions, both Ryan and I would appreciate it.

Thanks in Advance,

Bob Weston
SBC Pacific Bell

simondeutsch
Aged Yak Warrior

547 Posts

Posted - 2002-11-07 : 22:34:47

Check this out:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=21334
see Ilya Zaltsman's post. The situation seems similar to yours, although you will have to change the hardcoded TOP operator, maybe by using dynamic SQL, because otherwise I'm not sure how you can select TOP n of an unknown number. Dynamic sql will slow you down somewhat, though.

Sarah Berger MCSD

hoagm
Starting Member

1 Post

Posted - 2002-11-07 : 23:45:20

This should work. I'm sure it's longer than you were looking for but it will take into consideration table changes to the Report table as well as the variable number of categories. I've commented liberally... let me know if you have questions.

--Create #temp_reports then clear it, this makes sure our temp table is always mirrored after our real table
SELECT TOP 1 *
INTO #temp_reports
FROM Report

DELETE #temp_reports

--variable to hold our current category ID
DECLARE @Cat_ID Int

--cursor variable so we can step through each cat_id and perform actions
DECLARE Category_Cursor CURSOR FOR
SELECT Cat_ID FROM ReportCat

--open the cursor and prime it
OPEN Category_Cursor
FETCH NEXT FROM Category_Cursor
INTO @Cat_ID

--as long as we've not reached the end of the cat_id's, we loop
WHILE @@FETCH_STATUS = 0
BEGIN

--insert top n reports from cat_id into temp
INSERT #temp_reports
SELECT TOP n *
FROM Report
WHERE Cat_ID = @Cat_ID

--fetch the next cat_id
FETCH NEXT FROM Category_Cursor
INTO @Cat_ID
END

--close your cursor and clear it from memory
CLOSE Category_Cursor
DEALLOCATE Category_Cursor

--select your results
SELECT * FROM #temp_reports

Edited by - hoagm on 11/07/2002 23:49:44

r937
Posting Yak Master

112 Posts

Posted - 2002-11-08 : 00:17:08

caution: not tested

select D.DepartmentName
     , E.FirstName
     , E.LastName
     , E.HireDate  
  from department D
inner 
  join employee E 
    on D.ID = E.DepartmentID
 where D.Limit > 
       ( select count(*) 
           from employee 
          where DepartmentID = E.DepartmentID 
            and HireDate > E.HireDate ) 
 order 
    by D.DepartmentName
     , E.HireDate  descending

the limit will be exceeded in departments where there are ties across the last place

rudy

mohamedyousuff@yahoo.com
Starting Member

24 Posts

Posted - 2002-11-08 : 01:11:28

Try this query

select * from reportcat rc inner join report r on rc.cat_id = r.cat_id where
report_id in ( select top 2 report_id from report where cat_id = rc.cat_id )

The solution solves it in a single query as you expected. As u see above, it does not involve any temporary table but it is a correlated query. I have tested it and is working fine. I am thinking of alternatives without correlated query (performance reasons) and without temporary table and with a single query.

quote:

Hi Everyone,
Here is the situation, I have two tables, a report and reportcat table.

ReportCat Table
Cat_ID
CatName
Postdate

Report Table
Report_ID
Cat_ID
FileLocation
PostDate

I'm doing an inner join on the Cat_ID on a query to output them on a web page. This query works great to out put ALL the reports in the database, however I would like to do something different. I would like the user to be able select how many reports they would like to see output in each category. So say I have 20 reports total, and 4 categories. Each category has 5 reports in it. How do I only return only 2 reports from each category? If I do a TOP 8 in the query it wouldn't give me 2 from each category just 8 random records. Make sense? Any ideas?

Ryan

Ryan Everhart
SBC Ameritech

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-11-08 : 07:49:50

This question of top N of a group has come up a few times lately and I have spent some time looking at the various suggestions. I will outline what I have found here.

First some sample data to work with ...

create table #temp (
    someid int,
    somevc varchar(10) )

insert into #temp
select
    n2.n,
    left(
        char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
        char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
        char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
        char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
        char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
        char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
        char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
        char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
        char(round((rand(convert(binary(4),newid())) * 25),0)+97) +
        char(round((rand(convert(binary(4),newid())) * 25),0)+97),
    floor(rand(convert(binary(4),newid())) * 10)+1)
from
    (select n from toolbox.dbo.numbers where n <= 500) n1
    cross join toolbox.dbo.numbers n2
where
    n2.n < 50

This is a two column table. The first column is an int which will represent the groupings and the seconds is a varchar which is the values we want to TOP. The sample data consists of 500 values in each group and 50 groups. (toolbox.dbo.numbers is simply a tally table).

The first solution is one I posted a ways back. I don't remember where it came from, but I do remember it was something setbasedisthetruepath and I were working one.



select
    someid,
    somevc
from
    #temp t
where
    exists (
        select 1
        from
            #temp
        where
            someid = t.someid and
            somevc = t.somevc
        group by
            someid
        having (
            select
                count(*)
            from
                #temp
            where
                someid = t.someid and
                somevc < t.somevc ) < 3 )
order by someid, somevc

This query will return the top three values for each group and duplicates will be reported. Meaning, if there are two rows of 1,a then both will be returned. This query presents a subtree cost of 11.5 (very poor) and runs on my laptop with the above data set in around 3.6 seconds.

The next is derived from Ilya and the post mentioned above.



select
    someid,
    somevc
from
    #temp t
where
    somevc in (
        select top 3
            somevc
        from
            #temp
        where
            t.someid = someid
        order by somevc )
order by someid, somevc

This query returns the exact same rowset as the previous method. Now, while this query looks much simpler, it presents a subtree cost of 467 and takes around 19.6 seconds under the same conditions as above.

The final method comes from Arnold Fribble from ~~a post I can't seem to find at the moment.~~ A post dealing with bottom 2 of a group.


select
    someid,
    somevc
from (
    select
        someid,
        case
            when col = 0 then somevc0
            when col = 1 then somevc1
            else somevc2
        end somevc
    from (
        select
            t0.someid,
            min(t0.somevc) as somevc0,
            min(t1.somevc) as somevc1,
            min(t2.somevc) as somevc2
        from (
            select
                someid,
                min(somevc) as somevc
            from
                #temp
            group by
                someid ) as t0
            left join #temp t1
                on t0.someid = t1.someid and
                   t0.somevc < t1.somevc
            left join #temp t2
                on t1.someid = t2.someid and
                   t1.somevc < t2.somevc
        group by
            t0.someid ) as a
    cross join (
        select 0 as col
        union all
        select 1
        union all
        select 2) as n ) as a
where
    somevc is not null
order by someid, somevc

Now Frib's piece here is a bit different than the two above in that duplicates are not shown. So if there are two rows in the sample data of 1,a only one of them will appear in the final rowset. While this query gives more reasonable 16.6 subtree cost, it took about 38.4 seconds to run.

So from that, it would appear that my original method is the way to go. Now, that being said, with a smaller data set such as ..


select 1,'a'
union select 1,'b'
union select 1,'c'
union select 1,'d'
union select 1,'f'
union select 1,'g'
union select 1,'h'
union select 1,'i'
union select 2,'a'
union select 2,'b'
union select 2,'c'
union select 2,'d'
union select 2,'e'
union select 2,'f'
union select 2,'g'
union select 2,'h'
union select 2,'i'
union select 2,'j'

..all three perform resonably well.

Jay White
{0}

Edited by - Page47 on 11/08/2002 07:57:27

r937
Posting Yak Master

112 Posts

Posted - 2002-11-08 : 09:26:18

jay, which, if any, of your methods will return the top X of each department, where the department itself specifies how many X to list?

not to be obstinate, but i believe that was the original problem

rudy

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-11-08 : 09:44:57

quote:
not to be obstinate, but i believe that was the original problem

Rudy, I was looking at the approach rather than the exact solution. Certainly, you can understand the value of discussing different approaches that might be adapted vs. giving cut and paste code on a forum such as this.

The determining TOP X is just a variation on the theme, right? If you can do TOP 3, you can do TOP X. You do a good job of adapting the second approach (Ilya's) I present to this specific problem. For my first approach (the one that I think performs best), it is just as easy to adapt by coding the having clause to be less than a column rather than '3'. Now Frib's approach is a bit more X specific but I think you can do the same thing by limiting the cross joined table 'a' ... I'd have to think about that a bit more.

Jay White
{0}

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-11-08 : 11:54:19

Sorry Jay, you've flattened that rather too much.
It should look like this:


SELECT someid, somevc
FROM (
    SELECT someid,
      CASE col WHEN 0 THEN somevc0 WHEN 1 THEN somevc1 ELSE somevc2 END somevc
    FROM (
        SELECT T1.someid, MIN(somevc0) AS somevc0,
          MIN(T1.somevc1) AS somevc1, MIN(somevc) AS somevc2
        FROM (
            SELECT T0.someid, MIN(somevc0) AS somevc0, MIN(somevc) somevc1
            FROM (
                SELECT someid, MIN(somevc) AS somevc0
                FROM xtemp
                GROUP BY someid
              ) AS T0
            LEFT JOIN xtemp AS T1 ON T0.someid = T1.someid AND T0.somevc0 < T1.somevc
            GROUP BY T0.someid
          ) AS T1
        LEFT JOIN xtemp AS T2 ON T1.someid = T2.someid AND T1.somevc1 < T2.somevc
        GROUP BY T1.someid
      ) T2
    CROSS JOIN (SELECT 0 AS col UNION ALL SELECT 1 UNION ALL SELECT 2) n
  ) a
WHERE somevc IS NOT NULL
ORDER BY someid, somevc

But I agree that you have to be really sure that you want 3 values to use this!

Edited by - Arnold Fribble on 11/08/2002 12:08:22

ConfusedOfLife
Starting Member

35 Posts

Posted - 2002-11-08 : 11:58:27

Suppose this is our table :


someid      somevc     
----------- ---------- 
1           Oopse
1           Oopse
1           Oopse
1           Oopse
1           Oopse
2           Wow!

You said that dublicates will be reported, so, I tried your code ( the first one and the second ) on this table and for the first row ( 1,Oopse) it returns all of them ( 5 rows ), well, it should, but what if we wanted to have only ( and only! ) 3 rows for each group in the results? ( i.e. I do not get 5 rows for this oopse! )

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-11-08 : 12:02:50

Holy crap Frib, that is scary fast!!! I'm glad you chimed in ...

quote:
You said that dublicates will be reported, so, I tried your code ( the first one and the second ) on this table and for the first row ( 1,Oopse) it returns all of them ( 5 rows ), well, it should, but what if we wanted to have only ( and only! ) 3 rows for each group in the results? ( i.e. I do not get 5 rows for this oopse! )

Then use Fribbles method or modify one of the others so it meets your needs ...

Jay White
{0}

Edited by - Page47 on 11/08/2002 12:08:26

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-11-08 : 12:11:51

Sigh. I had to modify it again: it was selecting the wrong someid values -- doesn't matter unless the data doesn't have enough to fill all 3...

MuffinMan
Posting Yak Master

107 Posts

Posted - 2002-11-08 : 12:25:34

Ok, how about this? It reports ties~~, but might not be completely true to the TOP X criteria (see the output below for the TOP 3 from each grouping)~~.


set nocount on

create table #sale (Company varchar(20) , Division varchar(20) , sale int)

insert #sale select 'Allstar Shipping' , 'Marketing' , 9000
insert #sale select 'Allstar Shipping' , 'IT' , 12000
insert #sale select 'Allstar Shipping' , 'Operations' , 9000
insert #sale select 'Allstar Shipping' , 'Engineering' , 9000
insert #sale select 'Allstar Shipping' , 'Sales' , 7000

insert #sale select 'Bombay Allied' , 'North' , 140000
insert #sale select 'Bombay Allied' , 'South' , 125000
insert #sale select 'Bombay Allied' , 'East' , 86000
insert #sale select 'Bombay Allied' , 'West' , 168000

insert #sale select 'Cat Walkers, Ltd.' , 'Persian' , 18000
insert #sale select 'Cat Walkers, Ltd.' , 'Domestic' , 15000
insert #sale select 'Cat Walkers, Ltd.' , 'Fluffy' , 22500


declare @RecsPerGroup int , @msg varchar(100)
select @RecsPerGroup = 2

select @msg = '** Show TOP ' + convert(varchar(10), @RecsPerGroup) + ' records per grouping **'
print @msg

select * 
from (
	select top 100 percent * 
		, Record = (
				select count(*) + 1 as Cnt 
				from #sale s2 
				where s2.Company = s.Company and s2.sale > s.sale
				)
	from #sale s
	order by Company,  Sale Desc , Division
) a
where  record <= @RecsPerGroup


print ''

select @RecsPerGroup = 3

select @msg = '** Show TOP ' + convert(varchar(10), @RecsPerGroup) + ' records per grouping **'
print @msg

select * 
from (
	select top 100 percent * 
		, Record = (
				select count(*) + 1 as Cnt 
				from #sale s2 
				where s2.Company = s.Company and s2.sale > s.sale
				)
	from #sale s
	order by Company,  Sale Desc , Division
) a
where  record <= @RecsPerGroup


print ''

select @RecsPerGroup = 3

select @msg = '** Show TOP ' + convert(varchar(10), @RecsPerGroup) + ' records per grouping USING DISTINCT **'
print @msg

select * 
from (
	select top 100 percent * 
		, Record = (
				select count(Distinct Sale) + 1 as Cnt 
				from #sale s2 
				where s2.Company = s.Company and s2.sale > s.sale
				)
	from #sale s
	order by Company,  Sale Desc , Division
) a
where  record <= @RecsPerGroup


drop table #sale

/**** Here are the results ***/
** Show TOP 2 records per grouping **
Company              Division             sale        Record      
-------------------- -------------------- ----------- ----------- 
Allstar Shipping     IT                   12000       1
Allstar Shipping     Engineering          9000        2
Allstar Shipping     Marketing            9000        2
Allstar Shipping     Operations           9000        2
Bombay Allied        West                 168000      1
Bombay Allied        North                140000      2
Cat Walkers, Ltd.    Fluffy               22500       1
Cat Walkers, Ltd.    Persian              18000       2

 
** Show TOP 3 records per grouping **
Company              Division             sale        Record      
-------------------- -------------------- ----------- ----------- 
Allstar Shipping     IT                   12000       1
Allstar Shipping     Engineering          9000        2   ==
Allstar Shipping     Marketing            9000        2   ==> Are We getting the top 3 here?
Allstar Shipping     Operations           9000        2   ==
Bombay Allied        West                 168000      1
Bombay Allied        North                140000      2
Bombay Allied        South                125000      3
Cat Walkers, Ltd.    Fluffy               22500       1
Cat Walkers, Ltd.    Persian              18000       2
Cat Walkers, Ltd.    Domestic             15000       3

** Show TOP 3 records per grouping USING DISTINCT **
Company              Division             sale        Record      
-------------------- -------------------- ----------- ----------- 
Allstar Shipping     IT                   12000       1
Allstar Shipping     Engineering          9000        2
Allstar Shipping     Marketing            9000        2
Allstar Shipping     Operations           9000        2
Allstar Shipping     Sales                7000        3  == Now we get 3 distinct values!
Bombay Allied        West                 168000      1
Bombay Allied        North                140000      2
Bombay Allied        South                125000      3
Cat Walkers, Ltd.    Fluffy               22500       1
Cat Walkers, Ltd.    Persian              18000       2
Cat Walkers, Ltd.    Domestic             15000       3

Edited by - muffinman on 11/08/2002 12:40:11

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-11-08 : 12:40:23

Nice work Muffin ...


select
    someid,
    somevc
from (
    select top 100 percent
        *,
        (select
            count(*) + 1 as Cnt
         from
            #temp t2
         where
            t2.someid = t.someid and
            t2.somevc < t.somevc ) as Record
    from
        #temp t
    order by
        someid,
        somevc desc ) as a
where
    record <= 3
order by
    someid,
    somevc

... a nice 6.16 subtree cost and running just over 3 seconds ...

But Frib's re-write is still the champ (by a huge margin) running in 300ms and only a 1.06 cost.

Jay White
{0}

JustinBigelow
SQL Gigolo

1157 Posts

Posted - 2002-11-08 : 14:11:55

quote:

But Frib's re-write is still the champ (by a huge margin) running in 300ms and only a 1.06 cost.

Which just goes to show you, always be careful when you quibble with Fribble.

Justin

Have you hugged your SQL Server today?

Lavos
Posting Yak Master

200 Posts

Posted - 2002-11-09 : 18:57:12

Of course, there's always another solution. I hesitate to give it, because it really depends on How the tables are used. Denormalize the table to include your ranking information, or create a rankings table. If rankings are calculated only periodically, and the listing generated often, then that would be the fastest solution over time.

I'm finding this very interesting so far. I was working on my own solution for the general case, but I stopped when I couldn't wrap my head around Fribble's query :)

Oh yeah, I did have a procedure that dynamically wrote a lot of UNION ALL queries which was very fast, but it quickly exceeds the varchar(8000) datatype when there are a lot of categories/departments.

----------------------
"O Theos mou! Echo ten labrida en te mou kephale!"

r937
Posting Yak Master

112 Posts

Posted - 2002-11-09 : 19:40:55

fribble: "But I agree that you have to be really sure that you want 3 values to use this!"

i hate to keep harping on this, and i'd love to hear what the SBC Ameritech guys have to say about this, but each department was to list a different number of top rows -- department A will want the top 2 listed, department B will want the top 6 listed, and so on

any luck adapting your 1st query yet, jay?

rudy

Lavos
Posting Yak Master

200 Posts

Posted - 2002-11-09 : 23:19:52

Well Rudy, I think Jay said that he was more interested in attacking the general approach, opposed to coming up with a complete solution.

This query would work, but I don't know how it performs, but if updates to the Employee and Department tables are rare and the query is ran often, then the denormalized approach will still outperform it overall.


SELECT E.*
FROM Departments D INNER JOIN
 (SELECT A.EmpID, A.DepID, COUNT(*) AS Rank
  FROM Employees A INNER JOIN
      Employees B ON (A.DepID = B.DepID AND A.EmpID <= B.EmpID)
  GROUP BY A.EmpID, A.DepID ) ER ON
 (ER.DepID = D.DepID AND ER.Rank <= D.RankLimit)
 INNER JOIN Employees E ( ON ER.EmpID = E.EmpID)

----------------------
"O Theos mou! Echo ten labrida en te mou kephale!"

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources