Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Multiple Min Aggregates, with precedence

Author  Topic 

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2003-09-16 : 11:13:04
Say I have this table:
create table min_test (
a int not null,
b int not null,
c int not null,
d varchar(20),
constraint pk_min_test primary key (a,b,c))

 
And I populate it with this data:
insert min_test (a,b,c,d)
select 1,1,2,'this one'
union select 1,1,3,'not this one'
union select 1,2,1,'not this one'
union select 1,2,2,'not this one'
union select 2,4,7,'not this one'
union select 2,3,4,'this one'
union select 3,10,10,'this one'

 
I need to query for a single d value for each a with the following business rule: If there are multiple d's for each a, use the row with the lowest b. If there are still multiples, use the lowest c for the lowest b. {a,b,c} is unique (primary key).

So the expected rowset should look like:

a d
----------- --------------------
1 this one
2 this one
3 this one

(3 row(s) affected)

 
So far, I have come up with this query:
select
m.a,
m.d
from
min_test m
inner join (
select
m.a,
m.b,
min(c) as c
from
min_test m
inner join (
select
a,
min(b) as b
from
min_test
group by
a) b
on m.a = b.a and
m.b = b.b
group by
m.a,
m.b) c
on m.a = c.a and
m.b = c.b and
m.c = c.c

 
Can anyone come up with a better performer?

Thanks in advance.

Jay White
{0}

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2003-09-16 : 12:12:45
This gives you the "rank" of each one:


select m1.*, (select count(*) from min_test m2 where m2.a = m1.a and
(m2.b < m1.b or (m2.b = m1.b and m2.c <= m1.c))) as Rank
from
min_test m1


So, from there, how does this perform:


select m1.*
from
min_test m1
where
(select count(*) from min_test m2 where m2.a = m1.a and
(m2.b < m1.b or (m2.b = m1.b and m2.c <= m1.c))) = 1


Actually, execution cost seems about the same ... but it's shorter at least!


- Jeff
Go to Top of Page

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2003-09-16 : 12:19:25
why not convert/merge the a,b,c into a (standardised format)number....and just take the lowest records?

010102
010103
010201
010202
020304
020407
031010

your criteria seem to suggest that they can be combined to "get the lowest c for the lowest b"....

the lowest c for the lowest b....automatically satisfies the lowest b condition...true/false?

maybe i'm missing something?
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2003-09-16 : 15:21:54
There's a bit in this thread, though it doesn't really come to any conclusions:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=26503
Go to Top of Page

joseph
Starting Member

10 Posts

Posted - 2003-09-17 : 06:15:10
Hi page47
I know you are one of the best guru so your answer is better than mine.
maybe my solution is wrong.
but I come up with the following solution :
select *
from min_test A
where c=(select min(c) from min_test B
where a.a=b.a
and b=( select min(b) from min_test c
where a.a=c.a
))
and b=( select min(b) from min_test c
where a.a=c.a
)
Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2003-09-17 : 07:15:40
joseph, that does work, but the subtree cost is still higher than my original attempt.

Jay White
{0}
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2003-09-17 : 08:21:08
But Jay, you haven't given us any representative data. Clearly if your table has 7 rows, it makes no practical difference which way you write it!
The various solutions are highly sensitive to the selectivity of the various columns in the primary key, the size of the other column(s) in the table, the presence of indexes, what the clustered index is, etc. So at best all we can reasonably say is 'here are 5 ways to do it, see which works best with your real scenario'.
Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2003-09-17 : 09:07:24
I'm sorry. I have been testing each solution against my data set as they are presented.

My data set consists of 189651 rows. 119744 distinct 'a's. There are on average 1.5808987 'b's for each 'a' and, of course, 1.001838 'c's for each 'a'/'b' combo.

So far, Jeff's solution shaves about 4/10ths of second of my 5.5 second query. Josephs runs in close to 9 seconds, and Arnold, yours (I think I've adapted it correctly) looks to be around 6 seconds. I wasn't able to make sense of nr's solution as it doesn't seem to correlate the 'b's and 'c's, instead it just gets the min of each, but not necessarily in the same row. (Maybe I'm not understanding the query).

Jay White
{0}
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2003-09-17 : 10:44:00
If you modify Jeff's to:

select m1.*
from
min_test m1
where not exists
(select * from min_test m2 where m2.a = m1.a and
(m2.b < m1.b or (m2.b = m1.b and m2.c < m1.c)))

Is it better or worse?
Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2003-09-17 : 10:48:52
Better. Down to 2 seconds.

Jay White
{0}
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2003-09-17 : 10:51:44
As far as nigel's in the other thread, I thought the only problem (as far as sense is concerned) was the table aliases seemed to have gotten confused -- hence the first bit of code in my reply.
But as I said, the performance on that one suffers when the PK1 (= a in this thread) is highly selective since the cardinality of the output will be the same as the cardinality of PK1 -- hence a table seek for every row, column in the output. (AFAIR)
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2003-09-17 : 11:06:26
I thought about using EXISTS() but I got a headache trying to figure that one out .... Nice work, Arnold !

- Jeff
Go to Top of Page
   

- Advertisement -