A question involving group by's and I need help

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

Transact-SQL (2000)

A question involving group by's and I need help

Author

Topic

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2002-08-08 : 09:11:17

Matthew writes "Hi

I have a table to keep some salary data of employees.
The table is defined with a staffid, FullTimeSalary, EffectiveSalary and EffectiveBeginDate

Its set up like this to keep a history of salaries and when they were received.

What I need to do is get the most current salary - so i had something like
select staffid, max(effectivebegindate) from salary group by staffid

but i want to be able to get the FTSalary field too. I cant switch it and do max(ftsalary) because a salary can be decreased.

any ideas?"

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-08-08 : 09:28:37

First, you should consider normalizing your data structure.

Second, I think you need a where not exists (correlated subquery) rather than a group by.


select
    <col_list>
from
    workforce w
where
    not exists (
        select 1
        from
            workforce
        where
            w.staffid = staffid and
            effectivebegindate > w.effectivebegindate )

Jay White
{0}

r937
Posting Yak Master

112 Posts

Posted - 2002-08-08 : 09:46:06

hmm, i would write the correlated subquery this way --

select staffid, FullTimeSalary
  from workforce w
 where EffectiveBeginDate =
       ( select max(EffectiveBeginDate)
           from workforce
          where staffid = w.staffid )

any ideas about whether this will differ in execution time, jay?

rudy
http://rudy.ca/

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-08-08 : 10:02:12

It's damn near impossible to say without ddl and sample data. The NOT EXISTS avoids the Aggregate. However, your aggregate method uses and an inner join, while the NOT EXIST must use a left anti semi join operator. I think its going to be a toss up largely dependant on indexing.

Matthew, how bout supplying the ddl and some sample data? And explaining the difference between EffectiveSalary and FullTimeSalary?

Jay White
{0}

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-08-08 : 10:26:48

Consider this test...


create table test (
    pk int identity(1,1) not null primary key,
    a  int not null,
    b  int not null )
go
set nocount on
declare @i int
set @i = 0
while @i < 10000
begin
insert into test (a,b)
select round((rand() * 10), 0) + 1, round((rand() * 1000), 0) + 1
select @i = @i + 1
end

select
    pk,a,b
from
    test t
where
    not exists (
        select 1
        from
            test
        where
            t.a = a and
            t.b < b)
order by
    a

select
    t.pk,t.a,t.b
from
    test t
    left join test t2
        on (t.a = t2.a and
            t.b < t2.b)
where
    t2.pk is null
order by
    t.a

select
    pk,a,b
from
    test t
where
    b = (
        select
            max(b)
        from
            test
        where
            t.a = a)
order by
   t.a

... I am unable to do anything to make the last select perform the best.

Jay White
{0}

Edited by - Page47 on 08/08/2002 10:33:23

M.E.
Aged Yak Warrior

539 Posts

Posted - 2002-08-08 : 11:08:32

Would a having clause be appropriate here?

hehe, not with that syntax

-----------------------
Take my advice, I dare ya

Edited by - M.E. on 08/08/2002 11:09:06

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-08-08 : 13:20:51

This is even better yet...


select
    t.pk,
    t.a,
    t.b
from
    test t
    inner join (
        select
            a,
            max(b) as b
        from
            test t
        group by
            a ) t2
        on t.a = t2.a and t.b = t2.b
order by
    t.a

Jay White
{0}

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-08-08 : 15:39:42

Aha, I was just going to suggest that. I presume you're using version 7: I vaguely remember that what you posted was the only way to get a good query plan for that sort of distribution of a's and b's. In 2000, the query plan for that one and the third suggestion you gave before will be the same.

BTW, using COUNT instead of NOT EXISTS is, as usual, a bad idea.
(Don't use this)


SELECT T1.a, T1.b
FROM Test AS T1
WHERE (
    SELECT COUNT(*)
    FROM Test AS T2
    WHERE T1.a = T2.a AND T1.b < T2.b
  ) = 0
ORDER BY T1.a, T1.b

Edited by - Arnold Fribble on 08/08/2002 15:54:33

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources