UDF Aggregate - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

UDF Aggregate

Author

Topic

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2003-05-26 : 09:24:58

Frans writes "I have to use a LAST aggregate function, which, offcourse, SQL Server does not support. I have looked everywhere, and no, in context, it is not possible to use a "it's a simple sub query". It has to be an aggregate function. Surely this must be possible to create? People also tend to comfortably just jump to the conclusion that it is not possible, but we are talking about SQL Server. How can it not be possible? If we imagine that, using a udf, it is allowed as an aggregate function, the definition would look something like this:


create function dbo.last(@group cursor)
returning sql_variant
begin
declare @result sql_variant
fetch last @group into @result
return @result
end

Please note that this is just a sample piece of code.

Help would be greatly appreciated.

Regards

Frans"

mohdowais
Sheikh of Yak Knowledge

1456 Posts

Posted - 2003-05-26 : 10:02:02

Oh yes its certainly possible to emulate the functionality of the LAST keyword, there are a variety of techniques you could use like co-related subqueries, for example. There are also techniques using TOP and MAX, depending upon the table design. But your example using a cursor confuses me, I dont think you need to do something like this. Post your DDL, some sample data and expected results, we could help you.

OS

robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-05-26 : 11:34:44

quote:
People also tend to comfortably just jump to the conclusion that it is not possible, but we are talking about SQL Server. How can it not be possible?

No, actually it's people who jump to the conclusion that because there's such a function in Access, that SQL Server MUST be able to support it. Without realizing that first and last have no meaning in a relational database. Data is defined by its value(s), not its position in a set. If the data is not ordered by a particular value then the concept of first and last is meaningless. If it IS ordered, then it corresponds to Min and Max, which SQL Server can do.

nr
SQLTeam MVY

12543 Posts

Posted - 2003-05-26 : 14:35:56

As Rob says 'last row' is a meaningless term when talking about a table which has no intrinsic sequence to the data.
It is meaningful when referring to a cursor which is by definition ordered.

select top 1 * from mytable
select top 1 * from mytable

There is nothing to guarantee that these staements executed together will produce the same row twice even if the data does not change. They probably would but if anything happens on the server to reorganise the data or to change the query plan between the two statements then they could return different rows.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2003-05-26 : 14:53:23

Devil's Advocate mode!

Say you have a table


CREATE TABLE T (
  PK1 varchar(20) NOT NULL,
  PK2 varchar(20) NOT NULL,
  PK3 varchar(20) NOT NULL,
  PK4 varchar(20) NOT NULL,
  C1 varchar(20) NOT NULL,
  C2 int NOT NULL,
  C3 datetime NOT NULL,
  PRIMARY KEY (PK1, PK2, PK3, PK4)
)

and someone had written in Access


SELECT PK1, FIRST(PK2), FIRST(PK3), FIRST(PK4), FIRST(C1), FIRST(C2), FIRST(C3)
FROM T
GROUP BY PK1
ORDER BY PK1, PK2, PK3, PK4

then clearly FIRST(PK2) can be replaced by MIN(...). Presumably FIRST(PK3), FIRST(PK4) are deterministic since they're part of the ORDER BY, but in that case they're obviously not MIN(...). And I've always assumed that the values returned by FIRST(C1)..FIRST(C3) would be deterministic since the order includes a candidate key. Is that not right? If it is, then how would you rewrite that query without FIRST?

Edit: ****ing IE bug. Argh!
Edit2: tweak that so it makes sense!

Edited by - Arnold Fribble on 05/26/2003 15:12:30

nr
SQLTeam MVY

12543 Posts

Posted - 2003-05-26 : 16:22:03

Not sure what this does in access
I assume it returns one row per PK1 group.

You could do this by using a correlated subquery for each of the other fields
As you say the first values from the subqueries will be meaningful as they are ordered by the PK.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2003-05-26 : 16:40:05

quote:

You could do this by using a correlated subquery for each of the other fields

I'm not sure what you have in mind here, does it involve a TOP or 3 levels of nested subqueries?

nr
SQLTeam MVY

12543 Posts

Posted - 2003-05-26 : 17:51:43

SELECT PK1,
PK2 = (select top 1 t1.PK2 from T t2 where t1.PK1 = t2.PK2 order by t1.PK1, t1.PK2, t1.PK3, t1.PK4) ,
PK3 = (select top 1 t1.PK3 from T t2 where t1.PK1 = t2.PK2 order by t1.PK1, t1.PK2, t1.PK3, t1.PK4) ,
PK4 = (select top 1 t1.PK4 from T t2 where t1.PK1 = t2.PK2 order by t1.PK1, t1.PK2, t1.PK3, t1.PK4) ,
C1 = (select top 1 t1.C1 from T t2 where t1.PK1 = t2.PK2 order by t1.PK1, t1.PK2, t1.PK3, t1.PK4) ,
C2 = (select top 1 t1.C2 from T t2 where t1.PK1 = t2.PK2 order by t1.PK1, t1.PK2, t1.PK3, t1.PK4) ,
C3 = (select top 1 t1.C3 from T t2 where t1.PK1 = t2.PK2 order by t1.PK1, t1.PK2, t1.PK3, t1.PK4) ,
from (select distinct PK1 from T) t1
order by PK1, PK2, PK3, PK4

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2003-05-27 : 08:04:35

I think you meant


SELECT PK1, 
PK2 = (select top 1 t2.PK2 from T t2 where t1.PK1 = t2.PK1 order by t2.PK1, t2.PK2, t2.PK3, t2.PK4) ,
PK3 = (select top 1 t2.PK3 from T t2 where t1.PK1 = t2.PK1 order by t2.PK1, t2.PK2, t2.PK3, t2.PK4) ,
PK4 = (select top 1 t2.PK4 from T t2 where t1.PK1 = t2.PK1 order by t2.PK1, t2.PK2, t2.PK3, t2.PK4) ,
C1 = (select top 1 t2.C1 from T t2 where t1.PK1 = t2.PK1  order by t2.PK1, t2.PK2, t2.PK3, t2.PK4) ,
C2 = (select top 1 t2.C2 from T t2 where t1.PK1 = t2.PK1 order by t2.PK1, t2.PK2, t2.PK3, t2.PK4) ,
C3 = (select top 1 t2.C3 from T t2 where t1.PK1 = t2.PK1 order by t2.PK1, t2.PK2, t2.PK3, t2.PK4)
from (select distinct PK1 from T) t1 
order by PK1, PK2, PK3, PK4

Yes, that's pretty good, unless the PK1 is highly selective. Still seems a shame to do all those seeks for the same row, but I can't find anything better; certainly the query plan I'm getting for this is disasterous (unless PK1 is highly selective!):


SELECT *
FROM T AS T1
WHERE EXISTS(
    SELECT *
    FROM (
        SELECT TOP 1 pk1, pk2, pk3, pk4
        FROM T AS T2
        WHERE T1.pk1 = T2.pk1
        ORDER BY pk2, pk3, pk4
      ) AS T2
    WHERE T1.pk2 = T2.pk2
      AND T1.pk3 = T2.pk3
      AND T1.pk4 = T2.pk4
  )

When I said about the nested subquery, what I had in mind is that, since this sort of thing can do its job without a join (given a clustered index starting (PK1, PK2)):


SELECT *
FROM T AS T1
WHERE pk2 = (
    SELECT MIN(pk2)
    FROM T AS T2
    WHERE T1.pk1 = T2.pk2
  )

then perhaps it could be extended to the other 2 columns of the pk like this:


SELECT *
FROM T AS T1
WHERE pk4 = (
    SELECT MIN(pk4)
    FROM T AS T2
    WHERE T1.pk1 = T2.pk1
      AND T1.pk2 = T2.pk2
      AND T1.pk3 = T2.pk3
      AND pk3 = (
        SELECT MIN(pk3)
        FROM T AS T3
        WHERE T2.pk1 = T3.pk1
          AND T2.pk2 = T3.pk2
          AND pk2 = (
            SELECT MIN(pk2)
            FROM T AS T4
            WHERE T3.pk1 = T4.pk1
          )
      )
  )

But it doesn't manage to collapse it into the one Segment and Top, so it ended up with a couple of merge joins as well. Ok, but not great.
What I did notice is SQL Server's lack of good variable unification in nested subqueries: the entirely equivalent query:


SELECT *
FROM T AS T1
WHERE pk4 = (
    SELECT MIN(pk4)
    FROM T AS T2
    WHERE T1.pk1 = T2.pk1
      AND T1.pk2 = T2.pk2
      AND T1.pk3 = T2.pk3
      AND pk3 = (
        SELECT MIN(pk3)
        FROM T AS T3
        WHERE T1.pk1 = T3.pk1
          AND T1.pk2 = T3.pk2
          AND pk2 = (
            SELECT MIN(pk2)
            FROM T AS T4
            WHERE T1.pk1 = T4.pk1
          )
      )
  )

Gives a different and much worse plan, though not as bad as moving all the comparisons to the innermost subquery:


SELECT *
FROM T AS T1
WHERE pk4 = (
    SELECT MIN(pk4)
    FROM T AS T2
    WHERE pk3 = (
        SELECT MIN(pk3)
        FROM T AS T3
        WHERE pk2 = (
            SELECT MIN(pk2)
            FROM T AS T4
            WHERE T1.pk1 = T2.pk1
              AND T1.pk2 = T2.pk2
              AND T1.pk3 = T2.pk3
              AND T2.pk1 = T3.pk1
              AND T2.pk2 = T3.pk2
              AND T3.pk1 = T4.pk1
           )
      )
  )

This is (in estimation, at least) astoundingly poor!

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources