Why is this group by slow? - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Transact-SQL (2008)

Why is this group by slow?

Author

Topic

phenreid
Starting Member

29 Posts

Posted - 2010-11-18 : 21:38:58

[code]
select t.tranid, t.LName, t.L4ssn from TestAdds t,
(select Lname, L4SSN from udf_AM_vw('Passed Value')) A where t.lname=a.lname and t.L4ssn=a.l4ssn
and LEN(t.l4ssn)=4 --group by t.tranid, t.LName, t.L4ssn
[/code]

t.tranid = PK; index on all fields on both sides of the implied join.

Returns 980 records.

Runs in 1 second without the group by. With the group by runs forever... after a five minutes I aborted.

I tried the equivalent of above with JOIN -- same result. If I do a make table query with temp table, then the group by works.

I tried making the whole thing a DT and then selecting/ grouping on that -- same problem; runs forever.

I want to understand not only the solution but what is going on here that would cause a group by to behave this way. I feel like there is a fundamental concept I'm missing.

You might ask what the UDF is doing, but the question I meant to ask is why adding the "group by" would cause it to hang -- clearly the UDF isn't the problem since without the group by it runs instantly.

Thanks!

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-11-18 : 21:48:29

Show us the estimated execution plan for both queries.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-11-18 : 21:49:42

Why do you even need a group by though? You aren't using any aggregate functions. How about just use DISTINCT?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

phenreid
Starting Member

29 Posts

Posted - 2010-11-19 : 13:06:54

I tried both distinct and group by originally; no difference. (I thought I read they were equivalent and that it was best to avoid the "distinct" word when you could accomplish same with "group by".)

udf_AM_vw creates a many-to-many relation. I solved my immediate problem by making a new UDF that gives me a one-to-one match, but I want to understand how I could have accomplished this. I want SQL to first come up with my 980 records THEN aggregate them or show me DISTINCT. If I do this in two steps with a temp table, it's instant. But somehow it insists on doing something more complex no matter how I group the SQL statement. What is the trick/syntax to making it work like two separate steps?

I would be happy to post the execution plan but I can't post a graphic on this forum and you probably don't want me to paste the giant XML -- what is the etiquette/method here for posting the execution plan. I would love to learn how to read those properly.

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-11-19 : 13:26:22

Upload the jpg to some site and then provide us the link to view it.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

phenreid
Starting Member

29 Posts

Posted - 2010-11-19 : 13:55:57

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-11-19 : 14:16:11

The clustered index scan is your problem. I assume the Members table is being reference in the udf, since I don't see the object in your query. Can you post the UDF?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

phenreid
Starting Member

29 Posts

Posted - 2010-11-19 : 14:45:50

Unfortunately, I can't post the UDF on a public forum. It is a select on a view that is PK-to-PK of two large tables and uses the DecryptByPassPhrase function on a number of fields. That function has not caused me any problems or issues up to now. As I mentioned -- without the DISTINCT keyword, select returns 980 records very quickly. Add the DISTINCT keyword and you get the exec plan above.

So, it sounds like there is no "silver bullet" way to phrase SQL statements so they will always SELECT the 980 records first THEN apply a DISTINCT to them. I assume the optimizer does its own thing and a UDF table could throw the optimizer off -- so you just have to look at each situation independently. Or, if this is one-time SQL work -- if DISTINCT slow, just use temp tables because for non-experts it can take awhile to find the cause/solution. And, if it is something recurring, then you have to dig into it and find a faster way.

Is that basically how the experts handle it?

I'm just surprised I couldn't phrase the statement so that it returns the 980 records first (as a DT) then: Select Distinct field1, field2 from (DT returning 980 records). It doesn't seem to care about the phrasing as either way I end up with the clustered index scan above. Only with two completely separate passes with temp tables I don't.

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-11-19 : 14:49:37

quote:
Originally posted by phenreid

I assume the optimizer does its own thing and a UDF table could throw the optimizer off -- so you just have to look at each situation independently.

Bingo! The recommendation is to avoid UDFs if at all possible. They can cause major performance problems.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources