Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

Transact-SQL (2000)

Slow snitz query...

Author

Topic

aiken
Aged Yak Warrior

525 Posts

Posted - 2003-06-19 : 17:13:22

Part of my app shows a users' recent posts in a Snitz forum (the same forum software as here on SQLTeam).

I'm seeing somewhat slugging performance on this query, and it gets run a *lot*. QA is showing that the problem is the bookmark lookup. I'm out of my depth here -- I don't even understand what a bookmark lookup is, so I'm not sure how to improve the query.

Here's the actual query that's slow (it inserts topics and replies into a temp table, then selects the top 10 from the temp table -- I'm only posting the reply portion because that's what's slow):

declare @iUsers int
select @iUsers=2

	DECLARE @tPosts table ( CAT_ID int,
			TOPIC_ID int,
			FORUM_ID int,
			REPLY_ID int,
			P_DATE varchar(14),
			SEQ int
			)

	insert into @tPosts (CAT_ID,TOPIC_ID,FORUM_ID,P_DATE,REPLY_ID,SEQ)
  		select CAT_ID,TOPIC_ID,FORUM_ID,MAX(R_DATE),MAX(REPLY_ID),MAX(SEQ)
   		from SNITZ_REPLY R
   		where R_AUTHOR=@iUsers
     			and not exists (Select * from @tPosts where TOPIC_ID=R.TOPIC_ID) 
			 and forum_id<>36
   		group by CAT_ID,TOPIC_ID,FORUM_ID

...and here's the showplan_all:

declare @iUsers int  select @iUsers=2
 DECLARE @tPosts table ( CAT_ID int,     TOPIC_ID int,     FORUM_ID int,     REPLY_ID int,     P_DATE varchar(14),     SEQ int     )     insert into @tPosts (CAT_ID,TOPIC_ID,FORUM_ID,P_DATE,REPLY_ID,SEQ)      select CAT_ID,TOPIC_ID,FORUM_ID,MAX(R_DATE),MAX(REPLY_ID),MAX(SEQ)       from SNITZ_REPLY R       where R_AUTHOR=@iUsers          and not exists (Select * from @tPosts where TOPIC_ID=R.TOPIC_ID)       and forum_id<>36       group by CAT_ID,TOPIC_ID,FORUM_ID
  |--Table Insert(OBJECT:(@tPosts), SET:(@tPosts.[SEQ]=[Expr1006], @tPosts.[REPLY_ID]=[Expr1005], @tPosts.[FORUM_ID]=[R].[FORUM_ID], @tPosts.[TOPIC_ID]=[R].[TOPIC_ID], @tPosts.[CAT_ID]=[R].[CAT_ID], @tPosts.[P_DATE]=[Expr1007]))
       |--Top(ROWCOUNT est 0)
            |--Compute Scalar(DEFINE:([Expr1007]=Convert([Expr1004])))
                 |--Stream Aggregate(GROUP BY:([R].[CAT_ID], [R].[TOPIC_ID], [R].[FORUM_ID]) DEFINE:([Expr1004]=MAX([R].[R_DATE]), [Expr1005]=MAX([R].[REPLY_ID]), [Expr1006]=MAX([R].[SEQ])))
                      |--Sort(ORDER BY:([R].[CAT_ID] ASC, [R].[TOPIC_ID] ASC, [R].[FORUM_ID] ASC))
                           |--Merge Join(Right Anti Semi Join, MERGE:(@tPosts.[TOPIC_ID])=([R].[TOPIC_ID]), RESIDUAL:(@tPosts.[TOPIC_ID]=[R].[TOPIC_ID]))
                                |--Sort(DISTINCT ORDER BY:(@tPosts.[TOPIC_ID] ASC))
                                |    |--Table Scan(OBJECT:(@tPosts))
                                |--Filter(WHERE:([R].[FORUM_ID]<36 OR [R].[FORUM_ID]>36))
                                     |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([web].[dbo].[SNITZ_REPLY] AS [R]))
                                          |--Index Seek(OBJECT:([web].[dbo].[SNITZ_REPLY].[SNITZ_REPLY_R_AUTHOR] AS [R]), SEEK:([R].[R_AUTHOR]=[@iUsers]) ORDERED FORWARD)

...any ideas?

Thanks
-b

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2003-06-19 : 17:26:46

A bookmark lookup is the process of taking the RID from the index page and using it to look up the corresponding data page for the row in question. You'll see a bookmark lookup any time you use an index seek for a row but request a column not in the index.

Two possible solutions:
1) Put the column in the index, or
2) Make the index clustered, making the index and data pages one and the same.

Jonathan
{0}

nr
SQLTeam MVY

12543 Posts

Posted - 2003-06-19 : 17:36:19

It is first filtering by R_AUTHOR - not a good idea for someone with a lot of posts (Rob).
Can you include the top 10 with the query.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

aiken
Aged Yak Warrior

525 Posts

Posted - 2003-06-19 : 18:27:39

Hmm... I can't top 10 it, because there's some filtering that goes on after and I need 10 results, but I'll bet I could top 20 it or something.

However, I would then need to order by the max(r_date) desc, which I think creates the same problem. And, don't the MAX() functions also force it to look at all rows even if I added a "top" to it?

And I'm confused by the "put the column in the index" suggestion -- right now, the index it's using *is* r_author. The SNITZ_REPLY_R_AUTHOR index, which I read the plan to say it's using, is on the r_author column only.

Thanks -- sorry to be dense here, but it's part of my contunuing education :)

Cheers
-b

graz
Chief SQLTeam Crack Dealer

4149 Posts

Posted - 2003-06-19 : 22:52:48

What are you trying to display?

===============================================
Creating tomorrow's legacy systems today.
One crisis at a time.

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2003-06-20 : 09:56:02

You're causing a bookmark lookup on the forum_ID column. Put that in the index to avoid the lookup (ie make a composite index with 2 columns), or convert the SNITZ_REPLY_R_AUTHOR to a clustered index.

Jonathan
{0}

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2003-06-20 : 10:39:50

I don't think the problem is the SQL you've posted.

you are declaring a table variable, which is by default empty.

then, you insert some rows into it, checking to make sure that they are not already IN the table variable? But that table variable has no rows in it ....

is there a loop surrounding this whole thing and it is being repeated over and over? If not, why is there the check for existence in the table variable?

I have a feeling you have a WHILE loop running the same query (with different parameters) over and over ... and THAT is your performance problem, I would guess.

how 'bout some more info.

- Jeff

Edited by - jsmith8858 on 06/20/2003 10:40:50

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2003-06-20 : 10:45:10

As written this can't be run in a loop since he'd be repeatedly declaring the table variable. Unless Aiken has cut and pasted ...

Jonathan
{0}

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2003-06-20 : 10:50:02

quote:

As written this can't be run in a loop since he'd be repeatedly declaring the table variable. Unless Aiken has cut and pasted ...

Exactly my point. As posted it makes no sense. What is the point of:


and not exists (Select * from @tPosts where TOPIC_ID=R.TOPIC_ID)

if it is NOT running in some sort of loop ...

- Jeff

aiken
Aged Yak Warrior

525 Posts

Posted - 2003-06-20 : 14:50:13

I should learn to not try to only post the relevant bits

The reason for the "not exists" is that, really, between the creation of the table variable and the population with replies, the table variable gets populated with topics. The idea is that if someone both started a topic and replied to it, I only want the initial topic, not the reply. I just didn't include it because that query is fast as all getout, and didn't seem relevant to the problem.

The overall intent is "Show me that last 10 topics that user X has either posted or replied to, with no more than one row for any given topic. If they replied more than once, only show the last reply. If they started the topic *and* replied, only show the initial topic".

Thanks, setbasedistheverylonghandle. I didn't get where the bookmark was. I've added forum_id to the index, and the query is about 4 times faster.

However, it created another bookmark lookup! I'm posting the showplan here, but in addition to telling me what column it's on, would someone tell me how to read the plan so I can see that for myself? I still don't see where the first example tells one that it's the forum_id column.

New showplan:

declare @iUsers int  select @iUsers=2
 select top 20 CAT_ID,TOPIC_ID,FORUM_ID,MAX(R_DATE),MAX(REPLY_ID),MAX(SEQ)       from SNITZ_REPLY R       where R_AUTHOR=@iUsers       group by CAT_ID,TOPIC_ID,FORUM_ID
  |--Top(20)
       |--Stream Aggregate(GROUP BY:([R].[CAT_ID], [R].[TOPIC_ID], [R].[FORUM_ID]) DEFINE:([Expr1001]=MAX([R].[R_DATE]), [Expr1002]=MAX([R].[REPLY_ID]), [Expr1003]=MAX([R].[SEQ])))
            |--Sort(ORDER BY:([R].[CAT_ID] ASC, [R].[TOPIC_ID] ASC, [R].[FORUM_ID] ASC))
                 |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([web].[dbo].[SNITZ_REPLY] AS [R]))
                      |--Index Seek(OBJECT:([web].[dbo].[SNITZ_REPLY].[SNITZ_REPLY_R_AUTHOR] AS [R]), SEEK:([R].[R_AUTHOR]=[@iUsers]) ORDERED FORWARD)

Thanks, everyone!

-b

Edited by - aiken on 06/20/2003 14:56:28

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2003-06-20 : 15:59:45

The operator directly above the bookmark lookup tells you. In the first SHOWPLAN, this would be the Filter operator which referenced the forum_id column. In the second SHOWPLAN, it's the Sort operator which referenced CAT_ID, TOPIC_ID, and FORUM_ID.

Rather than adding more columns to the index, try making it clustered. Then there will be no bookmark lookups.

Jonathan
{0}

X002548
Not Just a Number

15586 Posts

Posted - 2003-06-20 : 16:14:29

Or, how about...

Create a new table, populate it once with everyones "Last" set up info. Make R_AUTH the key.

Create a trigger, and when an insert into your other table occurs, update the new table with the values stored in the INSERTED Table?

waht'dya think?

Brett

8-)

X002548
Not Just a Number

15586 Posts

Posted - 2003-06-20 : 16:19:44

Ok, just read...it seems you want just the "recent" post, what I suggested was for the last one...ok

How about adding a dattime column with a default of getdate() (so as to not impact any code), and just select post for the last 2 weeks or something.

It seems like you're going to get every post ever...

EDIT: (That some real good englash you be speaking at me thar)

Brett

8-)

Edited by - x002548 on 06/20/2003 16:20:36

aiken
Aged Yak Warrior

525 Posts

Posted - 2003-06-20 : 16:51:46

Thanks, everyone!

I can't make it clustered without hurting performance in the actual forums; the current changes have been enough to reduce the load times of a very busy page by about 150ms / load, which is fantastic.

Thanks!

-b

graz
Chief SQLTeam Crack Dealer

4149 Posts

Posted - 2003-06-20 : 19:43:50

This is pretty ugly but it works:


select top 10  * 
from (select distinct TOPIC_ID, t_date as post_date, 't' as source
	from forum_topics
	where t_author = 100
	UNION
	select distinct TOPIC_ID, r_date as post_date, 'r' as source
	from forum_reply
	where r_author = 100
) as d1
order by 2 desc

You might do better to run two "distinct top 10" inserts into into a table variable and then select the "distinct top 10" from that. I'm curious how this performs compared to what you've got now.

===============================================
Creating tomorrow's legacy systems today.
One crisis at a time.

elebetsamer
Starting Member

12 Posts

Posted - 2003-06-21 : 09:41:27

This could be a shot in the dark, but it ended up working for a huge stored procedure that we have. Try putting primary keys/indexes, etc... on the table variable. If you have a large amount of data in that table, then it may be hurting you to do the "Exists". Like I said, it's a shot in the dark, but it worked for me, and gave me a big performance boost.

--Eric

robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-06-21 : 10:15:26

You can put a primary key on a table variable, but you cannot create indexes on it. Nor can you make anything clustered, which isn't an issue since the table variable would be in RAM. Still, not being able to use indexes can be a problem:

http://www.sqlteam.com/Forums/topic.asp?TOPIC_ID=22212

Hate to sound like a broken record, but what about using a temp table? Have you at least TRIED it? You can index the temp table any way you like, the more the merrier. And if it is accessed frequently enough, it will stay in RAM anyway just like a table variable, but with the added benefit of indexes. It will also let you create the covering indexes that Jonathan suggested, and will let the optimizer do an index seek.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-06-21 : 10:26:04

And I'd like to ask a favor:

DON'T PUT SHOWPLAN RESULTS INSIDE CODE TAGS!!!!

Let the reader worry about interpreting the indenting. If you have to, change the leading spaces to some other character that doesn't collapse, like a dash or asterisk. If you have to put something in code tags, break up long lines manually. It's ridiculous that what I'm typing here will fit on one line because of the length of a code block in this thread.

Sorry about the rant, nothing personal, but I can't read anything in this thread for all the scrolling I have to do.

aiken
Aged Yak Warrior

525 Posts

Posted - 2003-06-21 : 14:18:14

Thanks for that, graz -- it looks really promising. I've modified it to include the t_subject, since I need that, and so far it's quite a bit faster than my current approach with the table variable. However, it doesn't meet the requirement of only including the most recent reply when a user has replied to the same topic more than once. Any ideas? Here's what I've got now:

select top 10  * 
from (select distinct TOPIC_ID, t_date as post_date, forum_id, t_subject, 't' as source
	from snitz_topics
	where t_author = 2
	UNION
	select distinct snitz_reply.TOPIC_ID, r_date as post_date, snitz_reply.forum_id, t_subject, 'r' as source
	from snitz_reply
	  join snitz_topics on snitz_reply.topic_id=snitz_topics.topic_id
	where r_author = 2
) as d1
order by 2 desc

And sorry, Rob!

Cheers
-b

Edited by - aiken on 06/21/2003 14:18:42

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2003-06-21 : 15:59:15

what about:


select top 10  * 
from (select distinct TOPIC_ID, t_date as post_date, forum_id, t_subject, 't' as source
	from snitz_topics
	where t_author = 2
	UNION
	select snitz_reply.TOPIC_ID, Max(r_date) as post_date, snitz_reply.forum_id, t_subject, 'r' as source
	from snitz_reply
	  join snitz_topics on snitz_reply.topic_id=snitz_topics.topic_id
	where r_author = 2  and t_author <> 2
        GROUP BY snitz_reply.topic_id, snitz_Reply.forum_id, t_subject 
) as d1
order by 2 desc

or is that too close to the original query and thus too inefficient?

- Jeff

Edited by - jsmith8858 on 06/21/2003 16:06:23

aiken
Aged Yak Warrior

525 Posts

Posted - 2003-06-21 : 18:30:03

Thanks, Jeff -- the max() and group by makes a lot of sense.

However, the query is getting to be almost as slow as the original.

Here's what I've got now:

create procedure p_snitz_RecentTopics2 (@iUsers int) AS 
set nocount on

select distinct top 10 topic_id,post_date,forum_id,f_subject,t_subject 
from (select distinct TOPIC_ID, t_date as post_date, snitz_topics.forum_id, f_subject, t_subject, 't' as source
	from snitz_topics
		join snitz_forum on snitz_topics.forum_id=snitz_forum.forum_id
	where t_author = @iUsers and snitz_topics.forum_id<>36
	UNION
	select distinct snitz_reply.TOPIC_ID, max(r_date) as post_date, snitz_reply.forum_id, f_subject, t_subject, 'r' as source
	from snitz_reply
	  join snitz_topics on snitz_reply.topic_id=snitz_topics.topic_id
	join snitz_forum on snitz_reply.forum_id=snitz_forum.forum_id
	where r_author = @iUsers and snitz_topics.t_author<>1 and snitz_reply.forum_id<>36
	GROUP BY snitz_reply.topic_id, snitz_Reply.forum_id, t_subject, f_subject
) as d1
order by 2 desc

...And that's taking an average of 120ms to run, whereas the
optimized temp table approach is taking 44ms (both tested with
a sample size of 100 runs).

Cheers
-b

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources