Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
aiken
Aged Yak Warrior
525 Posts |
Posted - 2003-06-19 : 17:13:22
|
Part of my app shows a users' recent posts in a Snitz forum (the same forum software as here on SQLTeam).I'm seeing somewhat slugging performance on this query, and it gets run a *lot*. QA is showing that the problem is the bookmark lookup. I'm out of my depth here -- I don't even understand what a bookmark lookup is, so I'm not sure how to improve the query.Here's the actual query that's slow (it inserts topics and replies into a temp table, then selects the top 10 from the temp table -- I'm only posting the reply portion because that's what's slow):declare @iUsers intselect @iUsers=2 DECLARE @tPosts table ( CAT_ID int, TOPIC_ID int, FORUM_ID int, REPLY_ID int, P_DATE varchar(14), SEQ int ) insert into @tPosts (CAT_ID,TOPIC_ID,FORUM_ID,P_DATE,REPLY_ID,SEQ) select CAT_ID,TOPIC_ID,FORUM_ID,MAX(R_DATE),MAX(REPLY_ID),MAX(SEQ) from SNITZ_REPLY R where R_AUTHOR=@iUsers and not exists (Select * from @tPosts where TOPIC_ID=R.TOPIC_ID) and forum_id<>36 group by CAT_ID,TOPIC_ID,FORUM_ID ...and here's the showplan_all:declare @iUsers int select @iUsers=2 DECLARE @tPosts table ( CAT_ID int, TOPIC_ID int, FORUM_ID int, REPLY_ID int, P_DATE varchar(14), SEQ int ) insert into @tPosts (CAT_ID,TOPIC_ID,FORUM_ID,P_DATE,REPLY_ID,SEQ) select CAT_ID,TOPIC_ID,FORUM_ID,MAX(R_DATE),MAX(REPLY_ID),MAX(SEQ) from SNITZ_REPLY R where R_AUTHOR=@iUsers and not exists (Select * from @tPosts where TOPIC_ID=R.TOPIC_ID) and forum_id<>36 group by CAT_ID,TOPIC_ID,FORUM_ID |--Table Insert(OBJECT:(@tPosts), SET:(@tPosts.[SEQ]=[Expr1006], @tPosts.[REPLY_ID]=[Expr1005], @tPosts.[FORUM_ID]=[R].[FORUM_ID], @tPosts.[TOPIC_ID]=[R].[TOPIC_ID], @tPosts.[CAT_ID]=[R].[CAT_ID], @tPosts.[P_DATE]=[Expr1007])) |--Top(ROWCOUNT est 0) |--Compute Scalar(DEFINE:([Expr1007]=Convert([Expr1004]))) |--Stream Aggregate(GROUP BY:([R].[CAT_ID], [R].[TOPIC_ID], [R].[FORUM_ID]) DEFINE:([Expr1004]=MAX([R].[R_DATE]), [Expr1005]=MAX([R].[REPLY_ID]), [Expr1006]=MAX([R].[SEQ]))) |--Sort(ORDER BY:([R].[CAT_ID] ASC, [R].[TOPIC_ID] ASC, [R].[FORUM_ID] ASC)) |--Merge Join(Right Anti Semi Join, MERGE:(@tPosts.[TOPIC_ID])=([R].[TOPIC_ID]), RESIDUAL:(@tPosts.[TOPIC_ID]=[R].[TOPIC_ID])) |--Sort(DISTINCT ORDER BY:(@tPosts.[TOPIC_ID] ASC)) | |--Table Scan(OBJECT:(@tPosts)) |--Filter(WHERE:([R].[FORUM_ID]<36 OR [R].[FORUM_ID]>36)) |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([web].[dbo].[SNITZ_REPLY] AS [R])) |--Index Seek(OBJECT:([web].[dbo].[SNITZ_REPLY].[SNITZ_REPLY_R_AUTHOR] AS [R]), SEEK:([R].[R_AUTHOR]=[@iUsers]) ORDERED FORWARD) ...any ideas? Thanks-b |
|
|
setbasedisthetruepath
Used SQL Salesman
992 Posts |
Posted - 2003-06-19 : 17:26:46
|
| A bookmark lookup is the process of taking the RID from the index page and using it to look up the corresponding data page for the row in question. You'll see a bookmark lookup any time you use an index seek for a row but request a column not in the index.Two possible solutions:1) Put the column in the index, or2) Make the index clustered, making the index and data pages one and the same.Jonathan{0} |
 |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2003-06-19 : 17:36:19
|
| It is first filtering by R_AUTHOR - not a good idea for someone with a lot of posts (Rob).Can you include the top 10 with the query.==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
aiken
Aged Yak Warrior
525 Posts |
Posted - 2003-06-19 : 18:27:39
|
| Hmm... I can't top 10 it, because there's some filtering that goes on after and I need 10 results, but I'll bet I could top 20 it or something.However, I would then need to order by the max(r_date) desc, which I think creates the same problem. And, don't the MAX() functions also force it to look at all rows even if I added a "top" to it?And I'm confused by the "put the column in the index" suggestion -- right now, the index it's using *is* r_author. The SNITZ_REPLY_R_AUTHOR index, which I read the plan to say it's using, is on the r_author column only.Thanks -- sorry to be dense here, but it's part of my contunuing education :)Cheers-b |
 |
|
|
graz
Chief SQLTeam Crack Dealer
4149 Posts |
Posted - 2003-06-19 : 22:52:48
|
| What are you trying to display?===============================================Creating tomorrow's legacy systems today.One crisis at a time. |
 |
|
|
setbasedisthetruepath
Used SQL Salesman
992 Posts |
Posted - 2003-06-20 : 09:56:02
|
| You're causing a bookmark lookup on the forum_ID column. Put that in the index to avoid the lookup (ie make a composite index with 2 columns), or convert the SNITZ_REPLY_R_AUTHOR to a clustered index.Jonathan{0} |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2003-06-20 : 10:39:50
|
| I don't think the problem is the SQL you've posted.you are declaring a table variable, which is by default empty.then, you insert some rows into it, checking to make sure that they are not already IN the table variable? But that table variable has no rows in it ....is there a loop surrounding this whole thing and it is being repeated over and over? If not, why is there the check for existence in the table variable?I have a feeling you have a WHILE loop running the same query (with different parameters) over and over ... and THAT is your performance problem, I would guess. how 'bout some more info.- JeffEdited by - jsmith8858 on 06/20/2003 10:40:50 |
 |
|
|
setbasedisthetruepath
Used SQL Salesman
992 Posts |
Posted - 2003-06-20 : 10:45:10
|
| As written this can't be run in a loop since he'd be repeatedly declaring the table variable. Unless Aiken has cut and pasted ...Jonathan{0} |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2003-06-20 : 10:50:02
|
quote: As written this can't be run in a loop since he'd be repeatedly declaring the table variable. Unless Aiken has cut and pasted ...
Exactly my point. As posted it makes no sense. What is the point of:and not exists (Select * from @tPosts where TOPIC_ID=R.TOPIC_ID) if it is NOT running in some sort of loop ...- Jeff |
 |
|
|
aiken
Aged Yak Warrior
525 Posts |
Posted - 2003-06-20 : 14:50:13
|
I should learn to not try to only post the relevant bits The reason for the "not exists" is that, really, between the creation of the table variable and the population with replies, the table variable gets populated with topics. The idea is that if someone both started a topic and replied to it, I only want the initial topic, not the reply. I just didn't include it because that query is fast as all getout, and didn't seem relevant to the problem.The overall intent is "Show me that last 10 topics that user X has either posted or replied to, with no more than one row for any given topic. If they replied more than once, only show the last reply. If they started the topic *and* replied, only show the initial topic".Thanks, setbasedistheverylonghandle. I didn't get where the bookmark was. I've added forum_id to the index, and the query is about 4 times faster.However, it created another bookmark lookup! I'm posting the showplan here, but in addition to telling me what column it's on, would someone tell me how to read the plan so I can see that for myself? I still don't see where the first example tells one that it's the forum_id column.New showplan:declare @iUsers int select @iUsers=2 select top 20 CAT_ID,TOPIC_ID,FORUM_ID,MAX(R_DATE),MAX(REPLY_ID),MAX(SEQ) from SNITZ_REPLY R where R_AUTHOR=@iUsers group by CAT_ID,TOPIC_ID,FORUM_ID |--Top(20) |--Stream Aggregate(GROUP BY:([R].[CAT_ID], [R].[TOPIC_ID], [R].[FORUM_ID]) DEFINE:([Expr1001]=MAX([R].[R_DATE]), [Expr1002]=MAX([R].[REPLY_ID]), [Expr1003]=MAX([R].[SEQ]))) |--Sort(ORDER BY:([R].[CAT_ID] ASC, [R].[TOPIC_ID] ASC, [R].[FORUM_ID] ASC)) |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([web].[dbo].[SNITZ_REPLY] AS [R])) |--Index Seek(OBJECT:([web].[dbo].[SNITZ_REPLY].[SNITZ_REPLY_R_AUTHOR] AS [R]), SEEK:([R].[R_AUTHOR]=[@iUsers]) ORDERED FORWARD) Thanks, everyone!-bEdited by - aiken on 06/20/2003 14:56:28 |
 |
|
|
setbasedisthetruepath
Used SQL Salesman
992 Posts |
Posted - 2003-06-20 : 15:59:45
|
| The operator directly above the bookmark lookup tells you. In the first SHOWPLAN, this would be the Filter operator which referenced the forum_id column. In the second SHOWPLAN, it's the Sort operator which referenced CAT_ID, TOPIC_ID, and FORUM_ID.Rather than adding more columns to the index, try making it clustered. Then there will be no bookmark lookups.Jonathan{0} |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-06-20 : 16:14:29
|
| Or, how about...Create a new table, populate it once with everyones "Last" set up info. Make R_AUTH the key.Create a trigger, and when an insert into your other table occurs, update the new table with the values stored in the INSERTED Table?waht'dya think?Brett8-) |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-06-20 : 16:19:44
|
| Ok, just read...it seems you want just the "recent" post, what I suggested was for the last one...okHow about adding a dattime column with a default of getdate() (so as to not impact any code), and just select post for the last 2 weeks or something.It seems like you're going to get every post ever...EDIT: (That some real good englash you be speaking at me thar)Brett8-)Edited by - x002548 on 06/20/2003 16:20:36 |
 |
|
|
aiken
Aged Yak Warrior
525 Posts |
Posted - 2003-06-20 : 16:51:46
|
| Thanks, everyone!I can't make it clustered without hurting performance in the actual forums; the current changes have been enough to reduce the load times of a very busy page by about 150ms / load, which is fantastic.Thanks!-b |
 |
|
|
graz
Chief SQLTeam Crack Dealer
4149 Posts |
Posted - 2003-06-20 : 19:43:50
|
This is pretty ugly but it works:select top 10 * from (select distinct TOPIC_ID, t_date as post_date, 't' as source from forum_topics where t_author = 100 UNION select distinct TOPIC_ID, r_date as post_date, 'r' as source from forum_reply where r_author = 100) as d1order by 2 desc You might do better to run two "distinct top 10" inserts into into a table variable and then select the "distinct top 10" from that. I'm curious how this performs compared to what you've got now.===============================================Creating tomorrow's legacy systems today.One crisis at a time. |
 |
|
|
elebetsamer
Starting Member
12 Posts |
Posted - 2003-06-21 : 09:41:27
|
| This could be a shot in the dark, but it ended up working for a huge stored procedure that we have. Try putting primary keys/indexes, etc... on the table variable. If you have a large amount of data in that table, then it may be hurting you to do the "Exists". Like I said, it's a shot in the dark, but it worked for me, and gave me a big performance boost.--Eric |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2003-06-21 : 10:15:26
|
| You can put a primary key on a table variable, but you cannot create indexes on it. Nor can you make anything clustered, which isn't an issue since the table variable would be in RAM. Still, not being able to use indexes can be a problem:http://www.sqlteam.com/Forums/topic.asp?TOPIC_ID=22212Hate to sound like a broken record, but what about using a temp table? Have you at least TRIED it? You can index the temp table any way you like, the more the merrier. And if it is accessed frequently enough, it will stay in RAM anyway just like a table variable, but with the added benefit of indexes. It will also let you create the covering indexes that Jonathan suggested, and will let the optimizer do an index seek. |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2003-06-21 : 10:26:04
|
| And I'd like to ask a favor:DON'T PUT SHOWPLAN RESULTS INSIDE CODE TAGS!!!!Let the reader worry about interpreting the indenting. If you have to, change the leading spaces to some other character that doesn't collapse, like a dash or asterisk. If you have to put something in code tags, break up long lines manually. It's ridiculous that what I'm typing here will fit on one line because of the length of a code block in this thread.Sorry about the rant, nothing personal, but I can't read anything in this thread for all the scrolling I have to do. |
 |
|
|
aiken
Aged Yak Warrior
525 Posts |
Posted - 2003-06-21 : 14:18:14
|
Thanks for that, graz -- it looks really promising. I've modified it to include the t_subject, since I need that, and so far it's quite a bit faster than my current approach with the table variable. However, it doesn't meet the requirement of only including the most recent reply when a user has replied to the same topic more than once. Any ideas? Here's what I've got now:select top 10 * from (select distinct TOPIC_ID, t_date as post_date, forum_id, t_subject, 't' as source from snitz_topics where t_author = 2 UNION select distinct snitz_reply.TOPIC_ID, r_date as post_date, snitz_reply.forum_id, t_subject, 'r' as source from snitz_reply join snitz_topics on snitz_reply.topic_id=snitz_topics.topic_id where r_author = 2) as d1order by 2 desc And sorry, Rob!Cheers-bEdited by - aiken on 06/21/2003 14:18:42 |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2003-06-21 : 15:59:15
|
what about:select top 10 * from (select distinct TOPIC_ID, t_date as post_date, forum_id, t_subject, 't' as source from snitz_topics where t_author = 2 UNION select snitz_reply.TOPIC_ID, Max(r_date) as post_date, snitz_reply.forum_id, t_subject, 'r' as source from snitz_reply join snitz_topics on snitz_reply.topic_id=snitz_topics.topic_id where r_author = 2 and t_author <> 2 GROUP BY snitz_reply.topic_id, snitz_Reply.forum_id, t_subject ) as d1order by 2 desc or is that too close to the original query and thus too inefficient?- JeffEdited by - jsmith8858 on 06/21/2003 16:06:23 |
 |
|
|
aiken
Aged Yak Warrior
525 Posts |
Posted - 2003-06-21 : 18:30:03
|
Thanks, Jeff -- the max() and group by makes a lot of sense.However, the query is getting to be almost as slow as the original.Here's what I've got now:create procedure p_snitz_RecentTopics2 (@iUsers int) AS set nocount onselect distinct top 10 topic_id,post_date,forum_id,f_subject,t_subject from (select distinct TOPIC_ID, t_date as post_date, snitz_topics.forum_id, f_subject, t_subject, 't' as source from snitz_topics join snitz_forum on snitz_topics.forum_id=snitz_forum.forum_id where t_author = @iUsers and snitz_topics.forum_id<>36 UNION select distinct snitz_reply.TOPIC_ID, max(r_date) as post_date, snitz_reply.forum_id, f_subject, t_subject, 'r' as source from snitz_reply join snitz_topics on snitz_reply.topic_id=snitz_topics.topic_id join snitz_forum on snitz_reply.forum_id=snitz_forum.forum_id where r_author = @iUsers and snitz_topics.t_author<>1 and snitz_reply.forum_id<>36 GROUP BY snitz_reply.topic_id, snitz_Reply.forum_id, t_subject, f_subject) as d1order by 2 desc ...And that's taking an average of 120ms to run, whereas theoptimized temp table approach is taking 44ms (both tested witha sample size of 100 runs).Cheers-b |
 |
|
|
Next Page
|
|
|
|
|