Author |
Topic |
vishal_7
Posting Yak Master
127 Posts |
Posted - 2005-05-31 : 20:12:46
|
Hello, I have two tables. The first tables contains a set of keywords and looks like this:It has approx 40 rows.CREATE TABLE [REBLOGGER_KEYWORDS] ( [id] [int] IDENTITY (1, 1) NOT NULL , [keyword_display] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [keyword1] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [keyword2] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [keyword3] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [keyword4] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [keyword5] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [keyword6] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [priority] [int] NULL ) ON [PRIMARY]GO The second table looks like this and has approx 6000 rows.CREATE TABLE [REBLOGGER_POSTS] ( [ID] [int] IDENTITY (1, 1) NOT NULL , [BLOG_ID] [int] NOT NULL , [POST_PATH] [varchar] (500) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [POST_URL] [varchar] (500) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [POST_TITLE] [varchar] (500) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [POST] [text] COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [POST_DATE] [datetime] NOT NULL , [POST_ACTIVE] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL CONSTRAINT [DF_REBLOG_POSTS_POST_ACTIVE] DEFAULT ('Y'), [keyword_id] [int] NULL , CONSTRAINT [PK_REBLOG_POSTS] PRIMARY KEY NONCLUSTERED ( [ID] ) WITH FILLFACTOR = 90 ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]GO The first problem is that when I do a select * from REBLOGGER_POSTS it takes over 40 secs to just return 3000 rows. The second problem is that I need to check each keyword with the posts table to see if a keyword is available in a post or post_title - where post is a text field. If it has a keyword then I just add the keywordid to the posts table. I need to run this so that it checks each post regardless if it already has a keywordid or not. The idea is to reply the keywords to the posts. For that purpose I am using the following sp, which I am sure is totally not correct.CREATE procedure pr_REBLOGGER_RankPosts asset nocount ondeclare @mId int, @mKeyWord1 varchar(25),@mKeyWord2 varchar(25),@mKeyWord3 varchar(25),@mKeyWord4 varchar(25),@mKeyWord5 varchar(25),@mKeyWord6 varchar(25)declare @updSQL nvarchar(4000)declare rate_crsr CURSOR FAST_FORWARD FOR select id, keyword1, keyword2, keyword3, keyword4, keyword5, keyword6 from reblogger_keywords order by priority/*open cursor*/open rate_crsr/* get first row from cursor to start with*/fetch next from rate_crsr into @mid, @mKeyWord1, @mKeyWord2, @mKeyWord3, @mKeyWord4, @mKeyWord5,@mKeyWord6/* loop through all records fetching one row at a time*/set @updSQL = 'UPDATE REBLOGGER_POSTS SET KEYWORD_id = NULL'exec sp_executesql @updSQLwhile @@fetch_status = 0begin /* update reblogger_posts for matching keyword in the post title*/ if len(@mkeyword1) > 0 begin set @updSQL = 'UPDATE REBLOGGER_POSTS SET KEYWORD_id = ''' + cast(@mid as varchar(10)) + ''' where post_title like ''%' + @mkeyword1 + '%'' and KEYWORD_ID is null' /*select @updsql*/ exec sp_executesql @updSQL /* update reblogger_posts for matching keyword in the post body*/ set @updSQL = 'UPDATE REBLOGGER_POSTS SET KEYWORD_ID = ''' + cast(@mid as varchar(10)) + ''' where post like ''%' + @mkeyword1 + '%'' and KEYWORD_ID is null' /*select @updsql*/ exec sp_executesql @updSQL end if len(@mkeyword2) > 0 begin set @updSQL = 'UPDATE REBLOGGER_POSTS SET KEYWORD_id = ''' + cast(@mid as varchar(10)) + ''' where post_title like ''%' + @mkeyword2 + '%'' and KEYWORD_ID is null' /*select @updsql*/ exec sp_executesql @updSQL /* update reblogger_posts for matching keyword in the post body*/ set @updSQL = 'UPDATE REBLOGGER_POSTS SET KEYWORD_ID = ''' + cast(@mid as varchar(10)) + ''' where post like ''%' + @mkeyword2 + '%'' and KEYWORD_ID is null' /*select @updsql*/ exec sp_executesql @updSQL end if len(@mkeyword3) > 0 begin set @updSQL = 'UPDATE REBLOGGER_POSTS SET KEYWORD_id = ''' + cast(@mid as varchar(10)) + ''' where post_title like ''%' + @mkeyword3 + '%'' and KEYWORD_ID is null' /*select @updsql*/ exec sp_executesql @updSQL /* update reblogger_posts for matching keyword in the post body*/ set @updSQL = 'UPDATE REBLOGGER_POSTS SET KEYWORD_ID = ''' + cast(@mid as varchar(10)) + ''' where post like ''%' + @mkeyword3 + '%'' and KEYWORD_ID is null' /*select @updsql*/ exec sp_executesql @updSQL end if len(@mkeyword4) > 0 begin set @updSQL = 'UPDATE REBLOGGER_POSTS SET KEYWORD_id = ''' + cast(@mid as varchar(10)) + ''' where post_title like ''%' + @mkeyword4 + '%'' and KEYWORD_ID is null' /*select @updsql*/ exec sp_executesql @updSQL /* update reblogger_posts for matching keyword in the post body*/ set @updSQL = 'UPDATE REBLOGGER_POSTS SET KEYWORD_ID = ''' + cast(@mid as varchar(10)) + ''' where post like ''%' + @mkeyword4 + '%'' and KEYWORD_ID is null' /*select @updsql*/ exec sp_executesql @updSQL end if len(@mkeyword5) > 0 begin set @updSQL = 'UPDATE REBLOGGER_POSTS SET KEYWORD_id = ''' + cast(@mid as varchar(10)) + ''' where post_title like ''%' + @mkeyword5 + '%'' and KEYWORD_ID is null' /*select @updsql*/ exec sp_executesql @updSQL /* update reblogger_posts for matching keyword in the post body*/ set @updSQL = 'UPDATE REBLOGGER_POSTS SET KEYWORD_ID = ''' + cast(@mid as varchar(10)) + ''' where post like ''%' + @mkeyword5 + '%'' and KEYWORD_ID is null' /*select @updsql*/ exec sp_executesql @updSQL end if len(@mkeyword6) > 0 begin set @updSQL = 'UPDATE REBLOGGER_POSTS SET KEYWORD_id = ''' + cast(@mid as varchar(10)) + ''' where post_title like ''%' + @mkeyword6 + '%'' and KEYWORD_ID is null' /*select @updsql*/ exec sp_executesql @updSQL /* update reblogger_posts for matching keyword in the post body*/ set @updSQL = 'UPDATE REBLOGGER_POSTS SET KEYWORD_ID = ''' + cast(@mid as varchar(10)) + ''' where post like ''%' + @mkeyword6 + '%'' and KEYWORD_ID is null' /*select @updsql*/ exec sp_executesql @updSQL end /* fetch next record from cursor */ fetch next from rate_crsr into @mid, @mKeyWord1, @mKeyWord2, @mKeyWord3, @mKeyWord4, @mKeyWord5, @mKeyWord6end/* mark post for deletion if no matching keyword found */set @updSQL = 'UPDATE REBLOGGER_POSTS SET POST_ACTIVE = ''' + 'D' + ''' where KEYWORD_ID is null'exec sp_executesql @updSQLset @updSQL = 'UPDATE REBLOGGER_POSTS SET POST_ACTIVE = ''' + 'Y' + ''' where KEYWORD_ID is NOT null'exec sp_executesql @updSQL/*select @updsql*//* close cursor*/deallocate rate_crsr Now is there anything that I can boost this sp or the table? I know I should not use cursors, but I dont know how to check each keyword with the posts rows.Any ideas or suggestions are very appreciated. |
|
raclede
Posting Yak Master
180 Posts |
Posted - 2005-05-31 : 20:36:44
|
I think cursors really hurts the performance. Well by looking at your Code hmm.. esp the Update part you can simplify it."If the automobile had followed the same development cycle as the computer, a Rolls-Royce would today cost $100, get a million miles per gallon, and explode once a year, killing everyone inside. "raclede |
|
|
Merkin
Funky Drop Bear Fearing SQL Dude!
4970 Posts |
Posted - 2005-05-31 : 21:13:34
|
Your main problem is your design.The keyword1, keyword2 etc in one table isn't very normalized. I'd put each keyword as a separate row in a table which will make it much easier to write a set based query to find and update keywords.Is it too late to change your design ?DamianIta erat quando hic adveni. |
|
|
vishal_7
Posting Yak Master
127 Posts |
Posted - 2005-05-31 : 21:22:26
|
Hello thanks for the reponse to both. Merkin, how would adding the keywords into one row each make it different? The reason why we put several keyword columns is that a certain keyword can belong in several categories. For example keyword java.xml and jaxp is the same. We are already using the tables as they are, but I am willing to change them, if that it what it makes it faster. I am open to hear about the changes you would do.Again thanks! |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2005-05-31 : 21:23:03
|
quote: Is it too late to change your design ?
Its never too late to fix crap like that....Sorry... what am I thinking... NULLs are GOOD............ DavidMA front-end is something that tries to violate a back-end. |
|
|
vishal_7
Posting Yak Master
127 Posts |
Posted - 2005-05-31 : 21:23:41
|
Furthermore how do I make the post table faster? It takes over 40 secs to retrieve 3.000 posts and there are approx. 6.000. |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2005-05-31 : 21:30:26
|
quote: Furthermore how do I make the post table faster?
Would it be the fact that you have stored the Smithsonian Institute in you TEXT column?Seriously...SELECT MAX(DATALENGTH(POST)) as MaxBytes, AVG(DATALENGTH(POST)) as AvgBytesFROM REBLOGGER_POSTSDavidMA front-end is something that tries to violate a back-end. |
|
|
vishal_7
Posting Yak Master
127 Posts |
Posted - 2005-05-31 : 21:45:24
|
well its a post column. How else I am supposed to store the post content, where the lentgth can be very long?MaxBytes 170796AvgBytes 1545 |
|
|
Merkin
Funky Drop Bear Fearing SQL Dude!
4970 Posts |
Posted - 2005-05-31 : 21:48:05
|
Why do you need to SELECT * from that table with no filtering ? If you have lots of large posts the slowdown is probably lots of IO and network traffic, there isn't anything you can do about that, except design your application better so that your queries don't select EVERYTHING in one go.As for your design, if I get some time later I'll try to explain it.DamianIta erat quando hic adveni. |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2005-05-31 : 21:48:18
|
Not excessive numbers...Try the "TEXT IN ROW" option on this table..DavidMA front-end is something that tries to violate a back-end. |
|
|
vishal_7
Posting Yak Master
127 Posts |
Posted - 2005-05-31 : 21:52:30
|
sorry for the stupid question, but how do I set that option when using:SELECT MAX(DATALENGTH(POST)) as MaxBytes, AVG(DATALENGTH(POST)) as AvgBytesFROM REBLOGGER_POSTS |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2005-05-31 : 22:25:16
|
It a configuration setting...BOL.. sp_tableoptionDavidMA front-end is something that tries to violate a back-end. |
|
|
vishal_7
Posting Yak Master
127 Posts |
Posted - 2005-05-31 : 23:01:11
|
Thanks, but it returns the same number.sp_tableoption N'REBLOGGER_POSTS', 'text in row', 'ON'SELECT MAX(DATALENGTH(POST)) as MaxBytes, AVG(DATALENGTH(POST)) as AvgBytesFROM REBLOGGER_POSTS |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2005-05-31 : 23:08:21
|
You missed several vital aspects..1) The amount to place in row2) How SQL deals with..Keep reading....sp_tableoption N'REBLOGGER_POSTS', 'text in row', '2048'It will not alter the data.. only its physical location..You MUST update exisitng data for it to be "re located".UPDATE REBLOGGER_POSTS SET POST = POSTWHERE DATALENGTH(POST) < = 2048DavidMA front-end is something that tries to violate a back-end. |
|
|
vishal_7
Posting Yak Master
127 Posts |
Posted - 2005-05-31 : 23:16:17
|
Sorry for another stupid question, but what exactly will the 'table in row' option do? I dont understand it. Thanks for helping. |
|
|
vishal_7
Posting Yak Master
127 Posts |
Posted - 2005-06-26 : 12:27:53
|
Is there any possibility to change the sp to not use cursors? |
|
|
vishal_7
Posting Yak Master
127 Posts |
Posted - 2005-06-26 : 20:09:40
|
Can anybody help me please? I can change the design if necessary. |
|
|
vishal_7
Posting Yak Master
127 Posts |
Posted - 2005-06-26 : 20:21:56
|
Would it help to enable full text search on that database and then using FreeText? |
|
|
|