
 ways to optimize large searches (brainstorming)


nic
Posting Yak Master

209 Posts

Posted - 2009-09-15 : 18:46:43
We have an application that has millions of records. Overall the site works well but we are running into issues with search. The application allows users to search for existing records by applicant name, address, phone, dates, policyNumber etc. We are running into some timeout issues and are trying to brainstorm how to optimize the process.

We don't want users to return large datasets, so we have put a process in place that limits results to a total of 200 records, returned in sets of 50.

Rough content of the search sprocs:
1) Insert into a temp table (select records based on the search params)
2) If the temp table holds more than 200 rows, throw an error
3) Join to the temp table and return the appropriate data (we only display 50 records at a time, so we know the first and last record to be displayed, etc.)
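In rough T-SQL the three steps look something like this (just a sketch; the table, column, and parameter names are illustrative, not our actual schema):

```sql
-- Sketch of the current sproc shape; names are illustrative only
CREATE TABLE #SearchResults (RowNum INT IDENTITY(1,1) PRIMARY KEY, ID INT)

-- 1) Insert matching keys into the temp table
INSERT INTO #SearchResults (ID)
SELECT a.ID
FROM MainTable a
    INNER JOIN PersonTable b ON a.ID = b.ID
WHERE b.LastName LIKE @lastName

-- 2) Abort if the search is too broad
IF @@ROWCOUNT > 200
BEGIN
    RAISERROR('Too many matches; please narrow your search.', 16, 1)
    RETURN
END

-- 3) Join back and return one 50-row page
SELECT a.*
FROM #SearchResults r
    INNER JOIN MainTable a ON a.ID = r.ID
WHERE r.RowNum BETWEEN @firstRow AND @lastRow
```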

If you look at the execution plan, the vast majority of the processing is associated with step 1 (since our base tables have millions of records). Although we try to limit the large queries, it appears some users kick off large queries anyway, and this can have a cascading effect: if we get enough of these in a short period of time, overall application performance starts to degrade. Until we run step 1 we can't know whether the dataset is large.

So my main question is how do large sites handle large searches?

I have run the Index Wizard and it didn't offer any new suggestions.

For example we need something like this:
Select
fields
FROM
MainTable a INNER JOIN PersonTable b ON a.ID = b.ID
INNER JOIN OtherTable c ON a.ID = c.ID
......
WHERE
b.FirstName Like @firstName
AND b.LastName Like @LastName
.....


We can force users to supply input params, but since it is a general search, given a large enough database these queries can still be large...
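Because each parameter is optional, the sproc has probably ended up as a "catch-all" query, something like the sketch below (illustrative names); one cached plan then serves every combination of parameters, which can itself contribute to the timeouts:

```sql
-- Catch-all pattern: any parameter may be NULL (i.e. skipped by the user)
SELECT a.ID, b.FirstName, b.LastName
FROM MainTable a
    INNER JOIN PersonTable b ON a.ID = b.ID
WHERE (@firstName IS NULL OR b.FirstName LIKE @firstName)
  AND (@lastName  IS NULL OR b.LastName  LIKE @lastName)
```

One common alternative is to build the WHERE clause dynamically with sp_executesql, including only the predicates the user actually supplied, so each search shape gets its own plan.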

Are there other approaches to handling searches besides directly querying the primary tables? Maybe having optimized search tables (indexed views or something)? Maybe storing all the searchable items in some search table - not sure how that would stay in sync with the actual tables, etc.

Anyway, I am just brainstorming and I'm hoping if others ran into this issue maybe there is a completely different approach.

(The rest of the application works well since once we have "found" the record, we can query it by primary key so the queries are very narrow in scope)

Just curious if there are other approaches.....

Nic

RickD
Slow But Sure Yak Herding Master

3608 Posts

Posted - 2009-09-16 : 05:01:11
Why not use TOP 200? That way your users will only ever be able to select 200 records max.

Also, using LIKE with a leading wildcard will generally prevent any indexes on those columns from being used. Have you looked into Full Text Indexing on those search fields?
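For example (a sketch only, assuming a full-text index has already been created on PersonTable; names are illustrative):

```sql
-- LIKE with a leading wildcard cannot seek an index, so it scans
SELECT ID FROM PersonTable
WHERE LastName LIKE '%' + @lastName + '%'

-- CONTAINS can use the full-text index (word and prefix matches)
SELECT ID FROM PersonTable
WHERE CONTAINS(LastName, '"smith*"')
```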

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2009-09-16 : 07:59:30
There are numerous ways of optimizing searching but it really depends on the nature of your searches, and the size of your database. Some concepts you should consider are full text indexing like RickD suggests, horizontal partitioning and creating search-tables (tables more optimized for searching).

I assume that you use the temp table for paging, but it really doesn't make any sense to throw an error if the result set is larger than 200 rows. Rick's SELECT TOP 200 will easily handle that trick for you, and the user will get the first 200 rows regardless of how big the actual result set is.

- Lumbago

nic
Posting Yak Master

209 Posts

Posted - 2009-09-16 : 11:48:52
Does TOP 200 help with performance at all? I wouldn't think so. My concern with returning the top 200 is that it could train users to just kick off large queries to "see" if the data is in there. Maybe this doesn't make sense, but by raising the error the user needs to supply more information (and thus narrow the search). By displaying the top 200, I think we'd get more big queries kicked off.

From a usability standpoint I agree TOP 200 is nice, but since we are already running into some timeout issues, I'd be hesitant to incorporate it now. We can play around with it though.
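One compromise we might try (just a sketch; table and parameter names are illustrative) is to fetch a single row beyond the cap: the insert is bounded like a TOP 200 query, but we can still raise the error that forces users to narrow the search:

```sql
-- TOP (201) bounds the work; getting row 201 means "more than 200 matches"
INSERT INTO #SearchResults (ID)
SELECT TOP (201) a.ID
FROM MainTable a
    INNER JOIN PersonTable b ON a.ID = b.ID
WHERE b.LastName LIKE @lastName

IF @@ROWCOUNT = 201
    RAISERROR('More than 200 matches; please narrow your search.', 16, 1)
```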

I will definitely look into Full Text Indexing. I appreciate your suggestions.

Thanks,
Nic


Lumbago
Norsk Yak Master

3271 Posts

Posted - 2009-09-17 : 03:43:49
Well... it depends how you see it. The query optimizer will have to do roughly the same work, but both the data retrieval and the data transfer to the client (the temp table in your case) will be MUCH faster. It's naturally faster to get 200 rows of data from the disks than 1,000, or 10,000, or even 100,000, and passing those to the client or writing them to disk again will save you a lot of network transfer/IO.

- Lumbago

RickD
Slow But Sure Yak Herding Master

3608 Posts

Posted - 2009-09-17 : 04:28:59
Agree with Lumbago; just wanted to add to his answer. You will find that most of the bottlenecks in your queries come from trying to return too much data to the client (I/O). The easiest way to find out whether it is faster is to take a query and compare the execution plans with and without TOP 200. Nothing will answer your question like doing a test.
   
