| Author |
Topic |
|
IdRatherBeProgramming
Starting Member
19 Posts |
Posted - 2008-11-12 : 09:43:44
|
In an application I'm working on, I have the need to fully search a full name field, which can contain businesses or individuals and has no formatted order to how the individual names are entered. Therefore, the data could be any of the following:* Jason Jude Humphries* Smith, John Evan Jr* ABC Investment CompanyMy current plan uses a method of indexing/searching that search engines made popular. It includes indexing the data based on individual words, and then doing searches against the indexed words - not the original data table. So, I have created stored procedures to parse and index the full name field. The initial indexing to the entire table is done on the table load (nightly), and any updates the rows fire a trigger to re-parse the indexes for that row (using the same stored procedure used in the nightly process).Once the index is built, the search is working pretty well. But the only SQL I could come up with that would utilize the index table is (The blue sql is generated when someone enters a 3 part name like "Jason Jude Humphries"):SELECT TOP 1000 * FROM Customers WHERE ( C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL ) AND ( C_TaxId = @TaxId OR @TaxId IS NULL ) AND ( C_CustType = @Type OR @Type IS NULL ) AND ( C_ID IN (SELECT CNI_CustomerID FROM CustomerNameIndex WHERE CNI_NamePart LIKE @Data0 + '%' ) AND C_ID IN (SELECT CNI_CustomerID FROM CustomerNameIndex WHERE CNI_NamePart LIKE @Data1 + '%' ) AND C_ID IN (SELECT CNI_CustomerID FROM CustomerNameIndex WHERE CNI_NamePart LIKE @Data2 + '%' ) ) ORDER BY C_FullName, C_PrimaryPhone Is there a better way to achieve this same general search functionality that might have better structured SQL than what I built (which will cross reference temp results using sub-queries)? |
|
|
IdRatherBeProgramming
Starting Member
19 Posts |
Posted - 2008-11-12 : 10:07:53
|
Obviously, removing the Order By helped because it no longer had to get all the results on the DB before it returned the top 1000... Here is another SQL attempt I tried to see how it worked, but it's performance is HORRIBLE...SELECT TOP 100 * FROM Customers LEFT OUTER JOIN CustomerNameIndex CNI0 ON CNI0.CNI_NamePart = @Data0 AND CNI0.CNI_CustomerId = C_ID LEFT OUTER JOIN CustomerNameIndex CNI1 ON CNI1.CNI_NamePart = @Data1 AND CNI1.CNI_CustomerId = C_ID LEFT OUTER JOIN CustomerNameIndex CNI2 ON CNI2.CNI_NamePart = @Data2 AND CNI2.CNI_CustomerId = C_ID WHERE ( C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL ) AND ( C_TaxId = @TaxId OR @TaxId IS NULL ) AND ( C_CustType = @Type OR @Type IS NULL ) AND ( CNI0.CNI_CustomerId IS NOT NULL OR @Data0 IS NULL ) AND ( CNI1.CNI_CustomerId IS NOT NULL OR @Data1 IS NULL ) AND ( CNI2.CNI_CustomerId IS NOT NULL OR @Data2 IS NULL ) |
 |
|
|
sakets_2000
Master Smack Fu Yak Hacker
1472 Posts |
Posted - 2008-11-12 : 10:12:02
|
| Can you check for this,SELECT TOP 1000 * FROM Customers WHERE ( C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL ) AND ( C_TaxId = @TaxId OR @TaxId IS NULL ) AND ( C_CustType = @Type OR @Type IS NULL ) AND ( exists (SELECT 1 FROM CustomerNameIndex WHERE CNI_NamePart LIKE @Data0 + '%' and Customers.C_ID=CNI_CustomerID ) AND exists (SELECT 1 FROM CustomerNameIndex WHERE CNI_NamePart LIKE @Data1 + '%' and Customers.C_ID=CNI_CustomerID) AND exists (SELECT 1 FROM CustomerNameIndex WHERE CNI_NamePart LIKE @Data2 + '%'and Customers.C_ID=CNI_CustomerID ) ) ORDER BY C_FullName, C_PrimaryPhone |
 |
|
|
IdRatherBeProgramming
Starting Member
19 Posts |
Posted - 2008-11-12 : 10:14:36
|
This query came back in 10 seconds, but that's slower than the original...SELECT TOP 100 * FROM Customers WHERE ( C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL ) AND ( C_TaxId = @TaxId OR @TaxId IS NULL ) AND ( C_CustType = @Type OR @Type IS NULL ) AND ( C_ID IN ( SELECT CNI_CustomerID FROM CustomerNameIndex CNI0 WHERE CNI0.CNI_NamePart LIKE @Data0 + '%' AND CNI0.CNI_CustomerID IN ( SELECT CNI_CustomerID FROM CustomerNameIndex CNI1 WHERE CNI1.CNI_NamePart LIKE @Data1 + '%' AND CNI1.CNI_CustomerID IN ( SELECT CNI_CustomerID FROM CustomerNameIndex CNI2 WHERE CNI2.CNI_NamePart LIKE @Data2 + '%' ) ) ) ) |
 |
|
|
IdRatherBeProgramming
Starting Member
19 Posts |
Posted - 2008-11-12 : 10:15:25
|
| The query from sakets_2000 took 8 seconds... But, just to be clear, the original query is taking 8 seconds as well right now... This is a shared SQL development box, so I'm not expecting "instant response"... Any other ideas out there or is the "exists" the best way? |
 |
|
|
IdRatherBeProgramming
Starting Member
19 Posts |
Posted - 2008-11-12 : 10:27:43
|
| The Queries above all have 80% (or more) of their execution time doing a "Clustered Index Scan" on the Primary Key of the Customer table... I think that's good, right?PS. The CustomerNameIndex table is defined as:CREATE TABLE [dbo].[CustomerNameIndex] ( [CNI_CustomerId] [numeric](18, 0) NOT NULL , [CNI_NamePart] [varchar] (15) NOT NULL ) ON [PRIMARY]GO CREATE CLUSTERED INDEX [IX_CustomerNameIndex] ON [dbo].[CustomerNameIndex]([CNI_NamePart]) ON [PRIMARY]GOALTER TABLE [dbo].[CustomerNameIndex] ADD CONSTRAINT [PK_CustomerNameIndex] PRIMARY KEY NONCLUSTERED ( [CNI_CustomerId], [CNI_NamePart] ) ON [PRIMARY] GO |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2008-11-12 : 10:46:43
|
[code]SELECT TOP 1000 *FROM CustomersWHERE (C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL) AND (C_TaxId = @TaxId OR @TaxId IS NULL) AND (C_CustType = @Type OR @Type IS NULL) AND EXISTS( SELECT * FROM CustomerNameIndex WHERE (CNI_NamePart LIKE @Data0 + '%' OR @Data0 IS NULL) AND (CNI_NamePart LIKE @Data1 + '%' OR @Data1 IS NULL) AND (CNI_NamePart LIKE @Data2 + '%' OR @Data2 IS NULL) AND CNI_CustomerID = C_ID )[/code] E 12°55'05.63"N 56°04'39.26" |
 |
|
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2008-11-12 : 10:58:40
|
| Why not use the Full Text Search functionality available in 2005?If it is not practically useful, then it is practically useless. |
 |
|
|
IdRatherBeProgramming
Starting Member
19 Posts |
Posted - 2008-11-12 : 11:05:47
|
| The Example from PESO is gramatically correct, but takes 12 seconds and does not return any results. This is because the search terms cannot be pulled in one sub-query... The Full Text Search might work, but I don't have access to SQL Server 2005 to test it right now...and I'm not sure about how much size and overhead is built into doing a "full text scan" rather than a "word scan" as I'm building. However, that is an option and will be tested at some point in the future. It would be interesting to see if the results are better than using my "name part" indexed table. |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2008-11-12 : 16:26:19
|
[code]SELECT TOP 1000 c.*FROM ( SELECT * FROM Customers WHERE (C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL) AND (C_TaxId = @TaxId OR @TaxId IS NULL) AND (C_CustType = @Type OR @Type IS NULL) ) AS cINNER JOIN ( SELECT CNI_CustomerID FROM CustomerNameIndex GROUP BY CNI_CustomerID HAVING MAX(CASE WHEN CNI_NamePart LIKE @Data0 + '%' OR @Data0 IS NULL THEN 1 ELSE 0 END) = 1 MAX(CASE WHEN CNI_NamePart LIKE @Data1 + '%' OR @Data1 IS NULL THEN 1 ELSE 0 END) = 1 MAX(CASE WHEN CNI_NamePart LIKE @Data2 + '%' OR @Data2 IS NULL THEN 1 ELSE 0 END) = 1 ) AS w ON w.CNI_CustomerID = c.C_ID[/code] E 12°55'05.63"N 56°04'39.26" |
 |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2008-11-12 : 16:49:51
|
quote: Originally posted by IdRatherBeProgramming The Example from PESO is gramatically correct, but takes 12 seconds and does not return any results. This is because the search terms cannot be pulled in one sub-query... The Full Text Search might work, but I don't have access to SQL Server 2005 to test it right now...and I'm not sure about how much size and overhead is built into doing a "full text scan" rather than a "word scan" as I'm building. However, that is an option and will be tested at some point in the future. It would be interesting to see if the results are better than using my "name part" indexed table.
If you are not using SQL Server 2005, you should not post your questions on the 2005 forum.That said, full text indexing is available in earlier versions of SQL Server, but with less capabilities. The lookups with full text indexes are fast, but there is significant maintenance overhead. The capability in SQL Server 2005 is much easier to manage.CODO ERGO SUM |
 |
|
|
IdRatherBeProgramming
Starting Member
19 Posts |
Posted - 2008-11-14 : 14:42:54
|
quote: Originally posted by Michael Valentine Jones If you are not using SQL Server 2005, you should not post your questions on the 2005 forum.
I'm asking a generic transact SQL question. And it will eventually (actually in a month or so) be on a SQL 2005 box, so I will try the full index scan at that time. But for now - I'm just asking if there's a better way to set up the query... |
 |
|
|
IdRatherBeProgramming
Starting Member
19 Posts |
Posted - 2008-11-14 : 14:45:54
|
quote: Originally posted by Peso
SELECT TOP 1000 c.*FROM ( SELECT * FROM Customers WHERE (C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL) AND (C_TaxId = @TaxId OR @TaxId IS NULL) AND (C_CustType = @Type OR @Type IS NULL) ) AS cINNER JOIN ( SELECT CNI_CustomerID FROM CustomerNameIndex GROUP BY CNI_CustomerID HAVING MAX(CASE WHEN CNI_NamePart LIKE @Data0 + '%' OR @Data0 IS NULL THEN 1 ELSE 0 END) = 1 MAX(CASE WHEN CNI_NamePart LIKE @Data1 + '%' OR @Data1 IS NULL THEN 1 ELSE 0 END) = 1 MAX(CASE WHEN CNI_NamePart LIKE @Data2 + '%' OR @Data2 IS NULL THEN 1 ELSE 0 END) = 1 ) AS w ON w.CNI_CustomerID = c.C_ID E 12°55'05.63"N 56°04'39.26"
When I try your SQL, I get...Server: Msg 170, Level 15, State 1, Line 91Line 91: Incorrect syntax near 'MAX'.I can't see anything wrong. Can you? |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2008-11-14 : 15:16:36
|
quote: Originally posted by IdRatherBeProgramming
quote: Originally posted by Peso
SELECT TOP 1000 c.*FROM ( SELECT * FROM Customers WHERE (C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL) AND (C_TaxId = @TaxId OR @TaxId IS NULL) AND (C_CustType = @Type OR @Type IS NULL) ) AS cINNER JOIN ( SELECT CNI_CustomerID FROM CustomerNameIndex GROUP BY CNI_CustomerID HAVING MAX(CASE WHEN CNI_NamePart LIKE @Data0 + '%' OR @Data0 IS NULL THEN 1 ELSE 0 END) = 1 AND MAX(CASE WHEN CNI_NamePart LIKE @Data1 + '%' OR @Data1 IS NULL THEN 1 ELSE 0 END) = 1 AND MAX(CASE WHEN CNI_NamePart LIKE @Data2 + '%' OR @Data2 IS NULL THEN 1 ELSE 0 END) = 1 ) AS w ON w.CNI_CustomerID = c.C_ID E 12°55'05.63"N 56°04'39.26"
When I try your SQL, I get...Server: Msg 170, Level 15, State 1, Line 91Line 91: Incorrect syntax near 'MAX'.I can't see anything wrong. Can you?
|
 |
|
|
IdRatherBeProgramming
Starting Member
19 Posts |
Posted - 2008-11-18 : 14:32:29
|
| Well, adding the "AND" keywords in at least allowed the SQL to execute... But it took over 19 seconds.I'm guessing there is just no easy way to do this. I'm going to take a look at putting the original query in a stored procedure with up to 5 search parts and see if I can use a temp table to help narrow down as I go... I'll let you know how it goes. |
 |
|
|
IdRatherBeProgramming
Starting Member
19 Posts |
Posted - 2008-11-18 : 16:04:10
|
| OK, the stored procedure didn't help much either...What about better ways of setting up the index table and the customer table?Maybe have multiple "name part" indexes as columns on the customer table instead of separate rows in own table?Maybe I should change how I am setting up the indexes on the tables (clustering vs non clustering, primary vs index)?Any thoughts from this perspective?All I'm trying to do is build a text field that can store a name with multiple parts, which is quickly searchable (and likely indexed) by any individual "part" that is longer than a single character. |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2008-11-18 : 16:20:39
|
In your original queries, you have no check for @dataX is null either... E 12°55'05.63"N 56°04'39.26" |
 |
|
|
IdRatherBeProgramming
Starting Member
19 Posts |
Posted - 2008-11-18 : 16:28:41
|
quote: Originally posted by Peso In your original queries, you have no check for @dataX is null either...
True, but in my test case, I was always sending in 3 pieces of data - so there was no NULLs in the data and it would have done the same thing either way. |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2008-11-18 : 17:10:04
|
You are ALWAYS sending three pieces of information?SELECT TOP 1000 c.*FROM ( SELECT * FROM Customers WHERE (C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL) AND (C_TaxId = @TaxId OR @TaxId IS NULL) AND (C_CustType = @Type OR @Type IS NULL) ) AS cINNER JOIN ( SELECT CNI_CustomerID FROM CustomerNameIndex GROUP BY CNI_CustomerID HAVING MAX(CASE WHEN CNI_NamePart LIKE @Data0 + '%' THEN 1 ELSE 0 END) = 1 AND MAX(CASE WHEN CNI_NamePart LIKE @Data1 + '%' THEN 1 ELSE 0 END) = 1 AND MAX(CASE WHEN CNI_NamePart LIKE @Data2 + '%' THEN 1 ELSE 0 END) = 1 ) AS w ON w.CNI_CustomerID = c.C_ID E 12°55'05.63"N 56°04'39.26" |
 |
|
|
IdRatherBeProgramming
Starting Member
19 Posts |
Posted - 2008-11-20 : 09:24:45
|
quote: Originally posted by Peso
No, but I'm just trying to optimize the query for now... The query you have in your last post ran, but took more time than the original. |
 |
|
|
IdRatherBeProgramming
Starting Member
19 Posts |
Posted - 2008-11-20 : 09:29:08
|
Ok, Here's what I have now... In my ASP.Net app, I'm auto-generating parameterized SQL that looks like this (notice there is only two parameters in this query):SELECT TOP 1000 * FROM Customers WHERE ( C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL ) AND ( C_TaxId = @TaxId OR @TaxId IS NULL ) AND ( C_CustType = @Type OR @Type IS NULL ) AND C_ID IN ( SELECT CNI_CustomerID FROM CustomerNameIndex WHERE CNI_NamePart = @Data0 AND CNI_CustomerID IN ( SELECT CNI_CustomerID FROM CustomerNameIndex WHERE CNI_NamePart = @Data1 ) ) And here is my table definitions (with indexes):CREATE TABLE [dbo].[CustomerNameIndex] ( [CNI_CustomerId] [numeric](18, 0) NOT NULL , [CNI_NamePart] [varchar] (15) NOT NULL ) ON [PRIMARY]GOCREATE TABLE [dbo].[Customers] ( [C_Id] [numeric](18, 0) IDENTITY (1, 1) NOT NULL , [C_CustType] [tinyint] NULL , [C_FullName] [varchar] (50) NULL , [C_TaxType] [char] (1) NULL , [C_TaxId] [char] (9) NULL , [C_PrimaryPhone] [varchar] (15) NULL , [C_Title] [varchar] (20) NULL , [C_RelManagerId] [int] NULL ) ON [PRIMARY]GOALTER TABLE [dbo].[Customers] WITH NOCHECK ADD CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED ( [C_Id] ) ON [PRIMARY] GO CREATE CLUSTERED INDEX [IX_CustomerNameIndex] ON [dbo].[CustomerNameIndex]([CNI_NamePart]) ON [PRIMARY]GOALTER TABLE [dbo].[CustomerNameIndex] ADD CONSTRAINT [PK_CustomerNameIndex] PRIMARY KEY NONCLUSTERED ( [CNI_CustomerId], [CNI_NamePart] ) ON [PRIMARY] GO CREATE INDEX [IX_Customers] ON [dbo].[Customers]([C_TaxId]) ON [PRIMARY]GO CREATE INDEX [IX_Customers_1] ON [dbo].[Customers]([C_PrimaryPhone]) ON [PRIMARY]GOSET QUOTED_IDENTIFIER ON GOSET ANSI_NULLS ON GO One would think an indexed field should be able to come back rather quickly, but every search is taking around 8 seconds... There's only a million rows in the table, so it should be able to handle it much faster than this.Any ideas on the table definition?I think there almost has to be a problem in the way I set up the tables and indexes, because the execution plan says it's doing a Index SEEK, not a SCAN... What should I change? |
 |
|
|
Next Page
|