Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Indexed Searching - Customer Full Name

Author  Topic 

IdRatherBeProgramming
Starting Member

19 Posts

Posted - 2008-11-12 : 09:43:44
In an application I'm working on, I have the need to fully search a full name field, which can contain businesses or individuals and has no formatted order to how the individual names are entered.

Therefore, the data could be any of the following:
* Jason Jude Humphries
* Smith, John Evan Jr
* ABC Investment Company

My current plan uses a method of indexing/searching that search engines made popular. It includes indexing the data based on individual words, and then doing searches against the indexed words - not the original data table. So, I have created stored procedures to parse and index the full name field. The initial indexing to the entire table is done on the table load (nightly), and any updates the rows fire a trigger to re-parse the indexes for that row (using the same stored procedure used in the nightly process).

Once the index is built, the search is working pretty well. But the only SQL I could come up with that would utilize the index table is (The blue sql is generated when someone enters a 3 part name like "Jason Jude Humphries"):


SELECT TOP 1000 *
FROM Customers
WHERE ( C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL )
AND ( C_TaxId = @TaxId OR @TaxId IS NULL )
AND ( C_CustType = @Type OR @Type IS NULL )
AND (
C_ID IN (SELECT CNI_CustomerID FROM CustomerNameIndex WHERE CNI_NamePart LIKE @Data0 + '%' )
AND C_ID IN (SELECT CNI_CustomerID FROM CustomerNameIndex WHERE CNI_NamePart LIKE @Data1 + '%' )
AND C_ID IN (SELECT CNI_CustomerID FROM CustomerNameIndex WHERE CNI_NamePart LIKE @Data2 + '%' )

)
ORDER BY C_FullName, C_PrimaryPhone

Is there a better way to achieve this same general search functionality that might have better structured SQL than what I built (which will cross reference temp results using sub-queries)?

IdRatherBeProgramming
Starting Member

19 Posts

Posted - 2008-11-12 : 10:07:53
Obviously, removing the Order By helped because it no longer had to get all the results on the DB before it returned the top 1000...

Here is another SQL attempt I tried to see how it worked, but it's performance is HORRIBLE...



SELECT TOP 100 *
FROM Customers
LEFT OUTER JOIN CustomerNameIndex CNI0 ON CNI0.CNI_NamePart = @Data0 AND CNI0.CNI_CustomerId = C_ID
LEFT OUTER JOIN CustomerNameIndex CNI1 ON CNI1.CNI_NamePart = @Data1 AND CNI1.CNI_CustomerId = C_ID
LEFT OUTER JOIN CustomerNameIndex CNI2 ON CNI2.CNI_NamePart = @Data2 AND CNI2.CNI_CustomerId = C_ID
WHERE ( C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL )
AND ( C_TaxId = @TaxId OR @TaxId IS NULL )
AND ( C_CustType = @Type OR @Type IS NULL )
AND ( CNI0.CNI_CustomerId IS NOT NULL OR @Data0 IS NULL )
AND ( CNI1.CNI_CustomerId IS NOT NULL OR @Data1 IS NULL )
AND ( CNI2.CNI_CustomerId IS NOT NULL OR @Data2 IS NULL )
Go to Top of Page

sakets_2000
Master Smack Fu Yak Hacker

1472 Posts

Posted - 2008-11-12 : 10:12:02


Can you check for this,

SELECT TOP 1000 *
FROM Customers
WHERE ( C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL )
AND ( C_TaxId = @TaxId OR @TaxId IS NULL )
AND ( C_CustType = @Type OR @Type IS NULL )
AND (
exists (SELECT 1 FROM CustomerNameIndex WHERE CNI_NamePart LIKE @Data0 + '%' and Customers.C_ID=CNI_CustomerID )
AND exists (SELECT 1 FROM CustomerNameIndex WHERE CNI_NamePart LIKE @Data1 + '%' and Customers.C_ID=CNI_CustomerID)
AND exists (SELECT 1 FROM CustomerNameIndex WHERE CNI_NamePart LIKE @Data2 + '%'and Customers.C_ID=CNI_CustomerID )
)
ORDER BY C_FullName, C_PrimaryPhone
Go to Top of Page

IdRatherBeProgramming
Starting Member

19 Posts

Posted - 2008-11-12 : 10:14:36
This query came back in 10 seconds, but that's slower than the original...


SELECT TOP 100 *
FROM Customers
WHERE ( C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL )
AND ( C_TaxId = @TaxId OR @TaxId IS NULL )
AND ( C_CustType = @Type OR @Type IS NULL )
AND (
C_ID IN (
SELECT CNI_CustomerID
FROM CustomerNameIndex CNI0
WHERE CNI0.CNI_NamePart LIKE @Data0 + '%' AND CNI0.CNI_CustomerID IN (
SELECT CNI_CustomerID
FROM CustomerNameIndex CNI1
WHERE CNI1.CNI_NamePart LIKE @Data1 + '%' AND CNI1.CNI_CustomerID IN (
SELECT CNI_CustomerID
FROM CustomerNameIndex CNI2
WHERE CNI2.CNI_NamePart LIKE @Data2 + '%'
)
)
)
)
Go to Top of Page

IdRatherBeProgramming
Starting Member

19 Posts

Posted - 2008-11-12 : 10:15:25
The query from sakets_2000 took 8 seconds...

But, just to be clear, the original query is taking 8 seconds as well right now... This is a shared SQL development box, so I'm not expecting "instant response"...

Any other ideas out there or is the "exists" the best way?
Go to Top of Page

IdRatherBeProgramming
Starting Member

19 Posts

Posted - 2008-11-12 : 10:27:43
The Queries above all have 80% (or more) of their execution time doing a "Clustered Index Scan" on the Primary Key of the Customer table... I think that's good, right?

PS. The CustomerNameIndex table is defined as:

CREATE TABLE [dbo].[CustomerNameIndex] (
[CNI_CustomerId] [numeric](18, 0) NOT NULL ,
[CNI_NamePart] [varchar] (15) NOT NULL
) ON [PRIMARY]
GO

CREATE CLUSTERED INDEX [IX_CustomerNameIndex] ON [dbo].[CustomerNameIndex]([CNI_NamePart]) ON [PRIMARY]
GO

ALTER TABLE [dbo].[CustomerNameIndex] ADD
CONSTRAINT [PK_CustomerNameIndex] PRIMARY KEY NONCLUSTERED
(
[CNI_CustomerId],
[CNI_NamePart]
) ON [PRIMARY]
GO



Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-11-12 : 10:46:43
[code]SELECT TOP 1000
*
FROM Customers
WHERE (C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL)
AND (C_TaxId = @TaxId OR @TaxId IS NULL)
AND (C_CustType = @Type OR @Type IS NULL)
AND EXISTS(
SELECT *
FROM CustomerNameIndex
WHERE (CNI_NamePart LIKE @Data0 + '%' OR @Data0 IS NULL)
AND (CNI_NamePart LIKE @Data1 + '%' OR @Data1 IS NULL)
AND (CNI_NamePart LIKE @Data2 + '%' OR @Data2 IS NULL)
AND CNI_CustomerID = C_ID
)[/code]


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2008-11-12 : 10:58:40
Why not use the Full Text Search functionality available in 2005?

If it is not practically useful, then it is practically useless.
Go to Top of Page

IdRatherBeProgramming
Starting Member

19 Posts

Posted - 2008-11-12 : 11:05:47
The Example from PESO is gramatically correct, but takes 12 seconds and does not return any results. This is because the search terms cannot be pulled in one sub-query...

The Full Text Search might work, but I don't have access to SQL Server 2005 to test it right now...and I'm not sure about how much size and overhead is built into doing a "full text scan" rather than a "word scan" as I'm building.

However, that is an option and will be tested at some point in the future. It would be interesting to see if the results are better than using my "name part" indexed table.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-11-12 : 16:26:19
[code]SELECT TOP 1000 c.*
FROM (
SELECT *
FROM Customers
WHERE (C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL)
AND (C_TaxId = @TaxId OR @TaxId IS NULL)
AND (C_CustType = @Type OR @Type IS NULL)
) AS c
INNER JOIN (
SELECT CNI_CustomerID
FROM CustomerNameIndex
GROUP BY CNI_CustomerID
HAVING MAX(CASE WHEN CNI_NamePart LIKE @Data0 + '%' OR @Data0 IS NULL THEN 1 ELSE 0 END) = 1
MAX(CASE WHEN CNI_NamePart LIKE @Data1 + '%' OR @Data1 IS NULL THEN 1 ELSE 0 END) = 1
MAX(CASE WHEN CNI_NamePart LIKE @Data2 + '%' OR @Data2 IS NULL THEN 1 ELSE 0 END) = 1
) AS w ON w.CNI_CustomerID = c.C_ID[/code]


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2008-11-12 : 16:49:51
quote:
Originally posted by IdRatherBeProgramming

The Example from PESO is gramatically correct, but takes 12 seconds and does not return any results. This is because the search terms cannot be pulled in one sub-query...

The Full Text Search might work, but I don't have access to SQL Server 2005 to test it right now...and I'm not sure about how much size and overhead is built into doing a "full text scan" rather than a "word scan" as I'm building.

However, that is an option and will be tested at some point in the future. It would be interesting to see if the results are better than using my "name part" indexed table.



If you are not using SQL Server 2005, you should not post your questions on the 2005 forum.

That said, full text indexing is available in earlier versions of SQL Server, but with less capabilities. The lookups with full text indexes are fast, but there is significant maintenance overhead. The capability in SQL Server 2005 is much easier to manage.





CODO ERGO SUM
Go to Top of Page

IdRatherBeProgramming
Starting Member

19 Posts

Posted - 2008-11-14 : 14:42:54
quote:
Originally posted by Michael Valentine Jones

If you are not using SQL Server 2005, you should not post your questions on the 2005 forum.


I'm asking a generic transact SQL question. And it will eventually (actually in a month or so) be on a SQL 2005 box, so I will try the full index scan at that time.

But for now - I'm just asking if there's a better way to set up the query...
Go to Top of Page

IdRatherBeProgramming
Starting Member

19 Posts

Posted - 2008-11-14 : 14:45:54
quote:
Originally posted by Peso

SELECT TOP 1000	c.*
FROM (
SELECT *
FROM Customers
WHERE (C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL)
AND (C_TaxId = @TaxId OR @TaxId IS NULL)
AND (C_CustType = @Type OR @Type IS NULL)
) AS c
INNER JOIN (
SELECT CNI_CustomerID
FROM CustomerNameIndex
GROUP BY CNI_CustomerID
HAVING MAX(CASE WHEN CNI_NamePart LIKE @Data0 + '%' OR @Data0 IS NULL THEN 1 ELSE 0 END) = 1
MAX(CASE WHEN CNI_NamePart LIKE @Data1 + '%' OR @Data1 IS NULL THEN 1 ELSE 0 END) = 1
MAX(CASE WHEN CNI_NamePart LIKE @Data2 + '%' OR @Data2 IS NULL THEN 1 ELSE 0 END) = 1
) AS w ON w.CNI_CustomerID = c.C_ID



E 12°55'05.63"
N 56°04'39.26"




When I try your SQL, I get...

Server: Msg 170, Level 15, State 1, Line 91
Line 91: Incorrect syntax near 'MAX'.

I can't see anything wrong. Can you?
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2008-11-14 : 15:16:36
quote:
Originally posted by IdRatherBeProgramming

quote:
Originally posted by Peso

SELECT TOP 1000	c.*
FROM (
SELECT *
FROM Customers
WHERE (C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL)
AND (C_TaxId = @TaxId OR @TaxId IS NULL)
AND (C_CustType = @Type OR @Type IS NULL)
) AS c
INNER JOIN (
SELECT CNI_CustomerID
FROM CustomerNameIndex
GROUP BY CNI_CustomerID
HAVING MAX(CASE WHEN CNI_NamePart LIKE @Data0 + '%' OR @Data0 IS NULL THEN 1 ELSE 0 END) = 1
AND MAX(CASE WHEN CNI_NamePart LIKE @Data1 + '%' OR @Data1 IS NULL THEN 1 ELSE 0 END) = 1
AND MAX(CASE WHEN CNI_NamePart LIKE @Data2 + '%' OR @Data2 IS NULL THEN 1 ELSE 0 END) = 1
) AS w ON w.CNI_CustomerID = c.C_ID



E 12°55'05.63"
N 56°04'39.26"




When I try your SQL, I get...

Server: Msg 170, Level 15, State 1, Line 91
Line 91: Incorrect syntax near 'MAX'.

I can't see anything wrong. Can you?

Go to Top of Page

IdRatherBeProgramming
Starting Member

19 Posts

Posted - 2008-11-18 : 14:32:29
Well, adding the "AND" keywords in at least allowed the SQL to execute... But it took over 19 seconds.

I'm guessing there is just no easy way to do this. I'm going to take a look at putting the original query in a stored procedure with up to 5 search parts and see if I can use a temp table to help narrow down as I go... I'll let you know how it goes.
Go to Top of Page

IdRatherBeProgramming
Starting Member

19 Posts

Posted - 2008-11-18 : 16:04:10
OK, the stored procedure didn't help much either...

What about better ways of setting up the index table and the customer table?
Maybe have multiple "name part" indexes as columns on the customer table instead of separate rows in own table?
Maybe I should change how I am setting up the indexes on the tables (clustering vs non clustering, primary vs index)?

Any thoughts from this perspective?

All I'm trying to do is build a text field that can store a name with multiple parts, which is quickly searchable (and likely indexed) by any individual "part" that is longer than a single character.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-11-18 : 16:20:39
In your original queries, you have no check for @dataX is null either...



E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

IdRatherBeProgramming
Starting Member

19 Posts

Posted - 2008-11-18 : 16:28:41
quote:
Originally posted by Peso

In your original queries, you have no check for @dataX is null either...


True, but in my test case, I was always sending in 3 pieces of data - so there was no NULLs in the data and it would have done the same thing either way.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-11-18 : 17:10:04
You are ALWAYS sending three pieces of information?
SELECT TOP 1000	c.*
FROM (
SELECT *
FROM Customers
WHERE (C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL)
AND (C_TaxId = @TaxId OR @TaxId IS NULL)
AND (C_CustType = @Type OR @Type IS NULL)
) AS c
INNER JOIN (
SELECT CNI_CustomerID
FROM CustomerNameIndex
GROUP BY CNI_CustomerID
HAVING MAX(CASE WHEN CNI_NamePart LIKE @Data0 + '%' THEN 1 ELSE 0 END) = 1
AND MAX(CASE WHEN CNI_NamePart LIKE @Data1 + '%' THEN 1 ELSE 0 END) = 1
AND MAX(CASE WHEN CNI_NamePart LIKE @Data2 + '%' THEN 1 ELSE 0 END) = 1
) AS w ON w.CNI_CustomerID = c.C_ID



E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

IdRatherBeProgramming
Starting Member

19 Posts

Posted - 2008-11-20 : 09:24:45
quote:
Originally posted by Peso


No, but I'm just trying to optimize the query for now... The query you have in your last post ran, but took more time than the original.
Go to Top of Page

IdRatherBeProgramming
Starting Member

19 Posts

Posted - 2008-11-20 : 09:29:08
Ok, Here's what I have now... In my ASP.Net app, I'm auto-generating parameterized SQL that looks like this (notice there is only two parameters in this query):


SELECT TOP 1000 *
FROM Customers
WHERE ( C_PrimaryPhone LIKE @PrimaryPhone + '%' OR @PrimaryPhone IS NULL )
AND ( C_TaxId = @TaxId OR @TaxId IS NULL )
AND ( C_CustType = @Type OR @Type IS NULL )
AND C_ID IN (
SELECT CNI_CustomerID
FROM CustomerNameIndex
WHERE CNI_NamePart = @Data0 AND CNI_CustomerID IN (
SELECT CNI_CustomerID
FROM CustomerNameIndex
WHERE CNI_NamePart = @Data1 ) )


And here is my table definitions (with indexes):


CREATE TABLE [dbo].[CustomerNameIndex] (
[CNI_CustomerId] [numeric](18, 0) NOT NULL ,
[CNI_NamePart] [varchar] (15) NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[Customers] (
[C_Id] [numeric](18, 0) IDENTITY (1, 1) NOT NULL ,
[C_CustType] [tinyint] NULL ,
[C_FullName] [varchar] (50) NULL ,
[C_TaxType] [char] (1) NULL ,
[C_TaxId] [char] (9) NULL ,
[C_PrimaryPhone] [varchar] (15) NULL ,
[C_Title] [varchar] (20) NULL ,
[C_RelManagerId] [int] NULL
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[Customers] WITH NOCHECK ADD
CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED
(
[C_Id]
) ON [PRIMARY]
GO

CREATE CLUSTERED INDEX [IX_CustomerNameIndex] ON [dbo].[CustomerNameIndex]([CNI_NamePart]) ON [PRIMARY]
GO

ALTER TABLE [dbo].[CustomerNameIndex] ADD
CONSTRAINT [PK_CustomerNameIndex] PRIMARY KEY NONCLUSTERED
(
[CNI_CustomerId],
[CNI_NamePart]
) ON [PRIMARY]
GO

CREATE INDEX [IX_Customers] ON [dbo].[Customers]([C_TaxId]) ON [PRIMARY]
GO

CREATE INDEX [IX_Customers_1] ON [dbo].[Customers]([C_PrimaryPhone]) ON [PRIMARY]
GO

SET QUOTED_IDENTIFIER ON
GO
SET ANSI_NULLS ON
GO



One would think an indexed field should be able to come back rather quickly, but every search is taking around 8 seconds... There's only a million rows in the table, so it should be able to handle it much faster than this.

Any ideas on the table definition?

I think there almost has to be a problem in the way I set up the tables and indexes, because the execution plan says it's doing a Index SEEK, not a SCAN... What should I change?
Go to Top of Page
    Next Page

- Advertisement -