Search - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Programming

Author

Topic

BendJoe
Posting Yak Master

128 Posts

Posted - 2007-11-05 : 09:40:29

I am doing a search. I have a table which keeps FirstName and LastName.
But the searchparameter which I am taking from the webpage is Name which is FirstName + LastName. Now I want to check whether this name is there in my table. Right now I am having difficulty in searching two column with the same search parameter. See below
sample SQL
AND (freetext(UserContact.FirstName, @Name) OR (@Name IS NULL) )
/*AND (freetext(UserContact.LastName, @Name) OR (@Name IS NULL) )*/

When I use one column search it works.
How can I solve this.
Thanks

Kristen
Test

22859 Posts

Posted - 2007-11-05 : 09:46:35

Not sure I know what FreeText() does, but would this do?

AND (freetext(UserContact.FirstName + UserContact.LastName, @Name) OR (@Name IS NULL) )

Kristen

BendJoe
Posting Yak Master

128 Posts

Posted - 2007-11-05 : 09:54:19

AND (freetext(UserContact.FirstName + UserContact.LastName, @Name) OR (@Name IS NULL) )
Got this error.
Msg 102, Level 15, State 1, Procedure usp_FindAuthor, Line 40
Incorrect syntax near '+'.
I am using freetext when search parameter is more than one word. eg: parameter John Doe, freetext will search for John and Doe. As far as I know Like will look for John Doe as a single continous word.
Or is there a better way to do this without freetext. I am new to SQL.
Thanks

Kristen
Test

22859 Posts

Posted - 2007-11-05 : 10:29:48

I don't think you've given us enough information.

What is freetext() ?

And why/how has it triggered a call to usp_FindAuthor?

Kristen

BendJoe
Posting Yak Master

128 Posts

Posted - 2007-11-05 : 10:39:44

here it is.
set ANSI_NULLS ON
set QUOTED_IDENTIFIER ON
go

ALTER PROCEDURE [dbo].[usp_FindAuthor]

/* ALL parameters are optional */
@Name nvarchar(200) = NULL,
@email varchar(200)=NULL,
@Company varchar(200) = NULL,
@Building varchar(200) = NULL,
@Phone varchar(200)=NULL,
@Status bit =NULL

AS

SELECT DISTINCT aspnet_Membership.UserId, aspnet_Membership.LoweredEmail, aspnet_Membership.IsLockedOut, aspnet_Users.UserName, UserContact.FirstName,
UserContact.LastName, UserContact.Company, UserContact.Phone
FROM aspnet_Membership INNER JOIN
aspnet_Users ON aspnet_Membership.UserId = aspnet_Users.UserId INNER JOIN
BuildingAddress ON aspnet_Membership.UserId = BuildingAddress.UserID INNER JOIN
UserContact ON aspnet_Membership.UserId = UserContact.UserID

/*SELECT distinct aspnet_Membership.UserId,aspnet_Membership.LoweredEmail, aspnet_Membership.IsApproved, aspnet_Users.UserName,
UserContact.FirstName, UserContact.LastName, UserContact.Company
FROM UserContact INNER JOIN
BuildingAddress ON UserContact.UserID = BuildingAddress.UserID INNER JOIN
aspnet_Membership INNER JOIN
aspnet_Users ON aspnet_Membership.UserId = aspnet_Users.UserId ON UserContact.UserID = aspnet_Users.UserId
*/
WHERE
((aspnet_Membership.LoweredEmail = @email) OR (@email IS NULL))
AND ((aspnet_Membership.IsLockedOut = @Status) OR (@Status IS NULL))
AND ((BuildingAddress.BuildingAddress LIKE '%'+@Building+'%') OR (@Building IS NULL))
AND (freetext(UserContact.FirstName + UserContact.LastName, @Name) OR (@Name IS NULL) )
/*AND (freetext(UserContact.LastName, @Name) OR (@Name IS NULL) )*/
/*AND ((UserContact.LastName LIKE '%'+@LastName+'%') OR (@LastName IS NULL) )*/
AND ((UserContact.Company LIKE'%'+ @Company+'%')OR (@Company IS NULL))
AND ((UserContact.Phone = @Phone) OR (@Phone IS NULL))
/*
((FirstName LIKE '%'+@Name+'%') OR (@Name IS NULL) )
AND
((LastName LIKE '%'+@Name+'%') OR (@Name IS NULL) )
AND
((Company LIKE '%'+@Company+'%')OR (@Company IS NULL))
AND
((BuildingAddress LIKE '%'+@Building+'%') OR (@Building IS NULL))
AND
((Phone LIKE '%'+@Phone+'%') OR (@Phone IS NULL))
AND
((LoweredEmail LIKE '%'+@email+'%') OR (@email IS NULL))
AND
((IsLockedOut=@Status) OR (@Status IS NULL))*/

Kristen
Test

22859 Posts

Posted - 2007-11-05 : 11:45:26

If @Name will contain both First and Last names you could do:

((FirstName + ' ' + LastName LIKE '%'+@Name+'%') OR (@Name IS NULL) )

but currently you have

((FirstName LIKE '%'+@Name+'%') OR (@Name IS NULL) )
AND
((LastName LIKE '%'+@Name+'%') OR (@Name IS NULL) )

and if @Name IS NOT NULL then it will have to match both of those, because of the "AND", which is only going to happen for people with the same First and Last name

Kristen

BendJoe
Posting Yak Master

128 Posts

Posted - 2007-11-05 : 12:20:28

I understood the point but if I use this SQL
((FirstName + LastName LIKE '%'+@Name+'%') OR (@Name IS NULL) )
and the @Name is 'John Doe'. I wont get the result. If I put @Name as either 'John' or 'Doe'.
I will get the result. I need SQL server to split the words and search in the two columns. I thought freetext was the solution but I am not able to combine two columns with + sign.
I hope you understood the problem.

Kristen
Test

22859 Posts

Posted - 2007-11-05 : 12:31:51

OK, I've added a space.

(FirstName + ' ' + LastName LIKE '%'+@Name+'%')

will match "john doe"

If you want to separately match John against the FirstName and Doe against the LastName you will indeed need to split them up.

What will you do about "Mary Ann Doe" and "Mary du Doe" - you have no way of knowing which two words are First or Last name?

You would be better to pass to separate values, which you have separately asked the user for, as @FirstName and @LastName parameters, and then match on those.

Note that prefixing "%" will perform very badly if you have a large number of rows in the table and/or a busy server. It would be better to ask the user to insert a Wildcard in the appropriate position if they need to - that way most people won't put a "%" prefix, and that will help performance.

What we do is to have a table of Name Components. We split each name, as it is saved to the table, into all the separate words we can make and store them in the Name Components table. Then we split the "search name" the user types in and match those words against our Name Component table. We "rank" the results according to the number of hits, and also how close the match is - exact match scores better than partial match - so searching for "Smith" will rank better against "Smith" than against "Smithe" - we also use SOUNDEX and some other "similarity" algorithms - we would also match "Smyth" and "Smythe" - again, with lower ranking than an exact match.

You could also use Index Server / Free Text Search, but I don't know how good that is at matching names, rather than linguistically similar words.

Kristen

BendJoe
Posting Yak Master

128 Posts

Posted - 2007-11-05 : 15:02:35

Is there anyone who knows how to do freetext on more than one column.
Something like this.
(freetext(UserContact.FirstName + UserContact.LastName, @Name)

Zoroaster
Aged Yak Warrior

702 Posts

Posted - 2007-11-05 : 15:30:00

quote:
Originally posted by BendJoe

Is there anyone who knows how to do freetext on more than one column.
Something like this.
(freetext(UserContact.FirstName + UserContact.LastName, @Name)

What if you just put the columns in a column list (separate with a comma)? I would think the value in @NAME should still be evaluated against both columns:

(freetext((UserContact.FirstName, UserContact.LastName), @name)

Future guru in the making.

BendJoe
Posting Yak Master

128 Posts

Posted - 2007-11-05 : 15:40:33

Thanks mate you saved my evening.
I knew it was something real simple but didn't have the experience to try different syntaxes.

Kristen
Test

22859 Posts

Posted - 2007-11-06 : 10:14:01

They've changed it in SQL 2005 to allow a list of columns I now see.

I still have doubt that this is going to help matching Names as I have always assumed that Full Text query is of the "meaning" of words not the similarity of Sounding

Kristen

bogus
Starting Member

41 Posts

Posted - 2007-11-08 : 12:52:18

This is getting into a deeper topic beyond just sounding alike.

I am trying to create a similarity search so I can clean up our database... it isn't going to be easy, that's for sure.

Here in lies my data problem, an example:

Let's take something simple, like, gee, a university. A good one is UCLA - University of California, Los Angeles. Do you have any idea how many variations of that can exist on a database? A LOT! UCLA, U.C.L.A., UC,LA, Univ. Cal. LA., the list is literally endless. The same applies for Penn State... PSU, Penn State, P-State... not as many, but you get the idea.

The goal is to be able to bring those similar names back together.

I have been researching the Smith-Waterman algorithm, but they are all rather slow... and prepping a database with a gram based index is just painful... it could take hours to put that together.

For those that don't know, gram is a part of a word. For example, "University," would be stored as UN, NI, IV, and so on... that's a 2 byte gram, a 3 byte gram would be UNI, NIV, IVE, etc.

Think of all the databits that need to be compiled... and then maintained!

Here is a good example of how a search can work:

You all watch CSI, right? Great show. Good fun, serious leaps of faith. They run fingerprint searches, and on their computers, it's these trick graphics with images that are matched. That is how the human mind matches prints, but even then, we use the same basic data the computer uses - reference points on the print.

Take a quick look at a finger, and you will see swirls and junctions and things of that nature (I am not going to get to tech on that, but I used to work in identity theft investigations for a police department on the east coast). These points become part of a mathmatical equation that is then converted to a number. That number is stored.

Whenever a new print is to be tested, it is manually referenced for it's points and then the number is created and compared to everything in that index. Bam, 15 seconds later you have a 99.9% match. Very impressive. It will also preset a top 10 match list.

I do have a question: what is the nature of the similarity searching in SQL 2005?

Kristen
Test

22859 Posts

Posted - 2007-11-08 : 13:30:21

Our clients use organisations that specialise in de-duping data for this kind of fuzzy-matching

bogus
Starting Member

41 Posts

Posted - 2007-11-08 : 13:53:04

yea, I was afraid of that.

We are out to create a model that will allow for data quality in real time. For example, a new client comes into our system, let's say from a trade show.

They fill out an info card and we take the contact data and add it to our CRM.

For example, they fill out the card with this info:

John Smith
Some Huge Technical Company
123 Main Street
Anytown, NY 17999 USA

Well... Our search model would take that info and pass it by our RDBMS and return a top ten hit list of matches.

Some Huge Technical Company can become SHTC or Some Huge Tech Co., or any other combo. But because it looks similar, we can then apply the new contact, Mr. Smith, to the right firm, instead of creating three or four (or more) versions of Some Huge Technical Company.

That kinda mass fuzzy matching is easy, we need something that keeps us from needing to do mass updates on a regular basis.

So far, we have determined that a separate database server will be required. It would contain the metrics needed to narrow the search down to the point where the search won't take 30 seconds. We are resisting the idea of grams, simply because of the overhead needed... a single address could create 400 entries in a database table.

I have found a couple of products, but I am terrorified about pricing... I think coding will be my best solution. Amazon listed a couple of very interesting looking books, pricy, yes, but I suspect cheaper than buying the technology.

Kristen
Test

22859 Posts

Posted - 2007-11-08 : 14:10:20

I did a fair amount of work to just try to match "new accounts".

We looked at Telephone number (unique, so a high indicator if it matches, but then we found lots of addresses had the same "agency phone number"

)

Then ZIP code (in the UK our PostCode generally relates to < 20 properties), then lines of address - mixing them around to try to get State / Street matches even if they were switched around in the Address fields, or someone entered an extra address line - like "4th floor" as the first address line.

We had a copy of the "accounts" table that we had cleaned up a bit. Removed all trailing spaces and punctuation. Also all embedded punctuation converted to "space" and adjacent spaces removed, so:

"10, The High Street,"
"10 The High Street"

We made abbreviations consistent:

"10 The High St"
"10 The High Street"

and we probably took out and "the" and other noise words.

Then we tried matching based on that "sanitised" version.

But it took DAYS AND DAYS - of iterative processing - "Gee, look, here's yet-another-variation-of-rule-X" ...

That really tee-d be off, I don't have the stomach for any single job that takes "days" ...

Just in case any of that gives you any ideas

Kristen

bogus
Starting Member

41 Posts

Posted - 2007-11-08 : 14:34:52

oh, yea... I am aware this is gonna take time... way too much of it, I suspect.

I think before this thread melts anymore newbies brains, I should take this discussion to a better place...

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources