Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Help killing my cursor

Author  Topic 

newbieee
Starting Member

6 Posts

Posted - 2009-02-18 : 20:55:16
Hi,

I've proto-typed a solution to match customer records from varying sources. There may be up to 1million records in the customers table so I need to progress this into a set-based solution for efficiency (if that’s possible). Can any one clear the haze for me?

The select SQL below is just one of 5 queries that will run and to try and match the customer records. The idea is that when certain records are matched they will be inserted into the match table with a common id/BureauID that links them together. As new customer records arrive in the database they may match with an existing record that already has a BureauID so I need to examine the match records to extract the ID if it exists.

I had initially proto-typed a set-based query using ROW_NUMBER to assign a number to those records that satisfied the criteria* - with the idea of moving down the resultset and inserting into the match table on each break in the number, but the introduction of synonyms on Surname, Firstname and SecondName threw out the PARTITION BY clause….and now I'm stuck.

Any ideas greatly appreciated.


* Matching criteria
Surname (equal or synonym) AND
Firstname (equal or synonym) AND
SecondName (equal or synonym or Initial) AND
DateOfBirth (minimum 6 out of 8 - looking for transposed digits) AND
Gender (equal) AND
Streetname (equal to first 6 characters) OR
PostCode (equal)


SELECT TOP (100) PERCENT t.SubscriberID, t.CustomerID, t.Surname, t.FirstName, t.SecondName, t.Initial, t.SecondName_RestOf, t.DateOfBirth, t.GenderID,
t.PostCode, t.Street6, dbo.tblMatch.BureauID
FROM dbo.tblCustomer AS t LEFT OUTER JOIN
dbo.tblMatch ON t.CustomerID = dbo.tblMatch.CustomerID AND t.SubscriberID = dbo.tblMatch.SubscriberID
WHERE (t.Surname IN
(SELECT NameMatch
FROM dbo.NameX_Surname
WHERE (Name = @Surname))) AND (t.FirstName IN
(SELECT NameMatch
FROM dbo.NameX_Forename
WHERE (Name = @Firstname))) AND (t.SecondName IN
(SELECT NameMatch
FROM dbo.NameX_Forename AS NameX_Forename_1
WHERE (Name = @SecondName))) AND (dbo.fnCheckDOB(t.DateOfBirth, @DateOfBirth) > 5) AND (t.GenderID = @GenderID) AND
(t.Street6 = @Street6) OR
(t.Surname IN
(SELECT NameMatch
FROM dbo.NameX_Surname AS NameX_Surname_4
WHERE (Name = @Surname))) AND (t.FirstName IN
(SELECT NameMatch
FROM dbo.NameX_Forename AS NameX_Forename_5
WHERE (Name = @Firstname))) AND (t.SecondName IN
(SELECT NameMatch
FROM dbo.NameX_Forename AS NameX_Forename_1
WHERE (Name = @SecondName))) AND (dbo.fnCheckDOB(t.DateOfBirth, @DateOfBirth) > 5) AND (t.GenderID = @GenderID) AND
(t.PostCode = @PostCode) OR
(t.Surname IN
(SELECT NameMatch
FROM dbo.NameX_Surname AS NameX_Surname_3
WHERE (Name = @Surname))) AND (t.FirstName IN
(SELECT NameMatch
FROM dbo.NameX_Forename AS NameX_Forename_4
WHERE (Name = @Firstname))) AND (@Initial = t.Initial) AND (@SecondName_RestOf = '') AND (dbo.fnCheckDOB(t.DateOfBirth, @DateOfBirth) > 5)
AND (t.GenderID = @GenderID) AND (t.Street6 = @Street6) OR
(t.Surname IN
(SELECT NameMatch
FROM dbo.NameX_Surname AS NameX_Surname_3
WHERE (Name = @Surname))) AND (t.FirstName IN
(SELECT NameMatch
FROM dbo.NameX_Forename AS NameX_Forename_4
WHERE (Name = @Firstname))) AND (@Initial = t.Initial) AND (@SecondName_RestOf = '') AND (dbo.fnCheckDOB(t.DateOfBirth, @DateOfBirth) > 5)
AND (t.GenderID = @GenderID) AND (t.PostCode = @PostCode) OR
(t.Surname IN
(SELECT NameMatch
FROM dbo.NameX_Surname AS NameX_Surname_2
WHERE (Name = @Surname))) AND (t.FirstName IN
(SELECT NameMatch
FROM dbo.NameX_Forename AS NameX_Forename_3
WHERE (Name = @Firstname))) AND (@Initial = t.Initial) AND (t.SecondName_RestOf = '') AND (dbo.fnCheckDOB(t.DateOfBirth, @DateOfBirth) > 5)
AND (t.GenderID = @GenderID) AND (t.Street6 = @Street6) OR
(t.Surname IN
(SELECT NameMatch
FROM dbo.NameX_Surname AS NameX_Surname_2
WHERE (Name = @Surname))) AND (t.FirstName IN
(SELECT NameMatch
FROM dbo.NameX_Forename AS NameX_Forename_3
WHERE (Name = @Firstname))) AND (@Initial = t.Initial) AND (t.SecondName_RestOf = '') AND (dbo.fnCheckDOB(t.DateOfBirth, @DateOfBirth) > 5)
AND (t.GenderID = @GenderID) AND (t.PostCode = @PostCode) OR
(t.Surname IN
(SELECT NameMatch
FROM dbo.NameX_Surname AS NameX_Surname_1
WHERE (Name = @Surname))) AND (t.FirstName IN
(SELECT NameMatch
FROM dbo.NameX_Forename AS NameX_Forename_2
WHERE (Name = @Firstname))) AND (@Initial = t.Initial) AND (@SecondName_RestOf = t.SecondName_RestOf) AND
(dbo.fnCheckDOB(t.DateOfBirth, @DateOfBirth) > 5) AND (t.GenderID = @GenderID) AND (t.Street6 = @Street6) OR
(t.Surname IN
(SELECT NameMatch
FROM dbo.NameX_Surname AS NameX_Surname_1
WHERE (Name = @Surname))) AND (t.FirstName IN
(SELECT NameMatch
FROM dbo.NameX_Forename AS NameX_Forename_2
WHERE (Name = @Firstname))) AND (@Initial = t.Initial) AND (@SecondName_RestOf = t.SecondName_RestOf) AND
(dbo.fnCheckDOB(t.DateOfBirth, @DateOfBirth) > 5) AND (t.GenderID = @GenderID) AND (t.PostCode = @PostCode)


CREATE TABLE [dbo].[tblCustomer](
[SubscriberID] [nvarchar](4) NOT NULL,
[CustomerID] [int] IDENTITY(1,1) NOT NULL,
[Surname] [varchar](35) NOT NULL,
[FirstName] [varchar](35) NOT NULL,
[SecondName] [varchar](35) NULL,
[SecondName_RestOf] [varchar](35) NULL,
[Initial] [char](1) NULL,
[GenderID] [char](1) NOT NULL,
[DateOfBirth] [int] NOT NULL,
[Employer] [nvarchar](50) NULL,
[Occupation] [nvarchar](50) NULL,
[Street6] [varchar](6) NULL,
[PostCode] [int] NULL
) ON [PRIMARY]

CREATE TABLE [dbo].[tblMatch](
[BureauID] [bigint] NOT NULL,
[SubscriberID] [int] NOT NULL,
[CustomerID] [int] NOT NULL,
[Date] [datetime] NOT NULL,
[UserID] [nvarchar](10) NOT NULL,
) ON [PRIMARY]

CREATE TABLE [dbo].[NameX_Surname](
[Name] [varchar](50) NOT NULL,
[NameMatch] [varchar](50) NOT NULL,
[Score] [int] NOT NULL
) ON [PRIMARY]

CREATE TABLE [dbo].[NameX_Forename](
[Name] [varchar](50) NULL,
[NameMatch] [varchar](50) NULL,
[Score] [int] NULL
) ON [PRIMARY]

X002548
Not Just a Number

15586 Posts

Posted - 2009-02-18 : 21:11:13
Are you kidding...what's the plan on that

Got any keys on the tables?



Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam



Go to Top of Page

newbieee
Starting Member

6 Posts

Posted - 2009-02-18 : 21:59:11
I wish i was kidding...

below are the indexes i have on the tables. put there mostly as a result of the database tuning wizard.

i have saved an execution plan if you want to look at it. how do i get that to you?

CREATE NONCLUSTERED INDEX [_dta_index_tblCustomer_32_1282103608__K7_K8_K9_K12_K6_K3_K4_K2_K1_K13] ON [dbo].[tblCustomer]
(
[Initial] ASC,
[GenderID] ASC,
[DateOfBirth] ASC,
[Street6] ASC,
[SecondName_RestOf] ASC,
[Surname] ASC,
[FirstName] ASC,
[CustomerID] ASC,
[SubscriberID] ASC,
[PostCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

CREATE CLUSTERED INDEX [IX_tblCustomer_Surname] ON [dbo].[tblCustomer]
(
[Surname] ASC,
[FirstName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

CREATE UNIQUE NONCLUSTERED INDEX [IX_tblCustomer_CustomerID] ON [dbo].[tblCustomer]
(
[CustomerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]


CREATE NONCLUSTERED INDEX [_dta_index_tblMatch_32_1298103665__K3_K2] ON [dbo].[tblMatch]
(
[CustomerID] ASC,
[SubscriberID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2009-02-18 : 22:12:18
Let's try this

Give us the data as it is, and what the final data should look like

Do this is in DML Form

SELECT 'xxx','xxx' UNION ALL
SELECT 'xxx','xxx' UNION ALL
SELECT 'xxx','xxx' UNION ALL
SELECT 'xxx','xxx' UNION ALL
ect



Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam



Go to Top of Page

newbieee
Starting Member

6 Posts

Posted - 2009-02-18 : 22:57:20
having some serious brain fade here brett but hoping this is what you're after:

--tblCustomer
SELECT '1',1,'Abbott','Colin','John','ohn','J','M','20080131','TestSt',1024 UNION ALL
SELECT '1',2,'Abbott','Colin','Gordon','ordon','G','M','20080131','TestS1',1024 UNION ALL
SELECT '1',3,'Abbott','David','Jeffery','effery','J','M','20080131','TestSt',1024 UNION ALL
SELECT '1',4,'Abbott','Dean','Leslie','eslie','L','M','20080131','TestS1',1024 UNION ALL
SELECT '1',5,'Abbott','Deborah','Joy','oy','J','M','20080131','TestSt',1024 UNION ALL
SELECT '1',6,'Abbott','Donna','','','','M','20080131','TestSt',1024 UNION ALL
SELECT '1',7,'Abbott','Emma','Jayne','ayne','J','M','20080131','TestSt',1024 UNION ALL
SELECT '1',8,'Abbott','Douglas','','','S','M','20080131','TestSt',1024 UNION ALL
SELECT '1',9,'Abbott','Simon','Peter','eter','P','M','20080131','TestSt6',1024 UNION ALL
SELECT '1',10,'Abbott','Douglas','Simon','imon','S','M','20080131','TestSt',1024

--tblMatch
SELECT 1000, '1', 10, convert(datetime,'2009-02-19'),'TEST' UNION ALL
SELECT 1000, '1', 8, convert(datetime,'2009-02-19'),'TEST'


the final data should be able to be queried using s'thing simple like:
SELECT c.*, m.BureauID FROM tblCustomer c LEFT JOIN tblMatch m ON c.CustomerID = m.CustomerID

Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2009-02-19 : 08:40:55
ok good so far...but what's the final result you are after



Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam



Go to Top of Page

newbieee
Starting Member

6 Posts

Posted - 2009-02-19 : 14:52:58
the final result: each record that 'matches' as a result of the query should have their CustomerID inserted into tblMatch along with a unique integer for BureauID to denote they all belong to the same group.

when i was using ROW_NUMBER and PARTITION BY as an initial attempt at set-based logic, i had envisaged a resultset being return to me like the one below, that would allow me to iterate through the records (using either a server-side cursor or client side, ADO.NET) and on the break of the group i would create a unique BureauID & insert the CustomerIDs into tblMatch.

CustomerID Surname Firstname Secondname 2nd_RestOf Initial Gender DOB Street Postcode RowNum_Group
1 Abbott Colin John ohn J M 20080131 TestSt 1024 1
2 Abbott Colin Gordon ordon G M 20080131 TestS1 1024 1
3 Abbott David Jeffery effery J M 20080131 TestSt 1024 1
4 Abbott David J M 20080131 TestS1 1024 2
5 Abbott Deborah Joy oy J M 20080131 TestSt 1024 1
6 Abbott Donna M 20080131 TestSt 1024 1
7 Abbott Emma Jayne ayne J M 20080131 TestSt 1024 1
10 Abbott Douglas Simon imon S M 20080131 TestSt 1024 1
8 Abbott Douglas S M 20080131 TestSt 1024 2
9 Abbott Simon Peter eter P M 20080131 TestSt6 1024 1

the use of synoymns in the NameX table put paid to that approach. Eg. If Douglass was also equal to Douglas then, Douglass would be returned as the surname and ruin the PARTITION BY clause which relied on surname, firstname..etc to create the group.


Go to Top of Page
   

- Advertisement -