Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
newbieee
Starting Member
6 Posts |
Posted - 2009-02-18 : 20:55:16
|
| Hi,I've proto-typed a solution to match customer records from varying sources. There may be up to 1million records in the customers table so I need to progress this into a set-based solution for efficiency (if that’s possible). Can any one clear the haze for me?The select SQL below is just one of 5 queries that will run and to try and match the customer records. The idea is that when certain records are matched they will be inserted into the match table with a common id/BureauID that links them together. As new customer records arrive in the database they may match with an existing record that already has a BureauID so I need to examine the match records to extract the ID if it exists.I had initially proto-typed a set-based query using ROW_NUMBER to assign a number to those records that satisfied the criteria* - with the idea of moving down the resultset and inserting into the match table on each break in the number, but the introduction of synonyms on Surname, Firstname and SecondName threw out the PARTITION BY clause….and now I'm stuck.Any ideas greatly appreciated.* Matching criteriaSurname (equal or synonym) ANDFirstname (equal or synonym) ANDSecondName (equal or synonym or Initial) ANDDateOfBirth (minimum 6 out of 8 - looking for transposed digits) ANDGender (equal) ANDStreetname (equal to first 6 characters) ORPostCode (equal)SELECT TOP (100) PERCENT t.SubscriberID, t.CustomerID, t.Surname, t.FirstName, t.SecondName, t.Initial, t.SecondName_RestOf, t.DateOfBirth, t.GenderID, t.PostCode, t.Street6, dbo.tblMatch.BureauIDFROM dbo.tblCustomer AS t LEFT OUTER JOIN dbo.tblMatch ON t.CustomerID = dbo.tblMatch.CustomerID AND t.SubscriberID = dbo.tblMatch.SubscriberIDWHERE (t.Surname IN (SELECT NameMatch FROM dbo.NameX_Surname WHERE (Name = @Surname))) AND (t.FirstName IN (SELECT NameMatch FROM dbo.NameX_Forename WHERE (Name = @Firstname))) AND (t.SecondName IN (SELECT NameMatch FROM dbo.NameX_Forename AS NameX_Forename_1 WHERE (Name = @SecondName))) AND (dbo.fnCheckDOB(t.DateOfBirth, @DateOfBirth) > 5) AND (t.GenderID = @GenderID) AND (t.Street6 = @Street6) OR (t.Surname IN (SELECT NameMatch FROM dbo.NameX_Surname AS NameX_Surname_4 WHERE (Name = @Surname))) AND (t.FirstName IN (SELECT NameMatch FROM dbo.NameX_Forename AS NameX_Forename_5 WHERE (Name = @Firstname))) AND (t.SecondName IN (SELECT NameMatch FROM dbo.NameX_Forename AS NameX_Forename_1 WHERE (Name = @SecondName))) AND (dbo.fnCheckDOB(t.DateOfBirth, @DateOfBirth) > 5) AND (t.GenderID = @GenderID) AND (t.PostCode = @PostCode) OR (t.Surname IN (SELECT NameMatch FROM dbo.NameX_Surname AS NameX_Surname_3 WHERE (Name = @Surname))) AND (t.FirstName IN (SELECT NameMatch FROM dbo.NameX_Forename AS NameX_Forename_4 WHERE (Name = @Firstname))) AND (@Initial = t.Initial) AND (@SecondName_RestOf = '') AND (dbo.fnCheckDOB(t.DateOfBirth, @DateOfBirth) > 5) AND (t.GenderID = @GenderID) AND (t.Street6 = @Street6) OR (t.Surname IN (SELECT NameMatch FROM dbo.NameX_Surname AS NameX_Surname_3 WHERE (Name = @Surname))) AND (t.FirstName IN (SELECT NameMatch FROM dbo.NameX_Forename AS NameX_Forename_4 WHERE (Name = @Firstname))) AND (@Initial = t.Initial) AND (@SecondName_RestOf = '') AND (dbo.fnCheckDOB(t.DateOfBirth, @DateOfBirth) > 5) AND (t.GenderID = @GenderID) AND (t.PostCode = @PostCode) OR (t.Surname IN (SELECT NameMatch FROM dbo.NameX_Surname AS NameX_Surname_2 WHERE (Name = @Surname))) AND (t.FirstName IN (SELECT NameMatch FROM dbo.NameX_Forename AS NameX_Forename_3 WHERE (Name = @Firstname))) AND (@Initial = t.Initial) AND (t.SecondName_RestOf = '') AND (dbo.fnCheckDOB(t.DateOfBirth, @DateOfBirth) > 5) AND (t.GenderID = @GenderID) AND (t.Street6 = @Street6) OR (t.Surname IN (SELECT NameMatch FROM dbo.NameX_Surname AS NameX_Surname_2 WHERE (Name = @Surname))) AND (t.FirstName IN (SELECT NameMatch FROM dbo.NameX_Forename AS NameX_Forename_3 WHERE (Name = @Firstname))) AND (@Initial = t.Initial) AND (t.SecondName_RestOf = '') AND (dbo.fnCheckDOB(t.DateOfBirth, @DateOfBirth) > 5) AND (t.GenderID = @GenderID) AND (t.PostCode = @PostCode) OR (t.Surname IN (SELECT NameMatch FROM dbo.NameX_Surname AS NameX_Surname_1 WHERE (Name = @Surname))) AND (t.FirstName IN (SELECT NameMatch FROM dbo.NameX_Forename AS NameX_Forename_2 WHERE (Name = @Firstname))) AND (@Initial = t.Initial) AND (@SecondName_RestOf = t.SecondName_RestOf) AND (dbo.fnCheckDOB(t.DateOfBirth, @DateOfBirth) > 5) AND (t.GenderID = @GenderID) AND (t.Street6 = @Street6) OR (t.Surname IN (SELECT NameMatch FROM dbo.NameX_Surname AS NameX_Surname_1 WHERE (Name = @Surname))) AND (t.FirstName IN (SELECT NameMatch FROM dbo.NameX_Forename AS NameX_Forename_2 WHERE (Name = @Firstname))) AND (@Initial = t.Initial) AND (@SecondName_RestOf = t.SecondName_RestOf) AND (dbo.fnCheckDOB(t.DateOfBirth, @DateOfBirth) > 5) AND (t.GenderID = @GenderID) AND (t.PostCode = @PostCode)CREATE TABLE [dbo].[tblCustomer]( [SubscriberID] [nvarchar](4) NOT NULL, [CustomerID] [int] IDENTITY(1,1) NOT NULL, [Surname] [varchar](35) NOT NULL, [FirstName] [varchar](35) NOT NULL, [SecondName] [varchar](35) NULL, [SecondName_RestOf] [varchar](35) NULL, [Initial] [char](1) NULL, [GenderID] [char](1) NOT NULL, [DateOfBirth] [int] NOT NULL, [Employer] [nvarchar](50) NULL, [Occupation] [nvarchar](50) NULL, [Street6] [varchar](6) NULL, [PostCode] [int] NULL) ON [PRIMARY]CREATE TABLE [dbo].[tblMatch]( [BureauID] [bigint] NOT NULL, [SubscriberID] [int] NOT NULL, [CustomerID] [int] NOT NULL, [Date] [datetime] NOT NULL, [UserID] [nvarchar](10) NOT NULL,) ON [PRIMARY]CREATE TABLE [dbo].[NameX_Surname]( [Name] [varchar](50) NOT NULL, [NameMatch] [varchar](50) NOT NULL, [Score] [int] NOT NULL) ON [PRIMARY] CREATE TABLE [dbo].[NameX_Forename]( [Name] [varchar](50) NULL, [NameMatch] [varchar](50) NULL, [Score] [int] NULL) ON [PRIMARY] |
|
|
X002548
Not Just a Number
15586 Posts |
|
|
newbieee
Starting Member
6 Posts |
Posted - 2009-02-18 : 21:59:11
|
| I wish i was kidding...below are the indexes i have on the tables. put there mostly as a result of the database tuning wizard.i have saved an execution plan if you want to look at it. how do i get that to you?CREATE NONCLUSTERED INDEX [_dta_index_tblCustomer_32_1282103608__K7_K8_K9_K12_K6_K3_K4_K2_K1_K13] ON [dbo].[tblCustomer] ( [Initial] ASC, [GenderID] ASC, [DateOfBirth] ASC, [Street6] ASC, [SecondName_RestOf] ASC, [Surname] ASC, [FirstName] ASC, [CustomerID] ASC, [SubscriberID] ASC, [PostCode] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]CREATE CLUSTERED INDEX [IX_tblCustomer_Surname] ON [dbo].[tblCustomer] ( [Surname] ASC, [FirstName] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]CREATE UNIQUE NONCLUSTERED INDEX [IX_tblCustomer_CustomerID] ON [dbo].[tblCustomer] ( [CustomerID] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]CREATE NONCLUSTERED INDEX [_dta_index_tblMatch_32_1298103665__K3_K2] ON [dbo].[tblMatch] ( [CustomerID] ASC, [SubscriberID] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] |
 |
|
|
X002548
Not Just a Number
15586 Posts |
|
|
newbieee
Starting Member
6 Posts |
Posted - 2009-02-18 : 22:57:20
|
| having some serious brain fade here brett but hoping this is what you're after:--tblCustomerSELECT '1',1,'Abbott','Colin','John','ohn','J','M','20080131','TestSt',1024 UNION ALLSELECT '1',2,'Abbott','Colin','Gordon','ordon','G','M','20080131','TestS1',1024 UNION ALLSELECT '1',3,'Abbott','David','Jeffery','effery','J','M','20080131','TestSt',1024 UNION ALLSELECT '1',4,'Abbott','Dean','Leslie','eslie','L','M','20080131','TestS1',1024 UNION ALLSELECT '1',5,'Abbott','Deborah','Joy','oy','J','M','20080131','TestSt',1024 UNION ALLSELECT '1',6,'Abbott','Donna','','','','M','20080131','TestSt',1024 UNION ALLSELECT '1',7,'Abbott','Emma','Jayne','ayne','J','M','20080131','TestSt',1024 UNION ALLSELECT '1',8,'Abbott','Douglas','','','S','M','20080131','TestSt',1024 UNION ALLSELECT '1',9,'Abbott','Simon','Peter','eter','P','M','20080131','TestSt6',1024 UNION ALLSELECT '1',10,'Abbott','Douglas','Simon','imon','S','M','20080131','TestSt',1024 --tblMatchSELECT 1000, '1', 10, convert(datetime,'2009-02-19'),'TEST' UNION ALLSELECT 1000, '1', 8, convert(datetime,'2009-02-19'),'TEST'the final data should be able to be queried using s'thing simple like:SELECT c.*, m.BureauID FROM tblCustomer c LEFT JOIN tblMatch m ON c.CustomerID = m.CustomerID |
 |
|
|
X002548
Not Just a Number
15586 Posts |
|
|
newbieee
Starting Member
6 Posts |
Posted - 2009-02-19 : 14:52:58
|
| the final result: each record that 'matches' as a result of the query should have their CustomerID inserted into tblMatch along with a unique integer for BureauID to denote they all belong to the same group. when i was using ROW_NUMBER and PARTITION BY as an initial attempt at set-based logic, i had envisaged a resultset being return to me like the one below, that would allow me to iterate through the records (using either a server-side cursor or client side, ADO.NET) and on the break of the group i would create a unique BureauID & insert the CustomerIDs into tblMatch.CustomerID Surname Firstname Secondname 2nd_RestOf Initial Gender DOB Street Postcode RowNum_Group1 Abbott Colin John ohn J M 20080131 TestSt 1024 12 Abbott Colin Gordon ordon G M 20080131 TestS1 1024 13 Abbott David Jeffery effery J M 20080131 TestSt 1024 14 Abbott David J M 20080131 TestS1 1024 25 Abbott Deborah Joy oy J M 20080131 TestSt 1024 16 Abbott Donna M 20080131 TestSt 1024 17 Abbott Emma Jayne ayne J M 20080131 TestSt 1024 110 Abbott Douglas Simon imon S M 20080131 TestSt 1024 18 Abbott Douglas S M 20080131 TestSt 1024 29 Abbott Simon Peter eter P M 20080131 TestSt6 1024 1the use of synoymns in the NameX table put paid to that approach. Eg. If Douglass was also equal to Douglas then, Douglass would be returned as the surname and ruin the PARTITION BY clause which relied on surname, firstname..etc to create the group. |
 |
|
|
|
|
|
|
|