SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 deduping 20 files in certain order
 New Topic  Reply to Topic
 Printer Friendly
Author Previous Topic Topic Next Topic  

tryharder
Starting Member

United Kingdom
16 Posts

Posted - 09/30/2013 :  14:29:09  Show Profile  Reply with Quote
Hi,

I have 20 files to dedupe in a certain order, I was going to build a match key based on surname and 6 characters of address line 1but does anyone have a script to run the dedupe based on the match key and by a unique keycode for each file?

Thanks in advance
Sean

SP

James K
Flowing Fount of Yak Knowledge

3744 Posts

Posted - 09/30/2013 :  14:40:42  Show Profile  Reply with Quote
Like shown below. Didn't get what you meant by "unique keycode". Is that another column in the table that you want to order by to determine which rows to delete?
;WITH cte AS
(
	SELECT ROW_NUMBER() OVER
	(
		PARTITION BY surname + LEFT(addressline1,6)
		ORDER BY (SELECT NULL) -- or some other condition
	) AS RN
	FROM YourTable
)
DELETE FROM cte WHERE RN > 1;
Go to Top of Page

tryharder
Starting Member

United Kingdom
16 Posts

Posted - 09/30/2013 :  15:14:53  Show Profile  Reply with Quote
The keycode is just a way to identify which file that record came from. I'm not the best at explaining so bear with me pls

I have 20 files which are standardised and given a keycodebto identify which file the record comes from, the files are then merged and dedupe is carried starting at file 20 which is deduped against the 19 other files then file 19 is deduped against the other 18 files ect until you end up with a file full of unique records but I also need to know how many records where lost from each of the sub file ( based in keycode) and to which other file so i effectively get a matrix which shows the losses against each file.
Hope this makes sense, if not bear with me!! :-)

SP
Go to Top of Page

James K
Flowing Fount of Yak Knowledge

3744 Posts

Posted - 09/30/2013 :  15:52:13  Show Profile  Reply with Quote
Here is a way to do what you described - it is sort of a bruteforce approch, but the logic should be easy to follow. And, by the way, your explanation makes perfect sense :) Instead of the delete in each query, uncomment the select line and run that to see the count of duplicates.
-- Starting with file #20;
;WITH cte AS
(
	surname + LEFT(addressline1,6) AS DuplicateCriterion,
	SELECT ROW_NUMBER() OVER
	(
		PARTITION BY surname + LEFT(addressline1,6)
		ORDER BY (SELECT NULL) -- or some other condition
	) AS RN
	FROM YourTable
	WHERE keycode = 20;
)
--SELECT DuplicateCriterion,COUNT(*) AS Records FROM cte
--HAVING COUNT(*) > 1
DELETE FROM cte WHERE RN > 1;

-- For File 20 and File 19;
;WITH cte AS
(
	surname + LEFT(addressline1,6) AS DuplicateCriterion,
	SELECT ROW_NUMBER() OVER
	(
		PARTITION BY surname + LEFT(addressline1,6)
		ORDER BY (SELECT NULL) -- or some other condition
	) AS RN
	FROM YourTable
	WHERE keycode IN (19,20)
)
--SELECT DuplicateCriterion,COUNT(*) AS Records FROM cte
--HAVING COUNT(*) > 1
DELETE FROM cte WHERE RN > 1;

-- repeat for the remaining by changing the where clause to
-- WHERE keycode IN (18,19,20)
-- WHERE keycode IN (17,18,19,20)
-- etc.
Go to Top of Page

tryharder
Starting Member

United Kingdom
16 Posts

Posted - 09/30/2013 :  16:06:38  Show Profile  Reply with Quote
Many thanks for your help will give it a go tomorrow.

SP
Go to Top of Page

tryharder
Starting Member

United Kingdom
16 Posts

Posted - 10/01/2013 :  16:30:55  Show Profile  Reply with Quote
Many thanks for your help will give it a go tomorrow.

SP
Go to Top of Page

waterduck
Aged Yak Warrior

Malaysia
965 Posts

Posted - 10/01/2013 :  21:27:50  Show Profile  Reply with Quote
shouldn't this enuf?

;WITH cte AS
(
	surname + LEFT(addressline1,6) AS DuplicateCriterion,
	SELECT ROW_NUMBER() OVER
	(
		PARTITION BY surname + LEFT(addressline1,6)
		ORDER BY keycode DESC-- or some other condition
	) AS RN
	FROM YourTable
)
DELETE FROM cte WHERE RN > 1;
Go to Top of Page
  Previous Topic Topic Next Topic  
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.08 seconds. Powered By: Snitz Forums 2000