| Author |
Topic |
|
paul.rowling
Yak Posting Veteran
81 Posts |
Posted - 2003-08-06 : 10:25:43
|
| Hi,I use some software to identify duplicate records based on inital and surname for people who live at the same address.There is a field in the database called dup_code which stores the identifying value e.g if two records were identified as duplicates they would both be given a dup code of 1. The next preceeding duplicate records would then be given a dup_code of two and so on.The problem is this: for example the software identifies S Andrews, Sarah Andrews, Sharon Andrews as duplicates because they all have the same initial. What I want to be able to do in this scenario is say that none of them are duplicates as we cannot determine whether S Andrews is the same as Sarah or Sharon. On the other hand if the following scenario arose where S Andrews and Sarah Andrews were marked as duplicates you could assume that these are duplicates and can be left as so. In the first scenario I just want to update the dup_code to 0 as 0 means that the records are not duplicates.The software automatically detects that Sharon Andrews and Sharon Andrews are duplicates and so there is nothing more I need to do with scenario's such as this.Can anyone shed any light on this?CheersPaul |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-08-06 : 12:02:21
|
| Anyway, Paul...You would need to more clearly define a dup...How many positions of the first name for example (Sue, Susie, Susan, dups?)Alos what else Birthdate?I imagine 1st init, last name, birth date, state code..somethings we've used in the past...by all means not an absoulteJust the ability to get close...used a total positive match to exclude excats..then get fuzzy...Use a join of 2 reult sets to get a single row and insert into a work table with an IDENTITY ColumnsMOOBrett8-)SELECT POST=NewId() |
 |
|
|
paul.rowling
Yak Posting Veteran
81 Posts |
Posted - 2003-08-06 : 17:33:16
|
| Hi Brett,Thanks for posting, I think I may have confused things with my previous description. The software that I use actually identifies the duplicates for me based on Initial, Surname, Property and Postcode so that I don't have to worry about doing that. The way that it identifies duplicate records is by entering a value in a dup_code field. The problem is that it's not that clever when you have for example the following records all at the same address:S AndrewsSarah AndrewsSharon AndrewsThe software will mark all these records as duplicates because the first letter of the "Forename" field is "S". The software identifies the records as duplicates in the following way:Forename Surname Dup_CodeS Andrews 1Sarah Andrews 1Sharon Andrews 1Paul Rowling 2P Rowling 2Ozzie Osborne 0In this example any record that does not have a duplicate record is always given a dup_code of 0. All other records are given a unique dup_code which will match with an associated duplicate record.Because there is no way to identify if S Andrews is a duplicate with Sarah or Sharon, I want to update their dup_code to 0, but leave all others intact.Does this make sense now........my head hurts!!!You may ask why I want to do this? Well, the DB that I have stores customer info in a customer table and transactions (such as what the customer bought, when they bought it and how much) in a transactions table. Therefore I want to be able to identify duplicate customer records, transfer the transactions of the duplicate person I want to delete to the person I want to keep and then delete duplicate person. In the instance of the "Andrews" example above, I don't know which person to transfer the transactions to and so because I cannot do this accurately I will just take a hit and leave them as they are.Any info on this would be greatly appreciated.ThanksPaul |
 |
|
|
vganesh76
Yak Posting Veteran
64 Posts |
Posted - 2003-08-07 : 01:28:00
|
| Hi Paul, I have solved ur problem in the following query. Hope it works for u.Declare @t table (forename varchar(10),surname varchar(20),dupcode int)insert into @t SELECT 'S', 'Andrews',nullinsert into @t SELECT 'Sarah', 'Andrews', nullinsert into @t SELECT 'Sharon', 'Andrews', nullinsert into @t SELECT 'Paul', 'Rowling', nullinsert into @t SELECT 'P', 'Rowling', nullinsert into @t SELECT 'Ozzie', 'Osborne', nullinsert into @t SELECT 'Ozzie1', 'Osborne', nullDeclare @intctr intDeclare @vchname varchar(20)set @intctr = 0set @vchname =''SELECT forename,surname FROM @t aUPDATE @t set @intctr = dupcode =@intctr + case when @vchname <> surname then 1 else 0 end, @vchname = surname FROM @t aWHERE (SELECT count(*) FROM @t b WHERE a.surname=b.surname) > 1UPDATE @t set dupcode = 0FROM @t aWHERE (SELECT count(*) FROM @t b WHERE a.surname=b.surname) = 1SELECT * FROM @tV.GaneshNet AssetManagement.Comvganesh76@rediffmail.comEnjoy working |
 |
|
|
paul.rowling
Yak Posting Veteran
81 Posts |
Posted - 2003-08-07 : 04:13:58
|
| Hi Vganesh76,"insert into @t SELECT 'S', 'Andrews',nullinsert into @t SELECT 'Sarah', 'Andrews', nullinsert into @t SELECT 'Sharon', 'Andrews', nullinsert into @t SELECT 'Paul', 'Rowling', nullinsert into @t SELECT 'P', 'Rowling', nullinsert into @t SELECT 'Ozzie', 'Osborne', nullinsert into @t SELECT 'Ozzie1', 'Osborne', null"The data that I posted is just an example to help explain my problem. I will be running this on a table with thousands of records, therefore I won't know what records are in there to do the insert statements. Hope this makes sense?CheersPaul |
 |
|
|
vganesh76
Yak Posting Veteran
64 Posts |
Posted - 2003-08-07 : 05:05:47
|
| Hi Paul, The code would work fine just by replacing the table variable (@t )with the actual table . Hope I dont find any problems with the query.Enjoy working |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-08-07 : 05:17:39
|
| How about the case:S Andrews 1Sarah Andrews 1Sharon Andrews 1Sharon Andrews 1Do you want four 0 instead of 1?- Vit |
 |
|
|
Amethystium
Aged Yak Warrior
701 Posts |
Posted - 2003-08-07 : 05:41:16
|
quote: Originally posted by X002548What the hell is that all about? (Crespo?)
Assumptions... assumptions. Don't do it Brett. ----------------Shadow to Light |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-08-07 : 06:13:43
|
quote: Originally posted by AmethystiumAssumptions... assumptions.Don't do it Brett.
What the hell is that all about? (Brett?)- Vit |
 |
|
|
Amethystium
Aged Yak Warrior
701 Posts |
Posted - 2003-08-07 : 06:35:12
|
Vit,I spotted this question on the forum yesterday and the first answer was quite rude. If you look at Brett's original reply you will find the posters response.I was actually tempted to report it then I remembered something from the past and the sheer irony of it made me abandon my plan. When I came in this morning I found that the insulting post was taken down but somehow Brett managed to sneak in his own reply before it was taken down (obviously).I kind of regret my reply to Brett but I just feel really strongly about people making assumptions. Seriously, 90% of mankind’s problems are based on false assumptions.Anyway... ----------------Shadow to Light |
 |
|
|
paul.rowling
Yak Posting Veteran
81 Posts |
Posted - 2003-08-07 : 06:46:09
|
| Hi, In response to Stoad, the two Sharon Andrews should be left as 1 because they are duplicates with each other, the other two records would then be updated to 0. |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-08-07 : 07:36:05
|
| update myTable t set dup_code=0where1=(select count(*) from myTable tt wherett.col1=t.col1 and tt.col2=t.col2 and .... and tt.colN=t.colN)replace 'myTable' with your table name and [col...] with names ofeach column of your table. Hope there are not too many of them in there.- Vit |
 |
|
|
paul.rowling
Yak Posting Veteran
81 Posts |
Posted - 2003-08-07 : 08:35:12
|
| Hi Stoad, thanks for your reply. Sorry to be a pain, but I don't understand this line:1=(select count(*) from myTable tt wherett.col1=t.col1 and tt.col2=t.col2 and .... and tt.colN=t.colN)Can you please explain?CheersPaul |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-08-07 : 08:48:37
|
| This line seek for number of records in the tablewith the same values in corresponding columns as theyare in the 'current' record from outer 'update' instruction.If it is found only one such record then it is obviously 'unique'one and its dup_code value should be assign to 0. I think so...- Vit |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-08-07 : 09:17:45
|
| Finr...Fine...Fine...GONE!However, if it walks like a duck annd squawks like a duck and swims like a duck..then I must Assume it's a YAK...Most of the worlds...oh forget it....[EDIT]...and it wasn't an assumption...it was an inquiry[/EDIT]Brett8-)SELECT POST=NewId() |
 |
|
|
Amethystium
Aged Yak Warrior
701 Posts |
Posted - 2003-08-07 : 09:24:18
|
Brett mate,It's cool. Let's just stick to SQL Server and music!----------------Shadow to Light |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-08-07 : 09:30:47
|
| Yes, better left to the corral...Brett8-)SELECT POST=NewId() |
 |
|
|
|