How to get data mis matches - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

How to get data mis matches

Author

Topic

ivra
Starting Member

18 Posts

Posted - 2010-01-29 : 04:37:41

Hi,

I have a 3 tables with person_id field on each table.

I created a new table (match_data) where all person_id from 3 tables matches.

My problem is how can I get the un match person_Id from the 3 tables.

I'm running a query now:

select distinct ftn.person_id,ftn.OptIn,dpa.Description as dpa_desc, c_dpa.Description as cdpa_desc
into unmatch_data
from ftn, dpa, c_dpa
where ftn.person_id not in (select ftn.person_id from match_data)

order by ftn.person_id

The query is running for almost an hour now. Is there any other way I can get the mis matches using different faster approach?

Thanks all!

Kristen
Test

22859 Posts

Posted - 2010-01-29 : 04:44:12

[code]
from ftn, dpa, c_dpa
[/code]
Cartesian Join!!!!

This will link every record in FTN to every record in DPA, and then for every one of those it will link them to every row in C_DPA.

Are you sure that's what you want?

Kristen
Test

22859 Posts

Posted - 2010-01-29 : 04:47:00

Actaully, slightly inaccirate,. it will only take the rows from FTN that are not in the data from match_data, but that is wrong too I'm afraid:


not in (select ftn.person_id from match_data)

This is going to select the person_ID from the CURRENT FTN row, it gets nothing from match_data, and that will always be false ... so actually its going to match everything with everything from only the other two tables

... and then it will do a SORT of the whole lot and throw away the duplicates - which will take a ~~little~~ long while

ivra
Starting Member

18 Posts

Posted - 2010-01-29 : 05:02:38

thank for pointing out my mistakes.

got what I want

mymatrix
Starting Member

24 Posts

Posted - 2010-01-29 : 05:11:47

Please correct me if i have understood your requirement incorrectly
You need below mentioned 4 columns for unmatch_data table:
person_Id
OptIn
dpa_desc
cdpa_desc

or you need two columns only with first column as person_Id value of second column changed to OptIn,dpa_desc,cdpa_desc depending on table (ftn,dpa,c_dpa). Please advice.

insert into unmatch_data
select ftn.Person_Id, ftn.OptIn,default,default from ftn where ftn.Person_Id not in (select person_Id from match_Data)
union
select dpa.Person_Id, default,dpa.Description,default from dpa where dpa.Person_Id not in (select person_Id from match_Data)
union
select c_dpa.Person_Id, default,default,c_dpa.description from ftn where c_dpa.Person_Id not in (select person_Id from match_Data)

This sql statement may create some duplicate records but performance wise it would be faster. If this suits your requirment, let me know. I will improve it further.

thnks
Gaurav

Even my blood group says be -ve to all the negatives.

ivra
Starting Member

18 Posts

Posted - 2010-01-29 : 05:25:26

I just run:

select distinct ftn.person_id,ftn.OptIn,dpa.Description as dpa_desc, c_dpa.Description as cdpa_desc
into unmatch_data
from ftn, dpa, c_dpa
where ftn.person_id not in (select match_data.person_id from match_data)
and dpa.person_id not in (select match_data.person_id from match_data)
and c_dpa.person_id not in (select match_data.person_id from match_data)

order by ftn.person_id

but gives me duplicates..How can I rid of duplicates?

Kristen
Test

22859 Posts

Posted - 2010-01-29 : 05:45:30

I would use FULL OUTER JOIN

You could just store the ID from the three tables (DISTINCT will give you no duplicates) - but then you will have to LEFT JOIN it back to the three tables to find any other data.

Your problem is that your select list has columns like ftn.OptIn. That will have a value if the row comes from FTN, but no value if it comes from the other two tables. So there will never be an identical match between the rows from each table, and thus DISTINCT can't resolve that.

Kristen
Test

22859 Posts

Posted - 2010-01-29 : 05:48:44

Probably something like this:


select DISTINCT COALESCE(ftn.person_id, dpa.person_id, c_dpa.person_id),
       ftn.OptIn,dpa.Description as dpa_desc, c_dpa.Description as cdpa_desc
into unmatch_data
from ftn
     FULL OUTER JOIN dpa
         ON dpa.person_id = ftn.person_id
     FULL OUTER JOIN c_dpa
         ON c_dpa.person_id = ftn.person_id
where COALESCE(ftn.person_id, dpa.person_id, c_dpa.person_id) not in (select match_data.person_id from match_data)

You will still get DUPs if there are multiple rows within each table for the same ID and the other columns have different values.

If there are lots of duplicates within EACH table the performance of this will be terrible. But hopefully person_id is Unique within each table.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources