Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 duplicates

Author  Topic 

ovince
Starting Member

32 Posts

Posted - 2006-11-03 : 11:03:32
Hello to All Forumers,

this may be a trivial question but my knowledge of SQL is not enough to solve it.

I have 2 databeses and I would like to match them using 3 different IDs. I have noticed that 'ginfo' table is full of rows with duplicated IDs (or maybe even truplicated). As a result, in the output I obtain planty of not-wanted data. This is the query I have used:


select g.*, D.*
from ginfo g, Dhisn D
where
D.pID1 = g.pID1 and
D.pID2 = g.pID2 and
D.pID3 = g.pID3


I have tried somethig like this:

select g.*, D.*
from ginfo g, Dhisn D
where
D.pID1 = g.pID1 and
D.pID2 = g.pID2 and
D.pID3 = g.pID3

in order to avoid duplicates in the output by does not work. what to do?

It is important to have all data from both columns in the output and with uniq IDs

thanks in advance
oliver

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-11-03 : 12:30:07
You have to know which is the logical primary key in each table.
Make sure that there is a primary key or unique constraint on those columns,
talk to the persosns responsible for the database.

If you are going to code around duplicates and other design faults,
then there is will be no end to your troubles.
Go to Top of Page

ovince
Starting Member

32 Posts

Posted - 2006-11-03 : 12:46:26
thank you for quick prompt

I made the databases by myself so I can modify both. What I should do?

Thanks
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-11-03 : 12:56:47
select distinct g.*, D.*
from ginfo g
inner join Dhisn D on D.pID1 = g.pID1 and D.pID2 = g.pID2 and D.pID3 = g.pID3

But you should only select the columns you really want.


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

ovince
Starting Member

32 Posts

Posted - 2006-11-03 : 13:36:04
hello Peter,


your proposition for query does not work in a way we want :)

This are the 2 tables with their columns:
1st
=====================
tableName: Dhisn
=====================
D4HdID ------------> primary key
pID1
pID2
pID3
D4n
D4nerr
HdA
HdAerr
SN



2nd
=====================
tableName: ginfo
=====================
catID -------------> primary key
pID1
pID2
pID3
ragar
decgar


The idea is to join two tables using their pID1,pID2 and pID3. But there are duplicated pID1, pID2 and pID3 in the ginfo table that makes a mess in the output.

So this does not work

select distinct g.*, D.*
from ginfo g
inner join Dhisn D on D.pID1 = g.pID1 and D.pID2 = g.pID2 and D.pID3 = g.pID3


thank you



Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-11-03 : 13:48:32
In the tables should the combination of pID1,pID2,pID3 be unique?
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-11-03 : 14:35:42
most probably the pk is also identity... no wonder!


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-11-03 : 16:28:55
I knew that from the beginning, just trying to set things right.
There are workarounds to get the wanted resultset using distinct and group by,
but in many cases they are just not needed if the data model and constraints are set up properly.

Dirty data is like wading through mud, a correct data model is like carving through butter.
I think ovince is doing the former, when he could be doing the latter.

rockmoose
Go to Top of Page

ovince
Starting Member

32 Posts

Posted - 2006-11-03 : 18:15:08
>In the tables should the combination of pID1,pID2,pID3 be unique?

yes, the combination of pID1,pID2,pID3 shuold NOT repeat i.e. one combination of these numbers should corespond to ONE and ONLY ONE row.

Maybe I should clean it first. Not manually I hope because there are over million rows. Is there an easy way to clean it first?


Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-11-03 : 18:48:41
Run something like this:

delete ginfo
where catID in
(
select min(catID) from ginfo
group by pID1,pID2,pID3
having count(*) > 1
)

You may need to run the query several times if there are combinations with more than 2 occurrences.

PS.
Understand and test the query before running it.

PPS.
After cleaning the table, create a unique constraint on pID1,pID2,pID3 so that duplicates can't be entered in the future.

rockmoose
Go to Top of Page

samuelclay
Yak Posting Veteran

71 Posts

Posted - 2006-11-03 : 18:59:29
And as always, backup your data before deleting large amounts of data
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-11-03 : 19:06:37
quote:
Originally posted by samuelclay

And as always, backup your data before deleting large amounts of data



Yes, we don't want a sequel to this...
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-11-03 : 19:12:52
quote:
Originally posted by samuelclay

And as always, backup your data before deleting large amounts of data

Go to Top of Page

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2006-11-03 : 19:52:47

http://sqlteam.com/forums/topic.asp?TOPIC_ID=6256

Madhivanan

Failing to plan is Planning to fail
Go to Top of Page

ovince
Starting Member

32 Posts

Posted - 2006-11-04 : 00:32:13
hi

I think deleting helps. I did checking and seems to work ok


As a new user of the forum, I have noticed that the "insert image" option in a Post New Reply doest not work. Is it temporary?

thanks for help
oliver
Go to Top of Page
   

- Advertisement -