Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
Awreilly
Starting Member
2 Posts |
Posted - 2010-01-16 : 01:48:05
|
| Hi,I'm attempting to join two tables, and have discovered that for no apparent reason, one of the tables has appeared to have doubled about 50% of the rows in the table. I don't have much experience in SqlServer 2005, but I was wondering if there was some way to insert Distinct into a join so that when the tables join (They have the same primary key), the data duplication in one of the tables doesn't carry over to the join. Throwing the word distinct after the ON in the join just brings up a syntax error.A few notes:--These tables are quite large, and they were automatically set up and automatically derived data from a program that is integral to our company. Deleting the extra rows in not an option, because not only are the tables large, but I've already seen that messing with the data in SqlServer directly can cause that particular program to stop functioning in unexpected and non-sensical ways.The basic structure isSELECT <horde of fields from both tables>FROM emdbuser.Summary as sLEFT OUTER JOIN emdbuser.Borrower as B ON s.Guid = b.Guid(b.Guid being the one where the table has 50% more data than it should)Thanks. |
|
|
sql-programmers
Posting Yak Master
190 Posts |
Posted - 2010-01-16 : 03:56:06
|
| Hi,You need to add DISTINCT in the follolwing place,SELECT DISTINCT <horde of fields from both tables>FROM emdbuser.Summary as sLEFT OUTER JOIN emdbuser.Borrower as BON s.Guid = b.GuidIf you still get the problem, give the example records and column names. It will be easier to answer.SQL Server Programmers and Consultantshttp://www.sql-programmers.com/ |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-01-16 : 05:09:42
|
| Personally I would not add a DISTINCT as that will just hide the problem (and give SQL 50% more work to do!)I think your problem is that for each GUID in your [Summary] table there are two matching rows in [Borrower] with the same GUID value. (Actually there may be some GUIDs that match 1:1 and others that match 1:many, giving the impression of 50%??)If the multiple records, for a given GUID value, in [Borrower] are identical you presumably need to remove the duplicates in that table, and fix whatever allowed them to get there. (If the figure is exactly 50% this is the likely scenario)If the records are not identical maybe you need to add an extra column in your join - perhaps [Borrower] has one Active records for each GUID and some Inactive / Archive records for the same GUID? Thus adding "AND B.IsActive = 1" to the JOIN would fix that |
 |
|
|
Awreilly
Starting Member
2 Posts |
Posted - 2010-01-21 : 18:56:53
|
| This is what I've tried writing:select distinct b.Guid, BorrowerFirstName, BorrowerLastName, LoanNumber, LoanFolder, LoanAmount, Address1, Zip, City, Lender, EscrowVendor, TitleVendor, CoBorrowerFirstName, CoBorrowerLastName, AppraisalVendor, LoanPurpose, LienPosition, LastModified,DateFileOpened, LoanProcessorName, HomePhone, WorkPhone, CellPhonefrom emdbuser.LoanSummary sleft outer join emdbuser.LoanBorrowers b ON s.Guid = b.Guidwhere LoanFolder != '(Trash)'order by s.BorrowerLastNameThe result pulls 8643 rows (Note without the distinct it pulls 9012 rows). However, if I runSELECT distinct Guid from emdbuser.LoanBorrowers there are only 6096 rows (There are also 6096 items in LoanSummary, and I'm sure that this number is correct). So using just distinct for some reason isn't fully working.Deleting out the duplicated Guids isn't really feasible because there's 3000 of them, and the program that runs off of SQLServer (our loan origination software) has a tendency to glitch up badly if I start deleting things directly in Sqlserver. |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-01-22 : 01:13:49
|
"I'm sure that this number is correct). So using just distinct for some reason isn't fully working."I expect that is because the rows with duplicate GUIDs are not identical in the other fields you are using.If your GUIDs are supposed to be unique then the problem is that something (e.g. your loan origination software) is creating multiple rows for a given GUID.Either that or there are supposed to be duplciates, and your query needs to choose the correct one, of the duplciates, to join to - maybe some are marked as Archived or somesuch?This will show you the number of rows, and the number of different guidsSELECT COUNT(*), COUNT(DISTINCT Guid) from emdbuser.LoanSummarySELECT COUNT(*), COUNT(DISTINCT Guid) from emdbuser.LoanBorrowerand then to find some rows to compare:SELECT TOP 100 [T_GUID]=guid, *FROM emdbuser.LoanBorrowersWHERE guid IN( SELECT guid FROM emdbuser.LoanBorrowers GROUP BY guid HAVING COUNT(*) > 1)ORDER BY guid and compare adjacent rows to see which columns are different for rows with the same GUID |
 |
|
|
|
|
|
|
|