Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Darn OUTER JOIN gives duplicates...grrr

Author  Topic 

CactusJuice
Starting Member

46 Posts

Posted - 2003-03-16 : 15:00:17
This query gives me duplicate records for any Plan that has more than one drawing. I understand "why"--it's a LEFT OUTER so that's including everything. And I need that because some Plans do not have drawings but I still want to return them in the list.

I have a copy of "Guru's Guide to Transact SQL" and from it I've tried the idea of derived tables. I'm thinking that some type of sub query on "DrawingFile", like a TOP 1 or DISTINCT might be my answer but all I get are errors

Cameron
-------------

DECLARE @pPlanID int
DECLARE @pPlanName varchar(50)

SELECT DISTINCT p.PlanID, p.PlanName, pd.DrawingFile
FROM sh_tblPlans p
LEFT OUTER JOIN sh_tblPlanDrawings pd ON p.PlanID = pd.PlanID
WHERE ((@pPlanID IS Null) OR (p.PlanID = @pPlanID))
AND ((@pPlanName IS Null) OR (p.PlanName LIKE '%' + @pPlanName + '%'))

nr
SQLTeam MVY

12543 Posts

Posted - 2003-03-16 : 15:53:04
Which DrawingFile do you want for the plan?

DECLARE @pPlanID int
DECLARE @pPlanName varchar(50)

SELECT DISTINCT p.PlanID, p.PlanName, max(pd.DrawingFile)
FROM sh_tblPlans p
LEFT OUTER JOIN sh_tblPlanDrawings pd ON p.PlanID = pd.PlanID
WHERE ((@pPlanID IS Null) OR (p.PlanID = @pPlanID))
AND ((@pPlanName IS Null) OR (p.PlanName LIKE '%' + @pPlanName + '%'))
group by p.PlanID, p.PlanName

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

CactusJuice
Starting Member

46 Posts

Posted - 2003-03-16 : 16:32:27
Nigel,

Gosh! I read through these forums for several hours the past couple days. But I never thought about GROUP BY...grrr. Thanks! I was stuck trying derived columns and sub queries. Guess I was over thinking it

Good question. Only one DrawingFile per plan can be the primary (true) and that's the one I prefer. So using your very helpful code I made this final addition:

LEFT OUTER JOIN sh_tblPlanDrawings pd ON p.PlanID = pd.PlanID AND pd.PrimaryDrawing = 1

thanks again,

Cameron


quote:

Which DrawingFile do you want for the plan?

Go to Top of Page

CactusJuice
Starting Member

46 Posts

Posted - 2005-03-10 : 12:22:13
I have this problem again but there are too many columns to use GROUP BY. So I tried.

SELECT ColA1, ColA2, ColA3...ColA20, ColB2
FROM TableA
LEFT OUTER JOIN TableB
ON TableA.ColA = (SELECT ColB1, MIN(ColB2) FROM TableB GROUP BY ColB1, ColB2)

But I still get a group by error saying all columns must be included. And I'm selecting over 20 columns so too may to include in GROUP. All that I need is one result for TableB...which may have zero to many matching records.
Go to Top of Page

PW
Yak Posting Veteran

95 Posts

Posted - 2005-03-10 : 13:33:18
Join to a Derived Table (aliased as "dt" in this example), to avoid having to Group BY

SELECT ColA1, ColA2, ColA3...ColA20, ColB2
FROM TableA
LEFT OUTER JOIN
(
SELECT KeyCol, <plus whatever columns you need>
FROM TableB
GROUP BY <whatever to get same cardinality as TableA>
) dt
On TableA.ColA = dt.KeyCol



Go to Top of Page

CactusJuice
Starting Member

46 Posts

Posted - 2005-03-10 : 13:54:54
Thanks. I wound up doing a TOP 1 instead of the MIN...so I could avoid GROUP BY.

LEFT OUTER JOIN (SELECT TOP 1 ISNULL(ColB2, -7) AS ColB2, ColB1 FROM TableB) b ON TableA.ColA1 = TableB.ColB2
Go to Top of Page

mr_mist
Grunnio

1870 Posts

Posted - 2005-03-11 : 05:52:10
quote:
Originally posted by CactusJuice

Thanks. I wound up doing a TOP 1 instead of the MIN...so I could avoid GROUP BY.

LEFT OUTER JOIN (SELECT TOP 1 ISNULL(ColB2, -7) AS ColB2, ColB1 FROM TableB) b ON TableA.ColA1 = TableB.ColB2




So long as you remember that that statement will not be guaranteed to give you the same results each time you run it for a given set of data.

-------
Moo. :)
Go to Top of Page

CactusJuice
Starting Member

46 Posts

Posted - 2005-03-11 : 13:08:10
Oh crumb! You are correct. The TOP 1 syntax did not turn out to give me the data I expected. However, using PW's syntax the query ran for over 10 hours and never finished. (There about 15 million records in TableA and 10 million records in TableB. Both tables are on a Oracle server. I'm using a linked server in EM.)

Does the subquery run once...and then become part of the join? If so, I don't understand why it takes so long. There are 6 tables total. All LEFT JOINS. If I take out tableB or put up with the duplicates then the query runs in approx 45 minutes.
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2005-03-11 : 14:15:48
you need to give us more information, including the structure of the tables involved (just relevant columns) and the SQL you have tried.

Break it down into parts, as PW suggested. Work on getting the numbers to be correct and just return things in terms of primary keys. Then, at the very end, add joins to tables needed to show descriptions or other descriptive columns.

Example: you have 3 tables: Customers, Sales, and Budget. You want 1 row per customer with customer name, address, total sales, and total budget for a date range. do it step by step:


select
c.CustomerName, c.Address1, C.State, b.TotalBudget, S.TotalSales
from
customers c
left outer join
(select customerID, sum(Sales) as TotalSales
from Sales
where SalesDate between ....
group by customerID) s
on
s.CustomerID = c.CustomerID
left outer join
(select customerID, sum(Budget) as TotalBudget
from Budgets
where BudgetDate between ....
group by customerID) b
on
b.CustomerID = c.CustomerID


Get the idea? work on 1 part at a time and join them all together at the end.

Also: why are you using all LEFT OUTER JOINS ? Do you really need them? If not, INNER JOINS are typically faster. Make sure you are comfortable with the difference between the join types and how they work.

- Jeff
Go to Top of Page

PW
Yak Posting Veteran

95 Posts

Posted - 2005-03-11 : 14:38:19
>>Both tables are on a Oracle server. I'm using a linked server in EM

Don't ya think important details like this should be stated up front, so people don't waste time thinking about & providing solutions that assume Sql Server tables ?

Since both tables are so large, you should probably haul them both over to the Sql Side first, or write a view on the Oracle side, and query the view instead of the tables.
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2005-03-11 : 14:39:18
>>>>Both tables are on a Oracle server. I'm using a linked server in EM

!!! I missed that .. yes, some information that may have been useful ....

- Jeff
Go to Top of Page

CactusJuice
Starting Member

46 Posts

Posted - 2005-03-11 : 14:58:36
Jeff, the query I'm testing with is very basic, just the CustomerID and Name column from TableA and CustomerID and Classification column from TableB. Table B can have 0 to many Classifications. So that's why I need the LEFT JOIN. And the other 4 tables will be the same way.

PW, Linked servers are new to me, so I didn't realize the soluion could vary then. Sorry!
Go to Top of Page

mr_mist
Grunnio

1870 Posts

Posted - 2005-03-14 : 05:28:48
quote:
Originally posted by CactusJuice
PW, Linked servers are new to me, so I didn't realize the soluion could vary then. Sorry!



It may be a similar solution, but the linked server situation would probably mean that the SQL server is trying to fetch the entire contents of both tables before it does the query on them, which is why the solution is so slow.

Depending on the situation, copying the tables across first by bulk import may be better.

-------
Moo. :)
Go to Top of Page
   

- Advertisement -