Return to Writing Outer Joins in T-SQL
Writing Outer Joins in T-SQL
Written by Kathi Kellenberger on 11 February 2008
Occasionally someone will ask for my help with a query and say that both a right outer join and a left outer join was tried, and still the expected results were not achieved. That made me realize that some developers do not completely understand outer joins and that an article explaining how to use them might help.
Inner Join Review
The most commonly used join is an INNER JOIN. This type of join
combines rows from two tables only when they match on the joining
condition. Usually the primary key from one table matches a foreign key
on another table, but join conditions can be more complex than that.
(Note: Most of the information in this article can be applied to views
as well as tables. For simplicity, the word "table" will be used to
mean table or view unless stated otherwise. Keys are not defined on
views, but the underlying table’s key column or columns are often
included in the view. To keep things simple, let's assume that is the
case.)
INNER JOIN will retrieve a results row only where there is a perfect
match between the two tables in the join condition. You will also often
see one row from one of the tables matching multiple rows in the other
table. For example, one customer can have many orders. One order can
have many order details. The data on the one side will be repeated for
each row on the many side. The following query is an example showing
how the information from the Sales.SalesOrderHeader is repeated on each
matching row:
SELECT s.SalesOrderID, OrderDate,ProductID
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS d ON s.SalesOrderID = d.SalesOrderID
ORDER BY s.SalesOrderID, ProductID
Outer Join Introduction
OUTER JOIN is used to join two tables even if there is not a match.
An OUTER JOIN can be used to return a list of all the customers and the
orders even if no orders have been placed for some of the customers. A
keyword, RIGHT or LEFT, is used to specify which side of the join
returns all possible rows. I like using LEFT because it makes sense to
me to list the most important table first. Except for one example
demonstrating RIGHT OUTER JOIN, this article will use left joins. Just
a note: the keywords INNER and OUTER are optional.
The next example returns a list of all the customers and the SalesOrderID for the orders that have been placed, if any.
SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
It uses the LEFT keyword because the Sales.Customer table is located on
the left side and we want all rows returned from that table even if
there is no match in the Sales.SalesOrderHeader table. This is an
important point. Notice also that the CustomerID column is the primary
key of the Sales.Customer table and a foreign key in the
Sales.SalesOrderHeader table. This means that there must be a valid
customer for every order placed. Writing a query that returns all
orders and the customers if they match doesn’t make sense. The LEFT
table should always be the primary key table when performing a LEFT
OUTER JOIN.
If the location of the tables in the query are switched, the RIGHT keyword is used and the same results are returned:
SELECT c.CustomerID, s.SalesOrderID
FROM Sales.SalesOrderHeader s
RIGHT OUTER JOIN Sales.Customer c ON c.CustomerID = s.CustomerID
Notice that I didn’t change the join condition at all. It doesn’t
matter which side of the equal sign the columns are listed; only where
the tables are named is it important.
If I have a LEFT OUTER JOIN, what is returned from the table on the
right side of the join where there is not a match? Each column from the
right side will return a NULL. Try this query which lists the
non-matching rows first:
SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
ORDER BY s.SalesOrderID
By adding a WHERE clause to check for a NULL SalesOrderID, you can
find all the customers who have not placed an order. My copy of
AdventureWorks returns 66 customers with no orders:
SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
WHERE s.SalesOrderID IS NULL
Occasionally, you will need to be more specific. How can you find
all the customers who have not placed an order in 2002? There are
several ways to solve this problem. You could create a view of all the
orders placed in 2002 and join the view on the Sales.Customer table.
Another option is to create a CTE, or Common Table Expression, of the
orders placed in 2002. This example shows how to use a CTE to get the
required results:
WITH s AS
( SELECT SalesOrderID, customerID
FROM Sales.SalesOrderHeader
WHERE OrderDate between '1/1/2002' and '12/31/2002'
)
SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c
LEFT OUTER JOIN s ON c.customerID = s.customerID
WHERE s.SalesOrderID IS NULL
My favorite technique to solve this problem is much simpler.
Additional criteria, in this case filtering on the OrderDate, can be
added to the join condition. The query joins all customers to the
orders placed in 2002. Then the results are restricted to those where
there is no match. This query will return exactly the same results as
the previous, more complicated query:
SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
and s.OrderDate between '1/1/2002' and '12/31/2002'
WHERE s.SalesOrderID IS NULL
Using Aggregates with Outer Joins
Aggregate queries introduce another pitfall watch out for. The following example is an attempt to list all the customers
and the count of the orders that have been placed. Can you spot the problem?
SELECT c.CustomerID, count(*) OrderCount
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
GROUP BY c.CustomerID
ORDER BY OrderCount
Now the customers with no orders look like they have placed one order. That is because this query
is counting the rows returned. To solve this problem, count the SalesOrderID column. NULL values
are eliminated from the count.
SELECT c.CustomerID, count(SalesOrderID) OrderCount
FROM Sales.Customer c LEFT OUTER JOIN Sales.SalesOrderHeader s
ON c.CustomerID = s.CustomerID
GROUP BY c.CustomerID
ORDER BY OrderCount
Multiple Joins
Once more than two tables are involved in the query, things get a
bit more complicated. When a table is joined to the RIGHT table, a LEFT
OUTER JOIN must be used. That is because the NULL rows from the RIGHT
table will not match any rows on the new table.
An INNER JOIN causes the non-matching rows to be eliminated from the
results. If the Sales.SalesOrderDetail table is joined to the
Sales.SalesOrderHeader table and an INNER JOIN is used, none of the
customers without orders will show up. NULL cannot be joined to any
value, not even NULL.
To illustrate this point, when I add the Sales.SalesOrderDetail table
to one of the previous queries that checked for customers without
orders, I get back no rows at all.
SELECT c.CustomerID, s.SalesOrderID, d.SalesOrderDetailID
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
INNER JOIN Sales.SalesOrderDetail d ON s.SalesOrderID = d.SalesOrderID
WHERE s.SalesOrderID IS NULL
To get correct results, change the INNER JOIN to a LEFT JOIN.
SELECT c.CustomerID, s.SalesOrderID, d.SalesOrderDetailID
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
LEFT OUTER JOIN Sales.SalesOrderDetail d ON s.SalesOrderID = d.SalesOrderID
WHERE s.SalesOrderID IS NULL
What about additional tables joined to Sales.Customer, the table on the left?
Must outer joins be used? If it is possible that there are some rows without matches,
it must be an outer join to guarantee that no results are lost. The Sales.Customer table
has a foreign key pointing to the Sales.SalesTerritory table. Every customer’s territory
ID must match a valid value in Sales.SalesTerritory. This query returns 66 rows as expected
because it is impossible to eliminate any customers by joining to Sales.SalesTerritory:
SELECT c.CustomerID, s.SalesOrderID, t.Name
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
INNER JOIN Sales.SalesTerritory t ON c.TerritoryID = t.TerritoryID
WHERE SalesOrderID IS NULL
Sales.SalesTerritory is the primary key table; every customer must match a valid territory.
If you wanted to write a query that listed all territories, even those that had no customers,
an outer join will be used. This time, Sales.Customers is on the right side of the join.
SELECT t.Name, CustomerID
FROM Sales.SalesTerritory t
LEFT OUTER JOIN Sales.Customer c ON t.TerritoryID = c.TerritoryID
Conclusion
Queries with outer joins can be tricky to write. Extra time and care must be spent making
sure the results are correct. Think about the relationship between the tables and make
sure that the outer join is continued down the path. This article covered
almost everything you need to know about outer joins.
|