Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Slow Left Outer Join with "is NULL" !!!

Author  Topic 

iancuct
Yak Posting Veteran

73 Posts

Posted - 2006-08-03 : 09:44:11
I have 4 tables.
Table A,B,C, and D

Table A is left outer joined with B, C , and D on the same unique Key.

-->Then I have a Grouping, and a SUM.
-->Then I have a Having clause
-----------------------------------------------
Problem: The query returns fine (7 seconds), however if there are no rows in B, C, or D because of the Having clause filter(filter works the same with a where clause) The query DOESN'T return anything because b,c, or D brings back NULL

So to solve this I added "or Column is NULL" to the having/where filter, which solves the problem and brings me back all the necessary information

HOWEVER it takes 48minutes!!!
Here is the code:

----------------------------------------------------------
SELECT mr_jobs.job_num AS Job_Number,
mr_jobs.tbl_date AS Job_Date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc,
SUM(mr_invoices.invoice_amt) AS Cost,
SUM(mr_tasks.estimate) AS Ro_Estimate,
SUM(mr_po_tasks.estimate) AS Po_Estimate
FROM
mr_jobs LEFT OUTER JOIN
mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num LEFT OUTER JOIN
mr_invoices ON mr_jobs.job_num = mr_invoices.job_num LEFT OUTER JOIN
mr_tasks ON mr_jobs.job_num = mr_tasks.job_num

GROUP BY
mr_jobs.job_num,
mr_jobs.tbl_date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc,
mr_invoices.deleted,
mr_tasks.deleted,
mr_po_tasks.deleted,
mr_tasks.job_num,
mr_po_tasks.job_num
HAVING
(mr_invoices.deleted = 0 )
AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null) --when this red portion is added the query takes 48 minutes
AND (mr_po_tasks.deleted = 0 )

iancuct
Yak Posting Veteran

73 Posts

Posted - 2006-08-03 : 09:51:39
table A has 13,000 rows
table B has 44,000 rows
table C has 34,000 rows
table D has 7,000 rows

The join before the Grouping has about 6 Million rows

How can i make this query return faster then 48 minutes !!! I am looking to be under 30 seconds :).
Is there a problem with "is null" because if i take that out the query returns in 10 seconds, but of course its missing rows
Go to Top of Page

Q
Yak Posting Veteran

76 Posts

Posted - 2006-08-03 : 09:59:05
What if you change
'AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null)'
into
AND (isnull(mr_tasks.deleted,0) = 0)
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-08-03 : 10:10:36
You are grouping 11 columns, but are only showing 6 of them.

Remove
mr_invoices.deleted, 
mr_tasks.deleted,
mr_po_tasks.deleted,
mr_tasks.job_num,
mr_po_tasks.job_num
from the GROUP BY clause. Because you are doing a LEFT JOIN, you are grouping columns that potentially can be NULL.

Also put all the HAVING statements in a WHERE statement, because they will filter faster then.


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-08-03 : 10:15:47
Try this code
SELECT		mr_jobs.job_num AS Job_Number,
mr_jobs.tbl_date AS Job_Date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc,
SUM(mr_invoices.invoice_amt) AS Cost,
SUM(mr_tasks.estimate) AS Ro_Estimate,
SUM(mr_po_tasks.estimate) AS Po_Estimate
FROM mr_jobs
LEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num
LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num
LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num
WHERE mr_invoices.deleted = 0
AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null)
AND mr_po_tasks.deleted = 0
GROUP BY mr_jobs.job_num,
mr_jobs.tbl_date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc

Peter Larsson
Helsingborg, Sweden
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-08-03 : 10:16:32
And try this one
SELECT		mr_jobs.job_num AS Job_Number,
mr_jobs.tbl_date AS Job_Date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc,
SUM(mr_invoices.invoice_amt) AS Cost,
SUM(mr_tasks.estimate) AS Ro_Estimate,
SUM(mr_po_tasks.estimate) AS Po_Estimate
FROM mr_jobs
LEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num AND mr_po_tasks.deleted = 0
LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num AND mr_invoices.deleted = 0
LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num AND mr_tasks.deleted = 0
GROUP BY mr_jobs.job_num,
mr_jobs.tbl_date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc

Peter Larsson
Helsingborg, Sweden
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-08-03 : 10:18:17
What indexes do you have? make sure the job_num column in all four tables are indexed.


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-08-03 : 10:30:12
quote:
Originally posted by Q

What if you change
'AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null)'
into
AND (isnull(mr_tasks.deleted,0) = 0)
That is actually slower.


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

Q
Yak Posting Veteran

76 Posts

Posted - 2006-08-03 : 10:31:29
@peso: Never knew you could define any restrictions within the join part. Think it will be very usefull!

Think the last solution will speed up a lot!
Go to Top of Page

iancuct
Yak Posting Veteran

73 Posts

Posted - 2006-08-03 : 10:53:10
Peso Thanks a lot, your solution runs in 32 seconds :)
Waw!!! from 48 minutes to 32 seconds :)

here is the solution which is your last solution post:

SELECT mr_jobs.job_num AS Job_Number,
mr_jobs.tbl_date AS Job_Date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc,
SUM(mr_invoices.invoice_amt) AS Cost,
SUM(mr_tasks.estimate) AS Ro_Estimate,
SUM(mr_po_tasks.estimate) AS Po_Estimate
FROM mr_jobs
LEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num AND mr_po_tasks.deleted = 0
LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num AND mr_invoices.deleted = 0
LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num AND mr_tasks.deleted = 0
GROUP BY mr_jobs.job_num,
mr_jobs.tbl_date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc

Yeah i didn't know you can put filters in the join :)
Go to Top of Page

Q
Yak Posting Veteran

76 Posts

Posted - 2006-08-03 : 11:01:55
48 minutes to 32 seconds! That's some progress!!!!
Go to Top of Page

iancuct
Yak Posting Veteran

73 Posts

Posted - 2006-08-03 : 11:08:13
Actually I have another question for Peso.

YOur first solution returns in 20 seconds
Your second solution returns in 32 seconds

Q1: Why you don't need to ask if .delted is null When you put the filter along with the left join
Q2: Why Solution 1 is faster, regardless that its performing an additional filter ".deleted is null" for every row compared with solution 2

Solution 1 with additional is null in the where clause for every table, returns 13231 rows

SELECT mr_jobs.job_num AS Job_Number,
mr_jobs.tbl_date AS Job_Date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc,
SUM(mr_invoices.invoice_amt) AS Cost,
SUM(mr_tasks.estimate) AS Ro_Estimate,
SUM(mr_po_tasks.estimate) AS Po_Estimate
FROM mr_jobs
LEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num
LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num
LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num
WHERE mr_invoices.deleted = 0
AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null)
AND mr_po_tasks.deleted = 0
GROUP BY mr_jobs.job_num,
mr_jobs.tbl_date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc

Solution 2 returns 13467 rows

SELECT mr_jobs.job_num AS Job_Number,
mr_jobs.tbl_date AS Job_Date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc,
SUM(mr_invoices.invoice_amt) AS Cost,
SUM(mr_tasks.estimate) AS Ro_Estimate,
SUM(mr_po_tasks.estimate) AS Po_Estimate
FROM mr_jobs
LEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num AND mr_po_tasks.deleted = 0
LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num AND mr_invoices.deleted = 0
LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num AND mr_tasks.deleted = 0
GROUP BY mr_jobs.job_num,
mr_jobs.tbl_date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc
Go to Top of Page

iancuct
Yak Posting Veteran

73 Posts

Posted - 2006-08-03 : 11:15:20
solution 1 with additional "is null"

SELECT mr_jobs.job_num AS Job_Number,
mr_jobs.tbl_date AS Job_Date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc,
SUM(mr_invoices.invoice_amt) AS Cost,
SUM(mr_tasks.estimate) AS Ro_Estimate,
SUM(mr_po_tasks.estimate) AS Po_Estimate
FROM mr_jobs
LEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num
LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num
LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num
WHERE (mr_invoices.deleted = 0 or mr_invoices.deleted is null)
AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null)
AND (mr_po_tasks.deleted = 0 or mr_po_tasks.deleted is null)
GROUP BY mr_jobs.job_num,
mr_jobs.tbl_date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc

Returns in 14 seconds on SQL server 2005 after initial return. :)

Thanks a lot everyone
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2006-08-03 : 11:52:08
quote:
Originally posted by iancuct

solution 1 with additional "is null"

SELECT mr_jobs.job_num AS Job_Number,
mr_jobs.tbl_date AS Job_Date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc,
SUM(mr_invoices.invoice_amt) AS Cost,
SUM(mr_tasks.estimate) AS Ro_Estimate,
SUM(mr_po_tasks.estimate) AS Po_Estimate
FROM mr_jobs
LEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num
LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num
LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num
WHERE (mr_invoices.deleted = 0 or mr_invoices.deleted is null)
AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null)
AND (mr_po_tasks.deleted = 0 or mr_po_tasks.deleted is null)
GROUP BY mr_jobs.job_num,
mr_jobs.tbl_date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc

Returns in 14 seconds on SQL server 2005 after initial return. :)

Thanks a lot everyone



This will not return the same results -- you need to put the criteria in each JOIN expression. other than checking for NULL, you should never reference an outer query in your WHERE or HAVING clause, otherwise the effect will be that of an INNER JOIN.

If you use "or xx is null" in your WHERE clause on an outer query, then rows will be filtered out that you did not intend. In fact, when you compared the two methods of doing this, you verified this -- the criteria in the WHERE clause returned less rows than the criteria in the JOIN expression.

Why?

It looks like you want to return all items from the mr_jobs table and any corresponding matching data in the outer tables where deleted = 0. If there are matching rows for each job where deleted=0 *or* where there are *no* matching entries at all in the outer table, your results come back fine. But what if there are *only* matches where deleted <> 0 in the outer tables? That condition does not satisfy your WHERE clause and the entire job is therefore not returned by the query, even though you seem to be indicating that you want all jobs to be returned.



- Jeff
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-08-04 : 02:24:12
I agree Jeff.

If there were only WHERE xxx.deleted = 0 in the WHERE clause, it it much faster to replace LEFT JOIN with INNER JOIN, since you filter the records WHERE xxx.deleted = 0. Not NULL (unknown), just zero. That indicates that the rows must match with jobnum columns.

Hence I wrote to answers because I can't guess which logic poster iancuct wanted.


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-08-04 : 02:29:23
quote:
Originally posted by iancuct

Thanks a lot everyone
Which is faster for you?

Solution 1
SELECT		mr_jobs.job_num AS Job_Number,
mr_jobs.tbl_date AS Job_Date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc,
SUM(mr_invoices.invoice_amt) AS Cost,
SUM(mr_tasks.estimate) AS Ro_Estimate,
SUM(mr_po_tasks.estimate) AS Po_Estimate
FROM mr_jobs
LEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num
LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num
LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num
WHERE (mr_invoices.deleted = 0 or mr_invoices.deleted is null)
AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null)
AND (mr_po_tasks.deleted = 0 or mr_po_tasks.deleted is null)
GROUP BY mr_jobs.job_num,
mr_jobs.tbl_date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc
or solution 2
SELECT		mr_jobs.job_num AS Job_Number,
mr_jobs.tbl_date AS Job_Date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc,
SUM(mr_invoices.invoice_amt) AS Cost,
SUM(mr_tasks.estimate) AS Ro_Estimate,
SUM(mr_po_tasks.estimate) AS Po_Estimate
FROM mr_jobs
LEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num AND mr_po_tasks.deleted = 0
LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num AND mr_invoices.deleted = 0
LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num AND mr_tasks.deleted = 0
GROUP BY mr_jobs.job_num,
mr_jobs.tbl_date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc
Does both queries return the same number of records? Post the times for the two solutions. If there are discrepancies, I think this is due to inproper indexing.


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-08-04 : 02:37:00
quote:
Originally posted by Q

48 minutes to 32 seconds! That's some progress!!!!

That's good news any day.
But still, 6 million rows from 4 tables (each about 20000-40000 rows) should not take about 30 seconds to JOIN. 4-8 second is normal in my environment.


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2006-08-04 : 09:03:26
quote:
Originally posted by Peso

quote:
Originally posted by iancuct

Thanks a lot everyone
Which is faster for you?

Solution 1
SELECT		mr_jobs.job_num AS Job_Number,
mr_jobs.tbl_date AS Job_Date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc,
SUM(mr_invoices.invoice_amt) AS Cost,
SUM(mr_tasks.estimate) AS Ro_Estimate,
SUM(mr_po_tasks.estimate) AS Po_Estimate
FROM mr_jobs
LEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num
LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num
LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num
WHERE (mr_invoices.deleted = 0 or mr_invoices.deleted is null)
AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null)
AND (mr_po_tasks.deleted = 0 or mr_po_tasks.deleted is null)
GROUP BY mr_jobs.job_num,
mr_jobs.tbl_date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc
or solution 2
SELECT		mr_jobs.job_num AS Job_Number,
mr_jobs.tbl_date AS Job_Date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc,
SUM(mr_invoices.invoice_amt) AS Cost,
SUM(mr_tasks.estimate) AS Ro_Estimate,
SUM(mr_po_tasks.estimate) AS Po_Estimate
FROM mr_jobs
LEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num AND mr_po_tasks.deleted = 0
LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num AND mr_invoices.deleted = 0
LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num AND mr_tasks.deleted = 0
GROUP BY mr_jobs.job_num,
mr_jobs.tbl_date,
mr_jobs.asset_name,
mr_jobs.location,
mr_jobs.status,
mr_jobs.work_desc
Does both queries return the same number of records? Post the times for the two solutions. If there are discrepancies, I think this is due to inproper indexing.


Peter Larsson
Helsingborg, Sweden



It should be irrelevant which one is faster; they are not equivalent and do not return the same results.

- Jeff
Go to Top of Page

iancuct
Yak Posting Veteran

73 Posts

Posted - 2006-08-04 : 14:41:56
Yes the statements are not equivalent :) I agree.

The correct one would be the one with the:

LEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num AND mr_po_tasks.deleted = 0
LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num AND mr_invoices.deleted = 0
LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num AND mr_tasks.deleted = 0

because this will bring back everything from table A and any matching rows from B,C, or D.

the where clause would not bring back correct results because of the same reason jsmith8858 explained

P.S. I have changed the deleted field form a small int to a bit
and now the query returns in 10 to 12 seconds.
Reply to PESO : "4-8 second is normal in my environment"
me : my environment might be a little slower then yours (slower computer/harddrives ) :)

Thanks everyone again
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-08-04 : 15:20:13
Jeff, I can see that this is true for multiple outer joins.

But is this also true when having only one outer join? Is there then a difference between the two methods?
Or when there are multiple outer joins, but all outer joins have the same predictability, ie if all outer joins contains equal number of matching rows? For example main HouseTable having three outer joined tables named Kitchen, Bedroom and Toilet? For every House there is at least one matching in every outer joined table.


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2006-08-04 : 16:33:21
Even for a single OUTER JOIN, they are not equivalent.

Suppose I want to return *all* players and the number of home runs they hit in 2005, if any.

select P.Player, S.HomeRuns
from Players P
left outer join Stats S on P.Player = S.Player
where s.Year = 2005 or s.Year is null

What's wrong with that?

What happens if the player has stats for 2004 only?

The left outer join results in 1 row from the stats from 2004 being returned ... Now, look at there WHERE clause: S.Year is not null and s.Year doesn't equal 2005, so the row is filtered out and the player is not displayed.

If you really want to return *all* players and their stats, if any, from 2005, then you need to write:

select P.Player, S.HomeRuns
from Players P
left outer join Stats S on P.Player = S.Player and S.Year = 2005
where s.Year = 2005 or s.Year is null

Does this make sense? if not, let me know, I can write up some quick DDL/DML with an example.


- Jeff
Go to Top of Page
    Next Page

- Advertisement -