Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
iancuct
Yak Posting Veteran
73 Posts |
Posted - 2006-08-03 : 09:44:11
|
I have 4 tables.Table A,B,C, and DTable A is left outer joined with B, C , and D on the same unique Key.-->Then I have a Grouping, and a SUM.-->Then I have a Having clause-----------------------------------------------Problem: The query returns fine (7 seconds), however if there are no rows in B, C, or D because of the Having clause filter(filter works the same with a where clause) The query DOESN'T return anything because b,c, or D brings back NULLSo to solve this I added "or Column is NULL" to the having/where filter, which solves the problem and brings me back all the necessary informationHOWEVER it takes 48minutes!!!Here is the code:----------------------------------------------------------SELECT mr_jobs.job_num AS Job_Number, mr_jobs.tbl_date AS Job_Date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc, SUM(mr_invoices.invoice_amt) AS Cost, SUM(mr_tasks.estimate) AS Ro_Estimate, SUM(mr_po_tasks.estimate) AS Po_EstimateFROM mr_jobs LEFT OUTER JOINmr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num LEFT OUTER JOINmr_invoices ON mr_jobs.job_num = mr_invoices.job_num LEFT OUTER JOINmr_tasks ON mr_jobs.job_num = mr_tasks.job_numGROUP BY mr_jobs.job_num,mr_jobs.tbl_date,mr_jobs.asset_name,mr_jobs.location,mr_jobs.status,mr_jobs.work_desc,mr_invoices.deleted, mr_tasks.deleted,mr_po_tasks.deleted,mr_tasks.job_num,mr_po_tasks.job_num HAVING(mr_invoices.deleted = 0 )AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null) --when this red portion is added the query takes 48 minutes AND (mr_po_tasks.deleted = 0 ) |
|
iancuct
Yak Posting Veteran
73 Posts |
Posted - 2006-08-03 : 09:51:39
|
table A has 13,000 rowstable B has 44,000 rowstable C has 34,000 rowstable D has 7,000 rowsThe join before the Grouping has about 6 Million rowsHow can i make this query return faster then 48 minutes !!! I am looking to be under 30 seconds :).Is there a problem with "is null" because if i take that out the query returns in 10 seconds, but of course its missing rows |
 |
|
Q
Yak Posting Veteran
76 Posts |
Posted - 2006-08-03 : 09:59:05
|
What if you change 'AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null)' intoAND (isnull(mr_tasks.deleted,0) = 0) |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-08-03 : 10:10:36
|
You are grouping 11 columns, but are only showing 6 of them.Removemr_invoices.deleted, mr_tasks.deleted,mr_po_tasks.deleted,mr_tasks.job_num,mr_po_tasks.job_num from the GROUP BY clause. Because you are doing a LEFT JOIN, you are grouping columns that potentially can be NULL.Also put all the HAVING statements in a WHERE statement, because they will filter faster then.Peter LarssonHelsingborg, Sweden |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-08-03 : 10:15:47
|
Try this codeSELECT mr_jobs.job_num AS Job_Number, mr_jobs.tbl_date AS Job_Date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc, SUM(mr_invoices.invoice_amt) AS Cost, SUM(mr_tasks.estimate) AS Ro_Estimate, SUM(mr_po_tasks.estimate) AS Po_EstimateFROM mr_jobsLEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_numLEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_numLEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_numWHERE mr_invoices.deleted = 0 AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null) AND mr_po_tasks.deleted = 0GROUP BY mr_jobs.job_num, mr_jobs.tbl_date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc Peter LarssonHelsingborg, Sweden |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-08-03 : 10:16:32
|
And try this oneSELECT mr_jobs.job_num AS Job_Number, mr_jobs.tbl_date AS Job_Date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc, SUM(mr_invoices.invoice_amt) AS Cost, SUM(mr_tasks.estimate) AS Ro_Estimate, SUM(mr_po_tasks.estimate) AS Po_EstimateFROM mr_jobsLEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num AND mr_po_tasks.deleted = 0LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num AND mr_invoices.deleted = 0LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num AND mr_tasks.deleted = 0GROUP BY mr_jobs.job_num, mr_jobs.tbl_date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc Peter LarssonHelsingborg, Sweden |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-08-03 : 10:18:17
|
What indexes do you have? make sure the job_num column in all four tables are indexed.Peter LarssonHelsingborg, Sweden |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-08-03 : 10:30:12
|
quote: Originally posted by Q What if you change 'AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null)' intoAND (isnull(mr_tasks.deleted,0) = 0)
That is actually slower.Peter LarssonHelsingborg, Sweden |
 |
|
Q
Yak Posting Veteran
76 Posts |
Posted - 2006-08-03 : 10:31:29
|
@peso: Never knew you could define any restrictions within the join part. Think it will be very usefull! Think the last solution will speed up a lot! |
 |
|
iancuct
Yak Posting Veteran
73 Posts |
Posted - 2006-08-03 : 10:53:10
|
Peso Thanks a lot, your solution runs in 32 seconds :)Waw!!! from 48 minutes to 32 seconds :) here is the solution which is your last solution post:SELECT mr_jobs.job_num AS Job_Number, mr_jobs.tbl_date AS Job_Date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc, SUM(mr_invoices.invoice_amt) AS Cost, SUM(mr_tasks.estimate) AS Ro_Estimate, SUM(mr_po_tasks.estimate) AS Po_EstimateFROM mr_jobsLEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num AND mr_po_tasks.deleted = 0LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num AND mr_invoices.deleted = 0LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num AND mr_tasks.deleted = 0GROUP BY mr_jobs.job_num, mr_jobs.tbl_date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_descYeah i didn't know you can put filters in the join :) |
 |
|
Q
Yak Posting Veteran
76 Posts |
Posted - 2006-08-03 : 11:01:55
|
48 minutes to 32 seconds! That's some progress!!!! |
 |
|
iancuct
Yak Posting Veteran
73 Posts |
Posted - 2006-08-03 : 11:08:13
|
Actually I have another question for Peso.YOur first solution returns in 20 seconds Your second solution returns in 32 secondsQ1: Why you don't need to ask if .delted is null When you put the filter along with the left joinQ2: Why Solution 1 is faster, regardless that its performing an additional filter ".deleted is null" for every row compared with solution 2Solution 1 with additional is null in the where clause for every table, returns 13231 rowsSELECT mr_jobs.job_num AS Job_Number, mr_jobs.tbl_date AS Job_Date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc, SUM(mr_invoices.invoice_amt) AS Cost, SUM(mr_tasks.estimate) AS Ro_Estimate, SUM(mr_po_tasks.estimate) AS Po_EstimateFROM mr_jobsLEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_numLEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_numLEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_numWHERE mr_invoices.deleted = 0 AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null) AND mr_po_tasks.deleted = 0GROUP BY mr_jobs.job_num, mr_jobs.tbl_date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_descSolution 2 returns 13467 rowsSELECT mr_jobs.job_num AS Job_Number, mr_jobs.tbl_date AS Job_Date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc, SUM(mr_invoices.invoice_amt) AS Cost, SUM(mr_tasks.estimate) AS Ro_Estimate, SUM(mr_po_tasks.estimate) AS Po_EstimateFROM mr_jobsLEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num AND mr_po_tasks.deleted = 0LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num AND mr_invoices.deleted = 0LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num AND mr_tasks.deleted = 0GROUP BY mr_jobs.job_num, mr_jobs.tbl_date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc |
 |
|
iancuct
Yak Posting Veteran
73 Posts |
Posted - 2006-08-03 : 11:15:20
|
solution 1 with additional "is null"SELECT mr_jobs.job_num AS Job_Number, mr_jobs.tbl_date AS Job_Date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc, SUM(mr_invoices.invoice_amt) AS Cost, SUM(mr_tasks.estimate) AS Ro_Estimate, SUM(mr_po_tasks.estimate) AS Po_EstimateFROM mr_jobsLEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_numLEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_numLEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_numWHERE (mr_invoices.deleted = 0 or mr_invoices.deleted is null) AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null) AND (mr_po_tasks.deleted = 0 or mr_po_tasks.deleted is null)GROUP BY mr_jobs.job_num, mr_jobs.tbl_date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_descReturns in 14 seconds on SQL server 2005 after initial return. :)Thanks a lot everyone |
 |
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2006-08-03 : 11:52:08
|
quote: Originally posted by iancuct solution 1 with additional "is null"SELECT mr_jobs.job_num AS Job_Number, mr_jobs.tbl_date AS Job_Date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc, SUM(mr_invoices.invoice_amt) AS Cost, SUM(mr_tasks.estimate) AS Ro_Estimate, SUM(mr_po_tasks.estimate) AS Po_EstimateFROM mr_jobsLEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_numLEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_numLEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_numWHERE (mr_invoices.deleted = 0 or mr_invoices.deleted is null) AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null) AND (mr_po_tasks.deleted = 0 or mr_po_tasks.deleted is null)GROUP BY mr_jobs.job_num, mr_jobs.tbl_date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_descReturns in 14 seconds on SQL server 2005 after initial return. :)Thanks a lot everyone
This will not return the same results -- you need to put the criteria in each JOIN expression. other than checking for NULL, you should never reference an outer query in your WHERE or HAVING clause, otherwise the effect will be that of an INNER JOIN.If you use "or xx is null" in your WHERE clause on an outer query, then rows will be filtered out that you did not intend. In fact, when you compared the two methods of doing this, you verified this -- the criteria in the WHERE clause returned less rows than the criteria in the JOIN expression.Why?It looks like you want to return all items from the mr_jobs table and any corresponding matching data in the outer tables where deleted = 0. If there are matching rows for each job where deleted=0 *or* where there are *no* matching entries at all in the outer table, your results come back fine. But what if there are *only* matches where deleted <> 0 in the outer tables? That condition does not satisfy your WHERE clause and the entire job is therefore not returned by the query, even though you seem to be indicating that you want all jobs to be returned.- Jeff |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-08-04 : 02:24:12
|
I agree Jeff.If there were only WHERE xxx.deleted = 0 in the WHERE clause, it it much faster to replace LEFT JOIN with INNER JOIN, since you filter the records WHERE xxx.deleted = 0. Not NULL (unknown), just zero. That indicates that the rows must match with jobnum columns.Hence I wrote to answers because I can't guess which logic poster iancuct wanted.Peter LarssonHelsingborg, Sweden |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-08-04 : 02:29:23
|
quote: Originally posted by iancuct Thanks a lot everyone
Which is faster for you?Solution 1SELECT mr_jobs.job_num AS Job_Number, mr_jobs.tbl_date AS Job_Date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc, SUM(mr_invoices.invoice_amt) AS Cost, SUM(mr_tasks.estimate) AS Ro_Estimate, SUM(mr_po_tasks.estimate) AS Po_EstimateFROM mr_jobsLEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_numLEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_numLEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_numWHERE (mr_invoices.deleted = 0 or mr_invoices.deleted is null) AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null) AND (mr_po_tasks.deleted = 0 or mr_po_tasks.deleted is null)GROUP BY mr_jobs.job_num, mr_jobs.tbl_date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc or solution 2SELECT mr_jobs.job_num AS Job_Number, mr_jobs.tbl_date AS Job_Date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc, SUM(mr_invoices.invoice_amt) AS Cost, SUM(mr_tasks.estimate) AS Ro_Estimate, SUM(mr_po_tasks.estimate) AS Po_EstimateFROM mr_jobsLEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num AND mr_po_tasks.deleted = 0LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num AND mr_invoices.deleted = 0LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num AND mr_tasks.deleted = 0GROUP BY mr_jobs.job_num, mr_jobs.tbl_date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc Does both queries return the same number of records? Post the times for the two solutions. If there are discrepancies, I think this is due to inproper indexing.Peter LarssonHelsingborg, Sweden |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-08-04 : 02:37:00
|
quote: Originally posted by Q 48 minutes to 32 seconds! That's some progress!!!!
That's good news any day.But still, 6 million rows from 4 tables (each about 20000-40000 rows) should not take about 30 seconds to JOIN. 4-8 second is normal in my environment.Peter LarssonHelsingborg, Sweden |
 |
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2006-08-04 : 09:03:26
|
quote: Originally posted by Peso
quote: Originally posted by iancuct Thanks a lot everyone
Which is faster for you?Solution 1SELECT mr_jobs.job_num AS Job_Number, mr_jobs.tbl_date AS Job_Date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc, SUM(mr_invoices.invoice_amt) AS Cost, SUM(mr_tasks.estimate) AS Ro_Estimate, SUM(mr_po_tasks.estimate) AS Po_EstimateFROM mr_jobsLEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_numLEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_numLEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_numWHERE (mr_invoices.deleted = 0 or mr_invoices.deleted is null) AND (mr_tasks.deleted = 0 or mr_tasks.deleted is null) AND (mr_po_tasks.deleted = 0 or mr_po_tasks.deleted is null)GROUP BY mr_jobs.job_num, mr_jobs.tbl_date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc or solution 2SELECT mr_jobs.job_num AS Job_Number, mr_jobs.tbl_date AS Job_Date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc, SUM(mr_invoices.invoice_amt) AS Cost, SUM(mr_tasks.estimate) AS Ro_Estimate, SUM(mr_po_tasks.estimate) AS Po_EstimateFROM mr_jobsLEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num AND mr_po_tasks.deleted = 0LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num AND mr_invoices.deleted = 0LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num AND mr_tasks.deleted = 0GROUP BY mr_jobs.job_num, mr_jobs.tbl_date, mr_jobs.asset_name, mr_jobs.location, mr_jobs.status, mr_jobs.work_desc Does both queries return the same number of records? Post the times for the two solutions. If there are discrepancies, I think this is due to inproper indexing.Peter LarssonHelsingborg, Sweden
It should be irrelevant which one is faster; they are not equivalent and do not return the same results.- Jeff |
 |
|
iancuct
Yak Posting Veteran
73 Posts |
Posted - 2006-08-04 : 14:41:56
|
Yes the statements are not equivalent :) I agree.The correct one would be the one with the:LEFT JOIN mr_po_tasks ON mr_jobs.job_num = mr_po_tasks.job_num AND mr_po_tasks.deleted = 0LEFT JOIN mr_invoices ON mr_jobs.job_num = mr_invoices.job_num AND mr_invoices.deleted = 0LEFT JOIN mr_tasks ON mr_jobs.job_num = mr_tasks.job_num AND mr_tasks.deleted = 0because this will bring back everything from table A and any matching rows from B,C, or D.the where clause would not bring back correct results because of the same reason jsmith8858 explainedP.S. I have changed the deleted field form a small int to a bit and now the query returns in 10 to 12 seconds.Reply to PESO : "4-8 second is normal in my environment" me : my environment might be a little slower then yours (slower computer/harddrives ) :) Thanks everyone again |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-08-04 : 15:20:13
|
Jeff, I can see that this is true for multiple outer joins.But is this also true when having only one outer join? Is there then a difference between the two methods?Or when there are multiple outer joins, but all outer joins have the same predictability, ie if all outer joins contains equal number of matching rows? For example main HouseTable having three outer joined tables named Kitchen, Bedroom and Toilet? For every House there is at least one matching in every outer joined table.Peter LarssonHelsingborg, Sweden |
 |
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2006-08-04 : 16:33:21
|
Even for a single OUTER JOIN, they are not equivalent.Suppose I want to return *all* players and the number of home runs they hit in 2005, if any.select P.Player, S.HomeRunsfrom Players Pleft outer join Stats S on P.Player = S.Playerwhere s.Year = 2005 or s.Year is nullWhat's wrong with that?What happens if the player has stats for 2004 only?The left outer join results in 1 row from the stats from 2004 being returned ... Now, look at there WHERE clause: S.Year is not null and s.Year doesn't equal 2005, so the row is filtered out and the player is not displayed.If you really want to return *all* players and their stats, if any, from 2005, then you need to write:select P.Player, S.HomeRunsfrom Players Pleft outer join Stats S on P.Player = S.Player and S.Year = 2005where s.Year = 2005 or s.Year is nullDoes this make sense? if not, let me know, I can write up some quick DDL/DML with an example.- Jeff |
 |
|
Next Page
|
|
|
|
|