Author |
Topic |
jess
Starting Member
17 Posts |
Posted - 2006-12-14 : 10:53:23
|
Well the question is very long but the answer will probably be very quick...I work with SQL Server 2000 and have taught myself what I need to know to get my work done here, so I don't always know the details such as which methods are more efficient than others.For example, I have a stored procedure that generates a report. The user inputs several parameters and the output is basically a big list of clients that meet the requirements, and lots of other details about those clients.I want to add another parameter to give the user the option of having the report return the results for:1. Only kids (under 18 years old) 2. Only Adults (18 or over)3. All clients (regardless of age) (This is what the report currently does)The user will therefore enter "child", "adult", or "both" into this parameter.My question is where is the most efficient place to put this in the code?Some very simplified examples:Say the code is 100 lines long total before this update.Do I copy the code 3 times and put a parameter in the where clause of each to find the right group, and then place IF statements around the 3 sections? The total code would then be more like 300 lines and would be something like this:IF [parameter] = 'child' SELECT .............. FROM ................ WHERE ...............AND <client qualifies as being a child>ELSE IF [parameter] = 'adult' SELECT .............. FROM ................ WHERE ...............AND <client qualifies as being an adult>ELSE IF [parameter] = 'both' SELECT .............. FROM ................ WHERE ...............To check if the client qualifies as being a child or an adult, I use the datediff function to compare their date of birth (which is in one of our tables) to getdate().Another question - if I were to use the idea above where I repeat the code section 3 times, is it better (efficiency-wise) to put the code that checks if the client qualifies as a child or an adult in the WHERE statement like the example I gave, or is it better to tack it on to one of the JOIN statements where I'm joining the table that holds their birthday?Or instead of the example above, can I use a CASE statement in the WHERE clause instead and just keep 1 section of code so it would be closer to the original 100 lines of code and would be something like: SELECT .............. FROM ................ WHERE ............... AND CASE WHEN <parameter = child> THEN <client qualifies as being a child> WHEN <parameter = adult> THEN <client qualifies as being an adult> WHEN <parameter = both> THEN <no extra check needed> ENDAnother option would be to keep the code the same as it used to be except have it return 1 extra value - the client's age. Then have the program on the front-end (Crystal Reports) only display results based on the age that was returned.Are there other options on how to place this extra check into the report?If you say the front-end solution is the most efficient, then can you please also tell me what your answer would be if that wasn't an option? It would help me tremendously to know which back-end methods are more efficient than other ones for the countless other reports and queries I work on.Thanks!Jess |
|
JoeNak
Constraint Violating Yak Guru
292 Posts |
Posted - 2006-12-14 : 11:26:17
|
I'd avoid the 3 select statement method for maintainability reasons. I'd also avoid letting the front end filter the data as that is usually less efficient (report generation time).I'd go with a where clause. Something like...WHERE @Param = 'Both'OR (@Param = 'Adult' And <definition of adult>)OR (@Param = 'Child' And <definition of child>) |
 |
|
jess
Starting Member
17 Posts |
Posted - 2006-12-14 : 12:43:54
|
AAaaahhhh yes, exactly what I was looking for!Thanks a million JoeNak.Does anyone have any other general efficiency pointers for me?Things like case statements vs if statements...Or does it matter if you place conditional statement in the JOIN clause vs the WHERE clause?Say you're joining tables A and B on a field [id] and narrowing your results down by 3 other statements regarding data in table B. Is there a big efficiency difference between doing it like this:SELECT......FROM [table a] aJOIN [table b] bON a.id = b.idWHERE b.<condition1>AND b.<condition2>AND b.<condition3>compared to this:SELECT......FROM [table a] aJOIN [table b] bON a.id = b.idAND b.<condition1>AND b.<condition2>AND b.<condition3>just curious.Thanks!Jess |
 |
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2006-12-14 : 16:27:05
|
quote: SELECT......FROM [table a] aJOIN [table b] bON a.id = b.idWHERE b.<condition1>AND b.<condition2>AND b.<condition3>compared to this:SELECT......FROM [table a] aJOIN [table b] bON a.id = b.idAND b.<condition1>AND b.<condition2>AND b.<condition3>
No, those two are equivalent. when it doubt, write them both out, one after another in the same SQL batch, in Query Analyzer, and then look at the execution plan and see how they compare.For your other question, if Age is an indexed column, then it will be most efficient to do it like this:create procedure getdata @type varchar(100)asdeclare @minAge int;declare @maxAge int;set @minAge = 0;set @maxAge = 999;if @type = 'child' set @maxAge = 17if @type = 'adult' set @minAge = 18select * from yourtable where Age between @minAge and @maxAge That avoids any need for OR's in your WHERE clause, which do tend to decrease the efficiency of your SQL Statement, and it also keeps it very simple and only requires 1 SELECT. Sometimes you need to step back and approach the problem from a different direction.Hope this helps ...- Jeff |
 |
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2006-12-14 : 16:34:25
|
Backend is in general more efficient than frontend, and the database is usually way faster than any application layer (except when you are dealing with presentation issues like date-format and such). I'd probably do something like this (pseudo-code):IF Child Startdate = 1900 Enddate = 1988ELSE IF Adult Startdate = 1988 Enddate = 2100ELSE IF All Startdate = 1900 Enddate = 2100SELECT...WHERE BirthDate BETWEEN StartDate AND EndDate Should give excellent query caching and with an index on birthdate it would probably be quite fast as well.--Lumbago"Real programmers don't document, if it was hard to write it should be hard to understand" |
 |
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2006-12-14 : 16:35:54
|
Oh...jeff beat me to it! I'm just a slow typer --Lumbago"Real programmers don't document, if it was hard to write it should be hard to understand" |
 |
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2006-12-14 : 16:39:41
|
quote: Originally posted by Lumbago Oh...jeff beat me to it! I'm just a slow typer --Lumbago"Real programmers don't document, if it was hard to write it should be hard to understand"
You've got to learn to be a more efficient typist !!! - Jeff |
 |
|
jess
Starting Member
17 Posts |
Posted - 2006-12-15 : 12:47:33
|
Thank you for the additional suggestions. I certainly do like to see solutions coming from different angles.Jeff -- I tried your suggestion with the minAge and maxAge and the results seemed to be the same as when using JoeNak's. (both ran the query in 1 second...)So I tried to expand the query to be a little more complicated and in this case Joe's suggestion consistently runs at 5 seconds and the minAge & maxAge solution runs at 6 seconds (according to the clock in Query Analyzer)Here's what I got when running a trace in SQL Profiler while executing each of the 2 methods twice:Joe-1 cpu 5641reads 189089duration 4733Jeff-1cpu 5219reads 190179duration 6500Joe-2cpu 5734reads 189078duration 4796Jeff-2cpu 5204reads 190148duration 6436Just thought it was interesting.Thanks again for all of the great responses!Jess |
 |
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2006-12-15 : 13:44:32
|
Is Age an indexed column?- Jeff |
 |
|
jess
Starting Member
17 Posts |
Posted - 2006-12-15 : 15:03:20
|
Actually age is calculated in the stored procedure usingdatediff(yy, p.dob, getdate())I believe dob (date of birth) that is used in the calculation is an indexed field in the table. |
 |
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2006-12-15 : 15:39:33
|
Again, same concept: you will be best off calculating the range of date values and then filtering on that. Filtering on a calculated value cannot make use of any indexes, since the calculation must be performed on the entire table to determine which results are accepted by the WHERE clause.So, I recommend to try calculating the minDOB and maxDOB and filter on that DOB column directly, then see how efficiently that works.- Jeff |
 |
|
jess
Starting Member
17 Posts |
Posted - 2006-12-15 : 17:13:17
|
OK, so I tried what you said and calculated the minDOB and maxDOB and placed that in the code. Here's the results I get from the trace from running both solutions twice each:Joe-1cpu: 22236Reads: 186081Duration: 16263Jeff-1cpu: 12125Reads: 94999Duration: 19716Joe-2cpu: 22064Reads: 186015Duration: 15593Jeff-2cpu: 12141Reads: 94999Duration: 19390What I don't understand is why even tough it has less CPU usage and less READS, the solution from Jeff takes longer? (around 19 seconds instead of 16)Any ideas? |
 |
|
|