Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Newbie help w/sub-querying...

Author  Topic 

youngchea
Starting Member

4 Posts

Posted - 2014-10-14 : 20:48:28
Hello,

This is my first post on the board. I would say I am b/w beginner/intermediate level when it comes to SQL. I use Hive/Hadoop and know that I should start to integrate sub-querying into my queries to make them run more efficiently and faster.

Below is a query that I use often as a template, often just modifying the day, order and truenumber. Would be great if you could assist me in explaining how to set up a sub-query for this.

select "betatest", order, age, gender, geo, count (1), count (distinct rdz.id)
from analytics1 rdz
join analytics_2 alvds on (rdz.id = alvds.id)
left outer join analytics3 spd on (alvds.zipcode = spd.zip)
where day >= '2014-08-01'
and day <= '2014-09-30'
and order = 317228536
and truenumber = 540192016
group by "betatest",order,age, gender,geo;

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-10-15 : 08:38:53
What do you want the subquery to do?
Go to Top of Page

youngchea
Starting Member

4 Posts

Posted - 2014-10-15 : 10:03:31
I want it to do the same thing, but written in a way so that it will run faster
Go to Top of Page

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-10-15 : 10:16:46
Using a subquery won't make it run faster. If you want better performance, look at the query plan. See if there are any missing indexes and add any that are missing.
Go to Top of Page

youngchea
Starting Member

4 Posts

Posted - 2014-10-15 : 11:52:27
quote:
Originally posted by gbritton

Using a subquery won't make it run faster. If you want better performance, look at the query plan. See if there are any missing indexes and add any that are missing.



Hmmm the engineers at my company advise me to use sub-queries so the queries run faster and more efficiently.

I have some queries where I added sub-queries and they do run MUCH faster.
Go to Top of Page

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-10-15 : 12:55:24
Well your engineers sound inexperienced! I'd love to see some before and after copies of queries where replacing some parts with subqueries helped. I'm willing to bet that it'll be an apples to oranges comparison.

In the query you posted, if the join columns and the columns in the WHERE clause are covered by indexes, you will already have an efficient execution plan. If they're not covered by an index, make it so!

BTW, have you studied the query plan yet?
Go to Top of Page

youngchea
Starting Member

4 Posts

Posted - 2014-10-15 : 14:17:50
quote:
Originally posted by gbritton

Well your engineers sound inexperienced! I'd love to see some before and after copies of queries where replacing some parts with subqueries helped. I'm willing to bet that it'll be an apples to oranges comparison.

In the query you posted, if the join columns and the columns in the WHERE clause are covered by indexes, you will already have an efficient execution plan. If they're not covered by an index, make it so!

BTW, have you studied the query plan yet?



havent studied query plan, can you point me in the right direction?

in terms of the queries, see below

w/o sub-quey
select "TUV", count(distinct listener)
from alvds
where day like '2014-07%'
and (country_code <> 'XX' and country_code <> 'YY'))a
group by "TUV"
;


with sub-query
select "TUV", count(*)
from
(select distinct listener_id
from alvds
where day like '2014-07%'
and (country_code <> 'XX' and country_code <> 'YY'))a
group by "TUV"
;
Go to Top of Page

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-10-15 : 14:32:34
Looking at your samples, I suspect that the second query (with subquery) might perform worse, not better. That's because the subquery reads through the data and produces a subset for the first query to consume. Hence two potential passes through the data. The first query does it all in one go. Note that, in both cases, the DISTINCT keyword may cause SQL to add a SORT operation to the plan, unless column listener is unique (but then you wouldn't need DISTINCT)

However, SQL will make up its own mind about how to process the query using indexes and statistics, which is why you need to look at the plan.

To see the plan , hit Ctrl-L in SSMS. They look complicated at first, but are not bad once you get used to them.

BTW, the best advice I ever received about writing queries is:

1. Write a query as simply as possible.
2. Don't waste time on optimization unless performance problems arise. (Don't try to chop off 10 seconds if the query runs once a week!)
3. If you must optimize, benchmark performance before and after to see what you gained with your effort.
Go to Top of Page
   

- Advertisement -