albertkohl
Aged Yak Warrior
740 Posts
Posted - 2009-09-13 : 20:45:36

Hey guys, I'm trying to scale my DB server outward a little, but I'm still not getting the performance increase I was hoping for.

Basically I have one giant table (approx 200 columns wide, about 220 million rows; the database is only updated quarterly) that I have split into 10 smaller tables linked off a UID column. I'm writing a web interface where users can go to a site and punch in criteria; the server then generates a dynamic query and submits it to the SQL DB for results. The problem is that on large universes the server is taking too long. For example, here is a query that covers about 67 million rows. The server takes 6 minutes to finish it, and I need it in 30 seconds or less... anything I can do to get the counting to run faster?

SELECT state, COUNT(*)
FROM (
    SELECT m.state,
           ROW_NUMBER() OVER (PARTITION BY phone ORDER BY m.uid DESC) AS recid
    FROM TEST.DBO.MAIN m
    JOIN TEST.dbo.CONTACT c ON m.uid = c.uid
    WHERE 1 = 1
      AND c.Name_Type = 'P'
) AS f
WHERE recid = 1
GROUP BY state
ORDER BY state

Also worth mentioning: our database has more than one record per household, hence the partitioning by phone (thanks to PESO for that).

Thanks in advance for all your help, you guys always seem to help me out of a bind!
tkizer
Almighty SQL Goddess
38200 Posts
albertkohl
Aged Yak Warrior
740 Posts
Posted - 2009-09-13 : 23:48:51

I have this thing indexed every which way. It's on a 3TB array, so I'm not really worried about wasting space on indexes, and because there are so many different combinations our reps can query the database with, I basically capture everything that runs through the site and run it through DTA on a weekly basis. (Honestly, I'm not that good at writing my own indexes yet; I don't quite understand all the fine-tuning ins and outs of setting up indexing.)

And no, it outputs 52 records, with a count for each state, totaling up to 67 million. The SORT cost on this query, for example, is in the area of about 64%. I forget how to export the execution plan to text or I would post that for you as well :(
albertkohl
Aged Yak Warrior
740 Posts
Posted - 2009-09-13 : 23:51:44

BTW, here's the index for this particular query:

USE [NatimarkOnline]
GO

/****** Object: Index [_dta_index_Consumer_8_1819257636_52311312_K21_K62_K5_K20_K19_K24_K6_K2_163]
        Script Date: 09/13/2009 20:54:13 ******/
CREATE NONCLUSTERED INDEX [_dta_index_Consumer_8_1819257636_52311312_K21_K62_K5_K20_K19_K24_K6_K2_163]
ON [dbo].[consumer]
(
    [NAME_TYPE] ASC,
    [HOME_BUSINESS] ASC,
    [STATE] ASC,
    [DO_NOT_CALL] ASC,
    [PHONE] ASC,
    [LAST_NAME] ASC,
    [ZIP] ASC,
    [STREET] ASC
)
INCLUDE ([uid])
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
      IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
      ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
tkizer
Almighty SQL Goddess
38200 Posts
albertkohl
Aged Yak Warrior
740 Posts
Posted - 2009-09-14 : 01:15:02

How do I get the execution plan in a format I can post here? Isn't there a way to output it to text?

As for the index, these are really the same tables: the consumer table is the big massive one, and I have the same issue on either the test.dbo tables or the consumer table. I can post both, though; I just need to know how to export the execution plan to text.
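For reference, SQL Server can return a plan as plain text, which is presumably how the plans posted below were captured. A minimal sketch (the COUNT query is just a stand-in for the query under test):

SET SHOWPLAN_TEXT ON;   -- following statements are compiled but not executed;
GO                      -- the estimated plan comes back as rows of text
SELECT COUNT(*) FROM TEST.dbo.MAIN;
GO
SET SHOWPLAN_TEXT OFF;
GO

SET STATISTICS PROFILE ON does the same for the actual plan, with the query really executing.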
albertkohl
Aged Yak Warrior
740 Posts
Posted - 2009-09-14 : 17:06:19

Hope this helps. Looks like it's running a table scan on MAIN, but the sort is still 42% of the workload; cutting that back will help A LOT. Also, any recommendations on indexing MAIN? Again, I'm totally newbish when it comes to indexing.

|--Compute Scalar(DEFINE:([Expr1007]=CONVERT_IMPLICIT(int,[globalagg1011],0)))
  |--Stream Aggregate(GROUP BY:([m].[STATE]) DEFINE:([globalagg1011]=SUM([partialagg1010])))
    |--Sort(ORDER BY:([m].[STATE] ASC))
      |--Parallelism(Gather Streams)
        |--Hash Match(Partial Aggregate, HASH:([m].[STATE]), RESIDUAL:([Test].[dbo].[MAIN].[STATE] as [m].[STATE] = [Test].[dbo].[MAIN].[STATE] as [m].[STATE]) DEFINE:([partialagg1010]=COUNT(*)))
          |--Filter(WHERE:([Expr1006]=(1)))
            |--Parallelism(Distribute Streams, RoundRobin Partitioning)
              |--Sequence Project(DEFINE:([Expr1006]=row_number))
                |--Segment
                  |--Parallelism(Gather Streams, ORDER BY:([m].[PHONE] ASC, [m].[uid] DESC))
                    |--Sort(ORDER BY:([m].[PHONE] ASC, [m].[uid] DESC))
                      |--Hash Match(Inner Join, HASH:([c].[uid])=([m].[uid]), RESIDUAL:([Test].[dbo].[CONTACT].[uid] as [c].[uid]=[Test].[dbo].[MAIN].[uid] as [m].[uid]))
                        |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([c].[uid]))
                        |  |--Index Seek(OBJECT:([Test].[dbo].[CONTACT].[_dta_index_CONTACT_6_421576540_5370704_K1_K17] AS [c]), SEEK:([c].[NAME_TYPE]='P') ORDERED FORWARD)
                        |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m].[uid]))
                          |--Table Scan(OBJECT:([Test].[dbo].[MAIN] AS [m]))
albertkohl
Aged Yak Warrior
740 Posts
Posted - 2009-09-14 : 19:26:27

Okay, I added an index to fix the last table scan:

  |--Table Scan(OBJECT:([Test].[dbo].[MAIN] AS [m])

The Sort is still costing 53% of the query; that's what I'm trying to see if I can clean up, unless you guys see something else. Now this is the new execution plan:

|--Compute Scalar(DEFINE:([Expr1007]=CONVERT_IMPLICIT(int,[globalagg1011],0)))
  |--Stream Aggregate(GROUP BY:([m].[STATE]) DEFINE:([globalagg1011]=SUM([partialagg1010])))
    |--Sort(ORDER BY:([m].[STATE] ASC))
      |--Parallelism(Gather Streams)
        |--Hash Match(Partial Aggregate, HASH:([m].[STATE]), RESIDUAL:([Test].[dbo].[MAIN].[STATE] as [m].[STATE] = [Test].[dbo].[MAIN].[STATE] as [m].[STATE]) DEFINE:([partialagg1010]=COUNT(*)))
          |--Filter(WHERE:([Expr1006]=(1)))
            |--Parallelism(Distribute Streams, RoundRobin Partitioning)
              |--Sequence Project(DEFINE:([Expr1006]=row_number))
                |--Segment
                  |--Parallelism(Gather Streams, ORDER BY:([m].[PHONE] ASC, [m].[uid] DESC))
                    |--Sort(ORDER BY:([m].[PHONE] ASC, [m].[uid] DESC))
                      |--Hash Match(Inner Join, HASH:([c].[uid])=([m].[uid]), RESIDUAL:([Test].[dbo].[CONTACT].[uid] as [c].[uid]=[Test].[dbo].[MAIN].[uid] as [m].[uid]))
                        |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([c].[uid]))
                        |  |--Index Seek(OBJECT:([Test].[dbo].[CONTACT].[_dta_index_CONTACT_6_421576540_5370704_K1_K17] AS [c]), SEEK:([c].[NAME_TYPE]='P') ORDERED FORWARD)
                        |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([m].[uid]))
                          |--Index Scan(OBJECT:([Test].[dbo].[MAIN].[_Index_main_state_count] AS [m]))

Here are the indexes:

CREATE NONCLUSTERED INDEX [_Index_main_state_count] ON [dbo].[MAIN]
(
    [STATE] ASC,
    [PHONE] ASC,
    [uid] ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
      IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
      ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [_dta_index_CONTACT_6_421576540_5370704_K1_K17] ON [dbo].[CONTACT]
(
    [NAME_TYPE] ASC,
    [uid] ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
      IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
      ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts
Posted - 2009-09-15 : 08:51:30

I think you need an index on MAIN with uid placed first, since it's part of the join:

CREATE NONCLUSTERED INDEX [_Index_main_state_count] ON [dbo].[MAIN]
(
    [uid] ASC,
    [STATE] ASC,
    [PHONE] ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
      IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
      ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

I'd also query why you have "order by m.uid desc": for a summary-by-state report it appears to add little value. Can you experiment and take it out?

Given you have stats turned off, you may need to recompute stats for the tables/indexes after any major data changes. Sorting in tempdb may help. Again: experiment. Change one factor, measure, repeat.
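A minimal sketch of the stats and tempdb suggestions above (the table names are the ones from this thread; FULLSCAN is one sampling choice among several):

-- recompute statistics after each quarterly load
UPDATE STATISTICS dbo.MAIN WITH FULLSCAN;
UPDATE STATISTICS dbo.CONTACT WITH FULLSCAN;

-- rebuild the index with its sort spilled to tempdb instead of the data filegroup
CREATE NONCLUSTERED INDEX [_Index_main_state_count] ON [dbo].[MAIN]
(
    [uid] ASC,
    [STATE] ASC,
    [PHONE] ASC
)
WITH (DROP_EXISTING = ON, SORT_IN_TEMPDB = ON) ON [PRIMARY]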
albertkohl
Aged Yak Warrior
740 Posts
Posted - 2009-09-15 : 13:13:52

I'll check all those and get back to you, thanks A LOT. If you and Tara could subscribe to this thread, I'd REALLY appreciate it; I'll post back the results as soon as I can (probably tonight or tomorrow).

This is a test database, so I can change a lot without having to worry about users getting pissy; it just takes a while with the tables being so large. As for recomputing stats, the database is STATIC; it gets dropped and recreated every 3 months, so I wouldn't need to worry about recomputing, correct?
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts
Posted - 2009-09-16 : 12:04:16

Re stats... I don't know. Are the stats created/good before or after the data is loaded?
albertkohl
Aged Yak Warrior
740 Posts
Posted - 2009-09-16 : 12:35:17

Stats and indexes are dropped, the data is imported, and then they're recreated after we update it every quarter. I'm still shuffling data around (having to make some free space), so I haven't had a chance to test stuff yet. I can tell you, though, regarding the ORDER BY m.uid: the row_number() function requires the ORDER BY clause, and I believe that's why it's there. But I'm still gonna test it a little later, just waiting for the server to finish its workload so I can play :D
albertkohl
Aged Yak Warrior
740 Posts
Posted - 2009-09-19 : 13:55:29

Okay, so I've tried the indexing with the UID first, with sort-in-tempdb on and off, and it seems like the original index is the fastest... still running at like 15 minutes though. So here's my next question. Let's simplify it; this query:

SELECT state, COUNT(*)
FROM (
    SELECT m.state,
           ROW_NUMBER() OVER (PARTITION BY phone ORDER BY m.uid DESC) AS recid
    FROM TEST.DBO.mem_MAIN m
) AS f
WHERE recid = 1
GROUP BY state
ORDER BY state

is still taking 10 minutes. Here's the execution plan:

|--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[globalagg1007],0)))
  |--Stream Aggregate(GROUP BY:([m].[STATE]) DEFINE:([globalagg1007]=SUM([partialagg1006])))
    |--Sort(ORDER BY:([m].[STATE] ASC))
      |--Parallelism(Gather Streams)
        |--Hash Match(Partial Aggregate, HASH:([m].[STATE]), RESIDUAL:([Test].[dbo].[mem_MAIN].[STATE] as [m].[STATE] = [Test].[dbo].[mem_MAIN].[STATE] as [m].[STATE]) DEFINE:([partialagg1006]=COUNT(*)))
          |--Filter(WHERE:([Expr1003]=(1)))
            |--Parallelism(Distribute Streams, RoundRobin Partitioning)
              |--Sequence Project(DEFINE:([Expr1003]=row_number))
                |--Segment
                  |--Parallelism(Gather Streams, ORDER BY:([m].[PHONE] ASC, [m].[uid] DESC))
                    |--Sort(ORDER BY:([m].[PHONE] ASC, [m].[uid] DESC))
                      |--Index Scan(OBJECT:([Test].[dbo].[mem_MAIN].[_Index_mem_main_state_count] AS [m]))

Again, the |--Sort(ORDER BY:([m].[PHONE] ASC, [m].[uid] DESC)) line is taking 72% of the batch. Any ideas on how I can get a query like this back in 30 seconds or less?
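One avenue the thread doesn't spell out: that Sort exists because the window function wants rows ordered by (PHONE ASC, uid DESC). An index keyed to match that order may let the optimizer feed the Segment from an ordered scan and drop the Sort, though a parallel plan can still reintroduce one. A hypothetical sketch (the index name is made up):

CREATE NONCLUSTERED INDEX IX_mem_MAIN_phone_uid ON [dbo].[mem_MAIN]
(
    [PHONE] ASC,
    [uid] DESC      -- matches ROW_NUMBER() OVER (PARTITION BY phone ORDER BY uid DESC)
)
INCLUDE ([STATE])   -- covers the outer COUNT(*) ... GROUP BY state
WITH (SORT_IN_TEMPDB = ON) ON [PRIMARY]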
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts
Posted - 2009-09-22 : 04:24:58

Out of curiosity, have you baselined what a simple SELECT COUNT(*) from the main table performs like? Is your expectation of the DB's ability to process that amount of data too high for the hardware and volume of data involved? Is there any way the table could be redesigned to remove the duplicate information?

Also, do you need 100% accurate results from this query, or is it MIS/approximate? Could you live with the duplicates being counted? Are they a large portion of the database? Could you live with a one-off snapshot of the number/spread of the duplicates, applying those percentages to any "duplicate inclusive" numbers to produce a "very, very close to accurate" non-duplicate report, albeit in a far faster time than with the duplicates resolved?

You may be between a rock and a hard place. Consider some alternatives, however outlandish they may seem.
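A minimal sketch of the baseline and the one-off duplicate snapshot suggested above (the second query is a hypothetical way to measure the duplicate share once per quarterly load):

-- baseline: raw scan/count speed, no dedup logic
SELECT COUNT(*) FROM TEST.dbo.MAIN;

-- one-off snapshot: how big is the duplicate share?
SELECT COUNT(*)              AS total_rows,
       COUNT(DISTINCT phone) AS distinct_phones
FROM TEST.dbo.MAIN;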
Lumbago
Norsk Yak Master
3271 Posts
Posted - 2009-09-23 : 08:51:54

Can you time this query and also verify that it gives the same result? ->

SELECT state, COUNT(*)
FROM (
    SELECT m.state, phone, MAX(m.uid) AS recid
    FROM TEST.DBO.MAIN m
    JOIN TEST.dbo.CONTACT c ON m.uid = c.uid AND c.Name_Type = 'P'
    GROUP BY m.state, phone
) AS f
GROUP BY state
ORDER BY state

- Lumbago
albertkohl
Aged Yak Warrior
740 Posts
Posted - 2009-09-23 : 17:28:22

Andrew: the initial count takes about 1 minute; after that, every SELECT COUNT(*) query takes about 2 seconds. As for the DB server, it's got 32GB of memory, dual quad-core 64-bit Xeons, and about 6 hard drives (1 array for the OS, 1 for logs/SQL install, 1 for the data store), so it's fairly beast, I think. Maybe I'm wrong. As for how accurate it would have to be, I would think no more than 0.5% off would be acceptable, though the bosses may cry a little. I would really just have to see what happens when we start filling our requests and whether customers complain.

Lumbago: your query took about half the time mine did (and that's a 3-query average), HOWEVER it also produced 30,000 more records than mine. The universe is like 225M, so it's about 0.45% off (that's less than half a percent). So what's the logic behind yours vs. mine? And is there any way we can speed it up a little more?
Lumbago
Norsk Yak Master
3271 Posts
Posted - 2009-09-24 : 03:32:10

Ok, the increased performance is mostly due to not using the row_number() function, I would guess, since you actually only need the highest "uid" for each phone number... right? I'd think that just doing MAX() is cheaper than first creating an ordered list and then choosing the first in the order. I also moved the "c.Name_Type = 'P'" from the where clause to the join condition, which is where I suspect the extra 30k rows are coming from; you should try putting it back in the where clause and see if you get the right results.

I'm a bit rusty on the indexing part, but you should definitely have an index on both MAIN and CONTACT with the uid as the first column. Can you also post the execution plan for my query with the indexes I proposed?

- Lumbago
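Worth noting as a possible source of the 30k discrepancy: for an inner join, moving a predicate between ON and WHERE does not change the result, but the two queries differ in grouping. The MAX() version groups by (state, phone), so a phone that appears under more than one state is counted once per state, while the ROW_NUMBER() version keeps exactly one row per phone overall. A sketch that reproduces the original semantics with MAX(), assuming uid is unique in MAIN:

SELECT m.state, COUNT(*) AS cnt
FROM (
    -- one winning uid per phone, matching ROW_NUMBER() ... PARTITION BY phone ORDER BY uid DESC
    SELECT MAX(m2.uid) AS maxuid
    FROM TEST.dbo.MAIN m2
    JOIN TEST.dbo.CONTACT c ON m2.uid = c.uid
    WHERE c.Name_Type = 'P'
    GROUP BY m2.phone
) AS d
JOIN TEST.dbo.MAIN m ON m.uid = d.maxuid   -- pick up the state of the winning row
GROUP BY m.state
ORDER BY m.state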
Lumbago
Norsk Yak Master
3271 Posts
Posted - 2009-09-24 : 03:34:14

Peso should really be a part of this discussion... Peso, where are you man??

- Lumbago
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts
Posted - 2009-09-24 : 10:39:59

"the initial count takes about 1 minute; after that, every SELECT COUNT(*) query takes about 2 seconds."

This would seem to put a floor of 1 minute on how good your performance can get, and thus maybe implies that your initial requirement of 30 seconds is unrealistic. The 2-second figure seems to indicate the benefit of CACHE. For true repeatable timings, use the following between repetitions of the code, unless you plan to run an automated job at the start of each day to get the data into cache:

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

One other option to look at may be "partitioning", where you can split the table/database over multiple spindles.

And if you can answer Lumbago's last request, it may help us more.
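A sketch of a cold-cache timing harness built from those commands (test servers only: both DBCC calls affect the whole instance, and the CHECKPOINT first is needed so dirty pages are flushed and can be evicted):

CHECKPOINT;                 -- write dirty pages so DROPCLEANBUFFERS can evict everything
DBCC DROPCLEANBUFFERS;      -- empty the buffer pool (cold data cache)
DBCC FREEPROCCACHE;         -- throw away compiled plans
SET STATISTICS TIME ON;     -- report parse/compile and execution times
SET STATISTICS IO ON;       -- report logical/physical reads per table

-- ... run the query under test here ...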
albertkohl
Aged Yak Warrior
740 Posts
Posted - 2009-09-24 : 14:28:19

Okay, so a couple of things. 1) I'm setting up the UID column to be the PK. While doing this, I have to re-index everything anyway because I need to drop and re-create the tables, so I'm setting the indexes all up to start with UID. I'll probably have that all done in a day or two; as soon as I do I'll post some more stuff.

Lumb: I'm still off quantity-wise with your query vs. mine, even when I moved the name_type back to the where clause. I'm kinda curious why, though.

Andrew: the two DBCC commands, is that something that's gonna take a while to run every day? I would have no problem (I don't think) running that every day at like 12:00am; are there any drawbacks to doing so?

As for partitioning, I kinda tried that with distributed/federated views; it worked like crap. How about maybe doing it locally only? Like having, for example, mem_contact1 with a name_type = 'P' constraint, and then mem_contact2 with a name_type <> 'P' constraint (see the sketch after this post). The massive majority of our queries have the name_type = 'P' select, so in that case I can ignore the other crapola and weed down the universe of results, because the table will ONLY have P's in it. Is that right?

What else? Come on, hit me! I'm fired up now :D
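A hypothetical sketch of that local split (table and constraint names follow the post; the column list is abbreviated). If the two tables are combined under a UNION ALL view, trusted CHECK constraints like these let the optimizer eliminate the table whose constraint rules the filter out, so NAME_TYPE = 'P' queries only touch the first table:

CREATE TABLE dbo.mem_contact1
(
    uid        INT     NOT NULL,
    NAME_TYPE  CHAR(1) NOT NULL,
    -- ... remaining CONTACT columns ...
    CONSTRAINT ck_mem_contact1_p CHECK (NAME_TYPE = 'P')
);

CREATE TABLE dbo.mem_contact2
(
    uid        INT     NOT NULL,
    NAME_TYPE  CHAR(1) NOT NULL,
    -- ... remaining CONTACT columns ...
    CONSTRAINT ck_mem_contact2_notp CHECK (NAME_TYPE <> 'P')
);
GO

CREATE VIEW dbo.mem_contact_all AS
SELECT uid, NAME_TYPE FROM dbo.mem_contact1
UNION ALL
SELECT uid, NAME_TYPE FROM dbo.mem_contact2;
GO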
tkizer
Almighty SQL Goddess
38200 Posts