Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
Carleon123
Starting Member
9 Posts |
Posted - 2004-07-13 : 15:55:12
|
| Hello everyone, I was hoping someone would be able to help me come up with a solution for the task at hand. I'm sorry if a similar topic exists elsewhere, I was not able to find one. The situation is as follows:I need to create reports on our company's sales data based on selected criteria. The problem I have is that there are two main tables being used (this years sales data and last years) that are about 10,000,000 records each. On top of that, I need to join them with about 4-5 lookup tables before querying them. At the moment, I am just using an inline sql query to do this. I am aware that this probably not even close to the best option, but I do not know what is. The current query may take a number of minutes (2-3) before returning the data (which, btw, is not a big set since all the fields are Sum values except the GROUP BY field).Could anyone suggest ways of seriously improving these query speeds? (Sorry if this is a very simple question for most of you).Any help is greatly appreciated!Mike |
|
|
kselvia
Aged Yak Warrior
526 Posts |
Posted - 2004-07-13 : 16:00:53
|
| Post the query you are using and we will see what we can do.--KenYour Kung-Fu is not strong. -- 'The Core' |
 |
|
|
Carleon123
Starting Member
9 Posts |
Posted - 2004-07-13 : 16:23:02
|
Well, here is a slightly simplified version of the query (it does not include last year's data). This one, by itself is running very slow, I can only imagine how much slower it would be when UNIONized with last years.SELECT dbo.lu_branchesTbl.sales_centre, SUM(dbo.Mike_SalesCYCN.volume) AS total_vol, SUM(dbo.Mike_SalesCYCN.cogs) AS total_cogs, SUM(dbo.Mike_SalesCYCN.dnnsi) AS total_nsi, SUM(dbo.Mike_SalesCYCN.dngp) AS total_gp, SUM(dbo.Mike_SalesCYCN.cma) AS total_cmaFROM dbo.lu_channelsTbl INNER JOIN dbo.Mike_SalesCYCN INNER JOIN dbo.lu_carbonatedTbl ON dbo.Mike_SalesCYCN.carbonated = dbo.lu_carbonatedTbl.carbonated ON dbo.lu_channelsTbl.channel_key = dbo.Mike_SalesCYCN.channel_key INNER JOIN dbo.lu_5brandsTbl ON dbo.Mike_SalesCYCN.brand_key = dbo.lu_5brandsTbl.brnd_key INNER JOIN dbo.lu_bustypeTbl ON dbo.Mike_SalesCYCN.bustyp = dbo.lu_bustypeTbl.business_type INNER JOIN dbo.lu_branchesTbl ON dbo.Mike_SalesCYCN.branch_key = dbo.lu_branchesTbl.branch_key INNER JOIN dbo.lu_packagesTbl ON dbo.Mike_SalesCYCN.pkg_key = dbo.lu_packagesTbl.package_key INNER JOIN dbo.lu_smf2003xTbl ON dbo.Mike_SalesCYCN.carbonated = dbo.lu_smf2003xTbl.carbonated AND dbo.Mike_SalesCYCN.pkg_key = dbo.lu_smf2003xTbl.[Package Code] AND dbo.Mike_SalesCYCN.brand_key = dbo.lu_smf2003xTbl.[Brand Code]GROUP BY dbo.lu_branchesTbl.sales_centre Did I say 4-5 JOINS? There are 7 here - and thats after de-normalizing 3 of the look_up tables. One of the bigger obstacles, I think, is the lu_smf2003xtbl table because the values of the corresponding field of that table has to be multiplied to the main sales data to get the proper results. I.e. for a region of, say, Ontario, the sales data will have to be multiplied by a different value than for Quebec.I hope I'm explaining things well enough, please let me know if I am leaving anything out! Thanks again. |
 |
|
|
eyechart
Master Smack Fu Yak Hacker
3575 Posts |
Posted - 2004-07-13 : 16:46:30
|
| do you have single column indexes on all the keys you join on? |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2004-07-13 : 16:52:20
|
| What does the execution plan show you?And which are the 10 million row tables?Brett8-) |
 |
|
|
Carleon123
Starting Member
9 Posts |
Posted - 2004-07-13 : 17:21:51
|
quote: Originally posted by eyechart do you have single column indexes on all the keys you join on?
No, I don't believe so. I have just started working on this project (inherited it, if you will) and there are many things I still need to do - such as indexing.quote: And which are the 10 million row tables?
The Mikes_SalesCYCN is the main data table (10 million rows). The other is Mikes_SalesPYCN (which is the same structure with Past Years data) - and is not included in the query above. |
 |
|
|
gwhiz
Yak Posting Veteran
78 Posts |
Posted - 2004-07-13 : 17:47:02
|
| I would start with indexing especially with the 10 million row table make sure that every column used in the joins has an index built. |
 |
|
|
Carleon123
Starting Member
9 Posts |
Posted - 2004-07-15 : 13:02:18
|
Alright, all the fields used have been indexed. I also simplified the query a bit to the following:SELECT dbo.lu_branches.branch_name, SUM(dbo.SalesCYCN.cases) AS total_vol, SUM(dbo.SalesCYCN.cost) AS total_cogs, SUM(dbo.SalesCYCN.DeadNet_NSI) AS total_dnnsi, SUM(dbo.SalesCYCN.DeadNet_GP) AS total_gp, SUM(dbo.SalesCYCN.CMA_on_inv) AS total_cmaFROM dbo.lu_channels INNER JOIN dbo.SalesCYCN ON dbo.lu_channels.channel_key = dbo.SalesCYCN.channel_key INNER JOIN dbo.lu_brands ON dbo.SalesCYCN.brand_key = dbo.lu_brands.brand_key INNER JOIN dbo.lu_bustype ON dbo.SalesCYCN.bustyp = dbo.lu_bustype.bustyp INNER JOIN dbo.lu_branches ON dbo.SalesCYCN.branch_key = dbo.lu_branches.branch_key INNER JOIN dbo.lu_packages ON dbo.SalesCYCN.pkg_key = dbo.lu_packages.pkg_keyGROUP BY dbo.lu_branches.branch_name Here is the execution plan for the above query: |--Compute Scalar(DEFINE:([Expr1012]=If ([globalagg1018]=0) then NULL else [globalagg1020], [Expr1013]=If ([globalagg1022]=0) then NULL else [globalagg1024], [Expr1014]=If ([globalagg1026]=0) then NULL else [globalagg1028], [Expr1015]=If ([globalagg1030] |--Stream Aggregate(GROUP BY:([lu_branches].[branch_name]) DEFINE:([globalagg1018]=SUM([partialagg1017]), [globalagg1020]=SUM([partialagg1019]), [globalagg1022]=SUM([partialagg1021]), [globalagg1024]=SUM([partialagg1023]), [globalagg1026]=SUM([parti |--Sort(ORDER BY:([lu_branches].[branch_name] ASC)) |--Nested Loops(Inner Join, OUTER REFERENCES:([SalesCYCN].[bustyp])) |--Hash Match(Aggregate, HASH:([SalesCYCN].[bustyp], [lu_branches].[branch_name]), RESIDUAL:([SalesCYCN].[bustyp]=[SalesCYCN].[bustyp] AND [lu_branches].[branch_name]=[lu_branches].[branch_name]) DEFINE:([partialagg1017]=COUNT_BIG([Sa | |--Hash Match(Inner Join, HASH:([lu_channels].[channel_key])=([SalesCYCN].[channel_key])) | |--Index Scan(OBJECT:([Sales].[dbo].[lu_channels].[IX_lu_channels_1])) | |--Hash Match(Inner Join, HASH:([lu_brands].[brand_key])=([SalesCYCN].[brand_key])) | |--Index Scan(OBJECT:([Sales].[dbo].[lu_brands].[IX_lu_brands])) | |--Hash Match(Inner Join, HASH:([lu_packages].[pkg_key])=([SalesCYCN].[pkg_key])) | |--Index Scan(OBJECT:([Sales].[dbo].[lu_packages].[IX_lu_packages_1])) | |--Hash Match(Inner Join, HASH:([lu_branches].[branch_key])=([SalesCYCN].[branch_key])) | |--Index Scan(OBJECT:([Sales].[dbo].[lu_branches].[IX_lu_branches])) | |--Table Scan(OBJECT:([Sales].[dbo].[SalesCYCN])) |--Index Seek(OBJECT:([Sales].[dbo].[lu_bustype].[PK_lu_bustype]), SEEK:([lu_bustype].[bustyp]=[SalesCYCN].[bustyp]) ORDERED FORWARD) If I run the above query for the year to date (all rows), it takes about 2 1/2 minutes to run. This does not even include the last year sales data. If anyone has any ideas, I would be greatly appreciative! Feel free to suggest any ideas (such as Stored Procedurs, etc).Thanks again. |
 |
|
|
|
|
|
|
|