Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Extremely Slow TSQL

Author  Topic 

jonathans
Starting Member

40 Posts

Posted - 2008-04-23 : 04:05:06
Hey Guys

I have a huge speed issue on one or two of my SQL Tables. I have included the basic design below.

Structure
Id
ParentId
Name

Group
Id
ParentId
Name
Weight

Products
Id
Name

StructureProducts
StructureId
ProductId
Imported

StructureGroups
StructureId
GroupId

GroupProducts
GroupId
ProductId

AnswerDates
Id
AssessmentDate

Scores <-- This table is the slow table
AnswerDateId
StructureId
GroupId (nullable)
ProductId (nullable)
Score >= 0 && <= 100


Ok, Structures are the start of everything. Structures, have children. If a group/product is Linked to a parent or child structure then that group/product is visible along the structure tree flow path. Groups, like structure have children. And also like structures, if a group is given a product, then that product is visible through the structure tree flow path.

Example:
Earth [Structure]
- Asia [Structure]
--- China [Structure]
--- Japan [Structure]
----- Computer Stuff [Group]
------- Desktops [Group]
------- Servers [Group]
------- Laptops [Group]
--------- HP [Product]
--------- Dell [Product]
--------- Fujitsu [Product]
- Europe [Structure]
--- Germany [Structure]
----- Berlin [Structure]
--- Italy [Structure]
----- Rome [Structure]
----- Venice [Structure]
- America [Structure]
--- United States of America [Structure]
----- New York [Structure]
------- Computer Stuff [Group]
--------- Desktops [Group]
--------- Servers [Group]
--------- Laptops [Group]
----------- HP [Product]
----------- Dell [Product]
------- Home Stuff [Group]
--------- Kitchen Stuff [Group]
--------- bedroom Stuff [Group]
----- Washington [Structure]
------- Computer Stuff [Group]
--------- Desktops [Group]
--------- Servers [Group]
--------- Laptops [Group]
----------- HP [Product]
----------- Dell [Product]
----------- Acer [Product]
------- Home Stuff [Group]
--------- Kitchen Stuff [Group]
--------- bedroom Stuff [Group]
----- Chicago [Structure]
------- Computer Stuff [Group]
--------- Desktops [Group]
--------- Servers [Group]
--------- Laptops [Group]
------- Home Stuff [Group]
--------- Kitchen Stuff [Group]
--------- bedroom Stuff [Group]
- Africa [Structure]
--- South Africa [Structure]
----- Johannesburg [Structure]
------- Computer Stuff [Group]
--------- Desktops [Group]
--------- Servers [Group]
--------- Laptops [Group]
----------- Acer [Product]
------- Home Stuff [Group]
--------- Kitchen Stuff [Group]
--------- bedroom Stuff [Group]
----- Durban [Structure]
----- Capte Town [Structure]
- Australasia [Structure]

So the initial steps that happen (with regards to Scoring) are as follows.
1. Insert root score (which would be for a structure, a group, an answer date and either a product or no product
2. Score the next group up along the treeview, using the scores for the groups at the same level as the original group (0 score if no score exists).
3. Continue this till GroupTree is at root (parentid == null)
4. Using the next structure up along the treeview, repeat steps 2 & 3.
5. Continue steps 4 until Structuree is at root (parentid == null)

Example
Scoring a product for Johannesburg Acer Laptop would go as follows
1. Initial score for [Acer] product against Group [Laptop] for Johannesburg.
2. Calculate Score for all products (productid = null) against Laptop for Johannesburg
3. Calculate Score for [Acer] product against Group [Computer Stuff] for Johannesburg
4. Calculate Score for all products against Group [computer Stuff] for Johannesburg
5. Calculate score for [Acer] product against all root groups for Johannesburg
5.1. Group [Comptuer Stuff] and [Home Stuff]
6. Calculate score for all products against all root groups for Johannesburg
6.1. Group [Comptuer Stuff] and [Home Stuff]
7. Calculate score for [Acer] Product against Group Laptop for South Africa
8. Calculate Score for all products (productid = null) against Laptop for South Africa
9. Calculate Score for [Acer] product against Group [Computer Stuff] for South Africa
10. Calculate Score for all products against Group [computer Stuff] for South Africa
11. Calculate score for [Acer] product against all root groups for South Africa
11.1. Group [Comptuer Stuff] and [Home Stuff]
12. Calculate score for all products against all root groups for South Africa
12.1. Group [Comptuer Stuff] and [Home Stuff]
13. Calculate score for [Acer] Product against Group Laptop for Africa
14. Calculate Score for all products (productid = null) against Laptop for Africa
15. Calculate Score for [Acer] product against Group [Computer Stuff] for Africa
16. Calculate Score for all products against Group [computer Stuff] for Africa
17. Calculate score for [Acer] product against all root groups for Africa
17.1. Group [Comptuer Stuff] and [Home Stuff]
18. Calculate score for all products against all root groups for Africa
18.1. Group [Comptuer Stuff] and [Home Stuff]
etc. etc. etc...

This basicly coveres the concept behind the basic scoring methodology. Now the methodology splits into 2. The first Methodology 1, say it should do these calculations using the Exact same date as the original scored date. (Ie. if i do a score today, only scores on today will be used in the calculations). The other, Methodology 2, says that it should do the calculations on the latest available date. (Ie. If i do a score today, only scores from today and the latest before today will be used in the calculations).

Now to add another problem to this already complex process, is that each Group and each product within a structure can have either of the 2 scoring methodologies assigned to it. Also, products can only be scored against the structures and groups that they are assigned to. Ie, Acer exists in Laptop Group, in Johannesburg or South Africa or Africa, but doesnt exist in New York.

Ok, so now that i've explained briefly how this scoring works, let me get to the heart of the problem. Basicly its speed (can clearly see why), though the speed issue only comes
up in 1 Place. And that is where it has to look backwards for the latest available score for the required group, structure and product.

For this to happen i wrote a function
ALTER FUNCTION [dbo].[GetLatestAnswerDateId]
( @StructureId INT,
@GroupId INT,
@ProductId INT,
@AnswerDateId INT )
RETURNS INT
AS
BEGIN
DECLARE @Id INT
DECLARE @Date DATETIME

SELECT TOP 1 @Date = [Date]
FROM [dbo].[AnswerDate]
WHERE [Id] = ISNULL(@AnswerDateId, [Id])
ORDER BY [Date] DESC

SELECT TOP 1 @Id = ad.id--gs.[AnswerDateId]
FROM [dbo].[Scoring] gs
INNER JOIN [dbo].[AnswerDate] ad ON ad.Id = gs.AnswerDateId
WHERE [StructureId] = @StructureId
AND ISNULL([GroupId], -1) = ISNULL(@GroupId, -1)
AND ISNULL([ProductId], -1) = ISNULL(@ProductId, -1)
AND [Date] <= @Date
ORDER BY [Date] DESC

RETURN @Id
END

Now on small amounts of data (1000 rows or so) its quick, though that is due to the fact that the data is minimal, but on large amounts of data this function runs for along time. Specificly in the context of the following when there is 6 months of scoring data (100 000+ rows) to peruse.

SELECT [StructureId], [GroupId], [AnswerDateId], [ProductId], [Score]
FROM [Scoring]
WHERE AnswerDateId = GetLatestAnswerDateId([Structure], [GroupId], [ProductId], null)
AND [StructureId] = South Africa
AND [GroupId] = Computer Stuff
AND [ProductId] = Acer



Any idea's on how to make this quick? or quicker?

My Current runtime for calculating the 2500 base scores (totals 100 000+- rows) takes 15 hours. Though this is an initial calculation and is only supposed to be done once.
Also, this calculations are all correct, so my only issue itself is the speed of the entire process.

Thanks In Adance

Jonathan

WARNING: Running on cold coffee!

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-23 : 04:06:33
Use a recursive CTE.



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

jonathans
Starting Member

40 Posts

Posted - 2008-04-23 : 04:09:58
As said, the speed isnt around the looping so much as a single set of query (at the bottom of this post). As for the execution, i am using a SQL CLR to do the the scoring, though, the CLR kicks off a stored procedure for each methodology.

Ie. CLR Procedure loops up through the trees, but at each level it kicks off another stored procedure based on the required scoring methodology (the basic query is the one at the bottom of post).

WARNING: Running on cold coffee!
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-23 : 04:14:09
Maybe if you follow these advices in this blog post
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

you will get quicker answer?



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-23 : 04:20:14
Also, instead of SELECT TOP 1 ... ORDER BY (twice) in the function, I think you can replace with the new ROW_NUMBER() built-in function.



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

jonathans
Starting Member

40 Posts

Posted - 2008-04-23 : 05:19:40
Could you explain to me how to use the ROW_NUMBER function to select the top 1 available date? Sorry trying to read up on it, but not getting my head around it.

Thanks

WARNING: Running on cold coffee!
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-23 : 05:34:25
ROW_NUMBER() OVER (PARTITION BY {some col(s)} ORDER BY {some datecol} DESC) AS RecID

And then filter for RecID = 1 to get "first" (maximum date) for each partition.
It is all there in Books Online.



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

bfoster
Starting Member

30 Posts

Posted - 2008-04-23 : 14:30:36
I'd suggest making sure you have appropriate indexes and rewriting the query to make better use of indexes.

IF (@AnswerDateId IS NULL)
SELECT MAX([Date]) FROM dbo.AnswerDate
ELSE
SELECT MAX([Date]) FROM dbo.AnswerDate WHERE Id = @AnswerDateId

IF (@GroupId IS NOT NULL AND @ProductId IS NOT NULL)
SELECT TOP 1 @Id = ad.id
FROM [dbo].[Scoring] gs
INNER JOIN [dbo].[AnswerDate] ad ON ad.Id = gs.AnswerDateId
WHERE [StructureId] = @StructureId
AND [GroupId] = @GroupId
AND [ProductId] = @ProductId
AND [Date] <= @Date
ORDER BY [Date] DESC
ELSE IF (@GroupId IS NOT NULL AND @ProductId IS NULL)
-- fill in the query
ELSE IF (@GroupId IS NULL AND @ProductId IS NOT NULL)
-- fill in the query
ELSE
SELECT TOP 1 @Id = ad.id
FROM [dbo].[Scoring] gs
INNER JOIN [dbo].[AnswerDate] ad ON ad.Id = gs.AnswerDateId
WHERE [StructureId] = @StructureId
AND [GroupId] IS NULL
AND [ProductId] IS NULL
AND [Date] <= @Date
ORDER BY [Date] DESC
Go to Top of Page
   

- Advertisement -