Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Help with query

Author  Topic 

dgburton1
Starting Member

20 Posts

Posted - 2008-06-30 : 00:01:17
Hi,

I would really appretiate some help with the following query :)

The question I have to answer is:

"Get the average price of all products in the given category" (a 'category' can contain multiple products)

The query results are to be used to render a graph.

Here is a link to a PDF of the database schema:
http://www.boltfile.com/directdownload/db.pdf

Essential 4 tables are involved, every Product has a Category, and every Product has multiple RawProducts (these represent instances of that Product stocked in multiple shops). A process scans the prices of the RawProducts at each shop once per day, and the RawProductPriceHistory logs when the price of a RawProduct changes, a row is only added to this table when the price changes - so we don't have one entry per day if the price has not changed, for exmaple:

RawProductPriceHistoryId RawProductId Date UnitPrice ProductId
1 37 10/06/08 2.22 1
2 20 21/06/08 3.10 2
3 20 22/06/08 2.21 2
4 36 29/06/08 2.19 1

Essentially the query I'm after has to return the average price of all products in category over time, I'm thinking that this would have to, for each row in the RawProductPriceHistory table, average price price of all RawProducts at this date for the associated Product, then do that for each date where there was an entry in RawProductPriceHistory...

Previously visakh16 helped me with an SP to return the prices for each RawProduct for a given Product for a given date - http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=104888 - could this possibly be used to help solve the above?

Thanks very much in advance for your help! :)

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-30 : 00:19:45
[code]SELECT p.Product,ph.datetimeofchange,AVG(price) AS AveragePrice
FROM Product p
INNER JOIN retailer_product rp
ON rp.associatedproductid=p.id
INNER JOIN price_history ph
ON ph.retailer_productid=rp.id
GROUP BY p.Product,ph.datetimeofchange[/code]
Go to Top of Page

dgburton1
Starting Member

20 Posts

Posted - 2008-06-30 : 00:28:38
Thanks visakh16, sorry I actually posted the wrong link to the database schema - here's the correct one - http://www.boltfile.com/directdownload/db.pdf - any change you could re-do the query?

many thanks :)
Go to Top of Page

dgburton1
Starting Member

20 Posts

Posted - 2008-06-30 : 00:35:36
Just to give a bit more information, the results I'm looking for would look like:

Pass in @CategoryId, then the results table would look something like:

DateTime AveragePrice
20/06/08 2.22
22/06/08 2.21
30/06/08 2.34

Where AveragePrice is the average of all the Products, made from all RawProducts associated to that Product.

thank you! :)
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-30 : 00:44:51
[code]CREATE PROC GetAvgPriceBYDay
@CategoryId
AS

SELECT ph.DateTime,AVG(price) AS AveragePrice
FROM Products p
INNER JOIN RawProducts rp
ON rp.productid=p.productid
INNER JOIN RawProductPriceHistory ph
ON ph.RawProductId=rp.RawProductId
WHERE p.CategoryId = @CategoryId
GROUP BY ph.DateTime
GO[/code]
Go to Top of Page

dgburton1
Starting Member

20 Posts

Posted - 2008-06-30 : 01:08:59
Hi,

I really appretiate your help so far! :)

I am running the following query along the lines of what you suggested, using a static ProductId:

SELECT p.Name, ph.DateTime, AVG(UnitPrice) AS AveragePrice
FROM Products p
INNER JOIN RawProducts rp
ON rp.ProductId=p.ProductId
INNER JOIN RawProductPriceHistory ph
ON ph.RawProductId = rp.RawProductId
where p.ProductId='962FE285-75CA-4E18-ADE1-10E066DBE26B'
GROUP BY p.Name, ph.DateTime
ORDER BY DateTime DESC

it gives these results:

Name DateTime AveragePrice
Click 2008-05-19 16:14:29.430 2.08
Click 2008-04-26 15:14:32.437 1.99
Click 2008-04-26 12:56:13.530 2.08
Click 2008-03-20 12:37:17.847 1.99
Click 2008-03-19 16:23:57.097 1.99
Click 2008-03-17 14:58:29.830 2.19
Click 2008-03-14 00:47:14.513 1.65

However as you can see, for 2008-04-26 there are two entries - I assume this is because there were 2 entries in the RawProductPriceHistory for that date, how can I get the query to return just 1 row for each date? - in addition I'd like to strip the DateTime of it's Time value - i.e. just return the date.

Once this query is working correctly I'd like to move on to do the same but passing in a CategoryId.

Thanks again! :)
Go to Top of Page

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2008-06-30 : 01:16:56
[code]SELECT p.Name, datetime = DATEADD(DAY, DATEDIFF(DAY, 0, ph.datetime), 0), AVG(UnitPrice) AS AveragePrice
FROM Products p
INNER JOIN RawProducts rp ON rp.ProductId=p.ProductId
INNER JOIN RawProductPriceHistory ph ON ph.RawProductId = rp.RawProductId
WHERE p.ProductId='962FE285-75CA-4E18-ADE1-10E066DBE26B'
GROUP BY p.Name, DATEADD(DAY, DATEDIFF(DAY, 0, ph.datetime), 0)
ORDER BY datetime DESC[/code]


KH
[spoiler]Time is always against us[/spoiler]

Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-06-30 : 02:14:12
quote:
Originally posted by dgburton1

Hi,

I really appretiate your help so far! :)

I am running the following query along the lines of what you suggested, using a static ProductId:

SELECT p.Name, ph.DateTime, AVG(UnitPrice) AS AveragePrice
FROM Products p
INNER JOIN RawProducts rp
ON rp.ProductId=p.ProductId
INNER JOIN RawProductPriceHistory ph
ON ph.RawProductId = rp.RawProductId
where p.ProductId='962FE285-75CA-4E18-ADE1-10E066DBE26B'
GROUP BY p.Name, ph.DateTime
ORDER BY DateTime DESC

it gives these results:

Name DateTime AveragePrice
Click 2008-05-19 16:14:29.430 2.08
Click 2008-04-26 15:14:32.437 1.99
Click 2008-04-26 12:56:13.530 2.08
Click 2008-03-20 12:37:17.847 1.99
Click 2008-03-19 16:23:57.097 1.99
Click 2008-03-17 14:58:29.830 2.19
Click 2008-03-14 00:47:14.513 1.65

However as you can see, for 2008-04-26 there are two entries - I assume this is because there were 2 entries in the RawProductPriceHistory for that date, how can I get the query to return just 1 row for each date? - in addition I'd like to strip the DateTime of it's Time value - i.e. just return the date.

Once this query is working correctly I'd like to move on to do the same but passing in a CategoryId.

Thanks again! :)


Why have you included name field also in group by? as per your requirement i thought you needed average perdate. To eliminate time part use like this

SELECT p.Name, DATEADD(d,DATEDIFF(d,0,ph.DateTime),0) AS DateTime, AVG(UnitPrice) AS AveragePrice
FROM Products p
INNER JOIN RawProducts rp
ON rp.ProductId=p.ProductId
INNER JOIN RawProductPriceHistory ph
ON ph.RawProductId = rp.RawProductId
where p.ProductId='962FE285-75CA-4E18-ADE1-10E066DBE26B'
GROUP BY p.Name, DATEADD(d,DTAEDIFF(d,0,ph.DateTime),0)
ORDER BY DateTime DESC
Go to Top of Page

dgburton1
Starting Member

20 Posts

Posted - 2008-06-30 : 08:58:37
Thanks for your help so far! :)

Please see the following SQL 2005 database backup as an example of the data:
http://www.boltfile.com/directdownload/example.bak

When I run the following query:

SELECT
p.Name,
DATEADD(d,DATEDIFF(d,0,ph.DateTime),0) AS DateTime,
AVG(UnitPrice) AS AveragePrice
FROM Products p
INNER JOIN RawProducts rp
ON rp.ProductId=p.ProductId
INNER JOIN RawProductPriceHistory ph
ON ph.RawProductId = rp.RawProductId
where p.ProductId='962fe285-75ca-4e18-ade1-10e066dbe26b'
GROUP BY p.Name, DATEADD(d,DATEDIFF(d,0,ph.DateTime),0)
ORDER BY DateTime DESC

I get these results:

Click 2008-05-19 00:00:00.000 2.08
Click 2008-04-26 00:00:00.000 2.035
Click 2008-03-20 00:00:00.000 1.99
Click 2008-03-19 00:00:00.000 1.99
Click 2008-03-17 00:00:00.000 2.19
Click 2008-03-14 00:00:00.000 1.65

However this is not quite what is needed as it's just taking the average price for any price changes that happen to occur on the same date, what I need the query to do is return the average price for all RawProducts associated with that Product - so even if the price only changes in one Shop (each RawProduct is associated with one Shop) the average price for that date is calculated by adding up the prices of the last changes from all shops - I believe this logic is similar to that in this thread - http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=104888

Here's an example, I run the following simple query on the RawProductPriceHistory table to get all price changes related to rawproducts related to the specified ProductId:

SELECT
rpph.DateTime, rpph.UnitPrice, s.Name, rp.ProductId
FROM
RawProductPriceHistory rpph, RawProducts rp, Shops s
WHERE
rpph.RawProductId = rp.RawProductId
AND
s.ShopId = rp.ShopId
AND
rp.ProductId = '962fe285-75ca-4e18-ade1-10e066dbe26b'
ORDER BY
[DateTime] DESC


Which returns:

2008-05-19 16:14:29.430 2.08 Sainsburys
2008-04-26 15:14:32.437 1.99 Sainsburys
2008-04-26 12:56:13.530 2.08 Tesco
2008-03-20 12:37:17.847 1.99 Boots
2008-03-19 16:23:57.097 1.99 Tesco
2008-03-17 14:58:29.830 2.19 Superdrug
2008-03-14 00:47:14.513 1.65 Sainsburys

The query results I need are shown below - with the working out shown to hopefully illustrate it a bit better :)

DateTime AveragePrice
2008-05-19 (2.08 + 2.19 + 2.08 + 1.99) / 4 = 2.085 (Sainsburys price increased from 1.99 to 2.08)
2008-04-26 (1.99 + 2.19 + 2.08 + 1.99) / 4 = 2.06 (Sainsburys price increased from 1.65 to 1.99)
2008-03-20 (1.65 + 2.19 + 2.08 + 1.99) / 4 = 1.977 (Tesco price increased from 1.99 to 2.08, product now stocked at Boots)
2008-03-19 (1.65 + 2.19 + 1.99) / 3 = 1.94 (product now stocked at Tesco)
2008-03-17 (1.65 + 2.19) / 2 = 1.92 (product now stocked at Superdrug)
2008-03-14 1.65 (product stocked at Sainsburys)

I really appretiate all you assistance! :):)
Go to Top of Page

dgburton1
Starting Member

20 Posts

Posted - 2008-07-01 : 06:15:41
Was anyone able to help with the above?

Many thanks! :)
Go to Top of Page

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2008-07-01 : 07:26:34
quote:
Originally posted by dgburton1
DateTime AveragePrice
2008-05-19 (2.08 + 2.19 + 2.08 + 1.99) / 4 = 2.085 (Sainsburys price increased from 1.99 to 2.08)
2008-04-26 (1.99 + 2.19 + 2.08 + 1.99) / 4 = 2.06 (Sainsburys price increased from 1.65 to 1.99)
2008-03-20 (1.65 + 2.19 + 2.08 + 1.99) / 4 = 1.977 (Tesco price increased from 1.99 to 2.08, product now stocked at Boots)
2008-03-19 (1.65 + 2.19 + 1.99) / 3 = 1.94 (product now stocked at Tesco)
2008-03-17 (1.65 + 2.19) / 2 = 1.92 (product now stocked at Superdrug)
2008-03-14 1.65 (product stocked at Sainsburys)



Can you explain more on the calculation ? Still don't quite understand it


KH
[spoiler]Time is always against us[/spoiler]

Go to Top of Page

dgburton1
Starting Member

20 Posts

Posted - 2008-07-01 : 11:24:10
Hi khtan,

Hopefully should be clearer if you download the backed-up database from the post above, but basically:

For each product in the Products table I have multiple RawProducts from the RawProducts table

associated. Basicaslly the each row in the RawProducts table represents an instance of the associated

Product being stocked in a particular Shop. I have this data in the RawProductPriceHistory table,

which contains a row for every time a price change is detected for a particular RawProduct in a Shop.

Here is the data in the RawProductPriceHistory table:

2008-05-19 16:14:29.430 2.08 Sainsburys
2008-04-26 15:14:32.437 1.99 Sainsburys
2008-04-26 12:56:13.530 2.08 Tesco
2008-03-20 12:37:17.847 1.99 Boots
2008-03-19 16:23:57.097 1.99 Tesco
2008-03-17 14:58:29.830 2.19 Superdrug
2008-03-14 00:47:14.513 1.65 Sainsburys

The query I am after needs to return the average price of a Product over time, as you can see from the

output of the RawProductPriceHistory table above the product becomes stocked in more shops over time

(i.e. to start with on the 14/3/08 it's only detected at the Sainsburys shop.

So the query needs to return the average price for the Product, at each point at which there is an entry in RawProductPriceHistory - however importantly this average price would be calculated by the current price for the Product at each Shop (i.e. each RawProduct) for that point in time - seeing as there is not an entry in RawProductPriceHistory every day (as the table only records when the price of a RawProduct in a Shop actually changes) I need the query to take the last known price change - to ensure that we get a correct average.

This is what I have tried to explain in the example output (this is the output I would like the query to generate) below - the working out shown is taken from the output of the RawProductPriceHistory table above - showing how I need the average value to be calculated.

DateTime AveragePrice
2008-05-19 (2.08 + 2.19 + 2.08 + 1.99) / 4 = 2.085 (Sainsburys price increased from 1.99 to 2.08)
2008-04-26 (1.99 + 2.19 + 2.08 + 1.99) / 4 = 2.06 (Sainsburys price increased from 1.65 to 1.99)
2008-03-20 (1.65 + 2.19 + 2.08 + 1.99) / 4 = 1.977 (Tesco price increased from 1.99 to 2.08, product now stocked at Boots)
2008-03-19 (1.65 + 2.19 + 1.99) / 3 = 1.94 (product now stocked at Tesco)
2008-03-17 (1.65 + 2.19) / 2 = 1.92 (product now stocked at Superdrug)
2008-03-14 1.65 (product stocked at Sainsburys)

Thanks again for all your help!
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-07-01 : 13:26:03
[code];With Price_CTE (Seq,Date,Price,Name,ProductId) AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY rp.ProductId ORDER BY rpph.DateTime),
DATEADD(d,DATEDIFF(d,0,rpph.DateTime),0), rpph.UnitPrice, s.Name, rp.ProductId
FROM
RawProductPriceHistory rpph
JOIN RawProducts rp
ON rpph.RawProductId = rp.RawProductId
JOIN Shops s
ON s.ShopId = rp.ShopId
)

SELECT c1.Date,c2.AvgPrice
FROM Price_CTE c1
OUTER APPLY (SELECT SUM(Price)*1.0/COUNT(*) AS AvgPrice
FROM Price_CTE t
INNER JOIN (SELECT Name,MAX(Date) as maxdate
FROM Price_CTE
WHERE ProductId=c1.ProductId
AND Seq<=c1.Seq)t1
ON t1.Name=t.Name
AND t1.maxdate=t.Date
WHERE t.ProductId=c1.ProductId
AND t.Seq<=c1.Seq)c2
WHERE c1.ProductId = '962fe285-75ca-4e18-ade1-10e066dbe26b'
ORDER BY c1.Date DESC[/code]
Go to Top of Page

dgburton1
Starting Member

20 Posts

Posted - 2008-07-01 : 17:36:10
Thanks for your quick response! - really appretiate all you help!

However I get the following when the query is run:

Msg 8120, Level 16, State 1, Line 2
Column 'Price_CTE.Name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

many thanks :)
Go to Top of Page

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2008-07-01 : 22:13:51
[code]
DateTime AveragePrice
2008-05-19 (2.08 + 2.19 + 2.08 + 1.99) / 4 = 2.085 (Sainsburys price increased from 1.99 to 2.08)
[/code]

you need to explain to us where did all the 4 value comes from ?
2.08 is from which record ? There are 2.08 and how about the other 2.08 which record is it taken from ?
2.19 is from which record etc etc

And why divide by 4 and some you divide by 2 and 3. You need to tell us your business logic. No way i can know by looking at your data alone.


KH
[spoiler]Time is always against us[/spoiler]

Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-07-02 : 00:03:52
quote:
Originally posted by dgburton1

Thanks for your quick response! - really appretiate all you help!

However I get the following when the query is run:

Msg 8120, Level 16, State 1, Line 2
Column 'Price_CTE.Name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

many thanks :)


;With Price_CTE (Seq,Date,Price,Name,ProductId) AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY rp.ProductId ORDER BY rpph.DateTime),
DATEADD(d,DATEDIFF(d,0,rpph.DateTime),0), rpph.UnitPrice, s.Name, rp.ProductId
FROM
RawProductPriceHistory rpph
JOIN RawProducts rp
ON rpph.RawProductId = rp.RawProductId
JOIN Shops s
ON s.ShopId = rp.ShopId
)

SELECT c1.Date,c2.AvgPrice
FROM Price_CTE c1
OUTER APPLY (SELECT SUM(Price)*1.0/COUNT(*) AS AvgPrice
FROM Price_CTE t
INNER JOIN (SELECT Name,MAX(Date) as maxdate
FROM Price_CTE
WHERE ProductId=c1.ProductId
AND Seq<=c1.Seq
GROUP BY Name)t1
ON t1.Name=t.Name
AND t1.maxdate=t.Date
WHERE t.ProductId=c1.ProductId
AND t.Seq<=c1.Seq)c2
WHERE c1.ProductId = '962fe285-75ca-4e18-ade1-10e066dbe26b'
ORDER BY c1.Date DESC
Go to Top of Page

dgburton1
Starting Member

20 Posts

Posted - 2008-07-02 : 00:52:00
Hi,

Thanks for your help so far! Results of the query:

2008-05-19 00:00:00.000 2.085
2008-04-26 00:00:00.000 1.9775
2008-04-26 00:00:00.000 2.0625
2008-03-20 00:00:00.000 1.955
2008-03-19 00:00:00.000 1.94333333333333
2008-03-17 00:00:00.000 1.92
2008-03-14 00:00:00.000 1.65

When I run the query I get two values for the same date - 26/05/08 - could we make it so it only gives one row per date?

In answer to your questions:

DateTime AveragePrice
2008-05-19 (2.08 + 2.19 + 2.08 + 1.99) / 4 = 2.085 (Sainsburys price increased from 1.99 to 2.08)

you need to explain to us where did all the 4 value comes from ?
2.08 is from which record ? There are 2.08 and how about the other 2.08 which record is it taken from ?
2.19 is from which record etc etc

The first 2.08 is from this record:
2008-05-19 16:14:29.430 2.08 Sainsburys

and the second is from this record:
2008-04-26 12:56:13.530 2.08 Tesco

The 2.19 is from this record:
2008-03-17 14:58:29.830 2.19 Superdrug

I divide by 2, 3 then 4 etc because that is the number of Shops that stock the Product (i.e. that have a RawProductPriceHistory record) at that point in time, i.e:

on 17/03/08 the product is only stocked in 2 shops, so I divide by 2:
2008-03-17 (1.65 + 2.19) / 2 = 1.92 (product now stocked at Superdrug)

...but by 26/06/2008 it's stocked at 4 shops - so I divide by 4:
2008-04-26 (1.99 + 2.19 + 2.08 + 1.99) / 4 = 2.06 (Sainsburys price increased from 1.65 to 1.99)

Could you let me know if it's possible to remove the duplicate 26/06/08 date record from the query results, and also strip out the time element of the datetime?

many thanks!
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-07-02 : 01:40:17
[code];With Price_CTE (Seq,Date,Price,Name,ProductId) AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY rp.ProductId ORDER BY rpph.DateTime),
DATEADD(d,DATEDIFF(d,0,rpph.DateTime),0), rpph.UnitPrice, s.Name, rp.ProductId
FROM
RawProductPriceHistory rpph
JOIN RawProducts rp
ON rpph.RawProductId = rp.RawProductId
JOIN Shops s
ON s.ShopId = rp.ShopId
)

SELECT c1.Date,MAX(c2.AvgPrice)
FROM Price_CTE c1
OUTER APPLY (SELECT SUM(Price)*1.0/COUNT(*) AS AvgPrice
FROM Price_CTE t
INNER JOIN (SELECT Name,MAX(Date) as maxdate
FROM Price_CTE
WHERE ProductId=c1.ProductId
AND Seq<=c1.Seq
GROUP BY Name)t1
ON t1.Name=t.Name
AND t1.maxdate=t.Date
WHERE t.ProductId=c1.ProductId
AND t.Seq<=c1.Seq)c2
WHERE c1.ProductId = '962fe285-75ca-4e18-ade1-10e066dbe26b'
GROUP BY c1.Date
ORDER BY c1.Date DESC[/code]
Go to Top of Page

dgburton1
Starting Member

20 Posts

Posted - 2008-07-02 : 02:47:55
Thanks - that looks like it does what we need! :)

Two questions,

Could we extend the query so it takes in as parameters:
@Date1 <- the start date
@Date2 <- the end date
@ProductIds <- comma seperated list of ProductId's to get data from - at the moment it only takes data from 1 ProductId, but in reality we'll need it to retrieve from multiple ProductIds.

The query should retrieve data between @Date1 and @Date2.

Also - the query seems to run quite slowly - just running the query above takes nearly 40 seconds, is there any way to optimize/speed up the execution?

Thanks again for all your help - really appretiate it!
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-07-02 : 03:39:28
quote:
Originally posted by dgburton1

Thanks - that looks like it does what we need! :)

Two questions,

Could we extend the query so it takes in as parameters:
@Date1 <- the start date
@Date2 <- the end date
@ProductIds <- comma seperated list of ProductId's to get data from - at the moment it only takes data from 1 ProductId, but in reality we'll need it to retrieve from multiple ProductIds.

The query should retrieve data between @Date1 and @Date2.

Also - the query seems to run quite slowly - just running the query above takes nearly 40 seconds, is there any way to optimize/speed up the execution?

Thanks again for all your help - really appretiate it!


Add the parameters to WHERE condition

WHERE ','+@ProductIds + ',' LIKE '%,'+ c1.ProductId +',%'
AND c1.Date BETWEEN @Date1 AND @Date2
Go to Top of Page

dgburton1
Starting Member

20 Posts

Posted - 2008-07-02 : 04:53:44
thanks :)

do you have any suggestions on improving the query execution speed?

thanks again!
Go to Top of Page
    Next Page

- Advertisement -