Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
dgburton1
Starting Member
20 Posts |
Posted - 2008-06-30 : 00:01:17
|
| Hi,I would really appretiate some help with the following query :)The question I have to answer is:"Get the average price of all products in the given category" (a 'category' can contain multiple products)The query results are to be used to render a graph.Here is a link to a PDF of the database schema:http://www.boltfile.com/directdownload/db.pdfEssential 4 tables are involved, every Product has a Category, and every Product has multiple RawProducts (these represent instances of that Product stocked in multiple shops). A process scans the prices of the RawProducts at each shop once per day, and the RawProductPriceHistory logs when the price of a RawProduct changes, a row is only added to this table when the price changes - so we don't have one entry per day if the price has not changed, for exmaple:RawProductPriceHistoryId RawProductId Date UnitPrice ProductId1 37 10/06/08 2.22 12 20 21/06/08 3.10 23 20 22/06/08 2.21 24 36 29/06/08 2.19 1Essentially the query I'm after has to return the average price of all products in category over time, I'm thinking that this would have to, for each row in the RawProductPriceHistory table, average price price of all RawProducts at this date for the associated Product, then do that for each date where there was an entry in RawProductPriceHistory...Previously visakh16 helped me with an SP to return the prices for each RawProduct for a given Product for a given date - http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=104888 - could this possibly be used to help solve the above?Thanks very much in advance for your help! :) |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-06-30 : 00:19:45
|
| [code]SELECT p.Product,ph.datetimeofchange,AVG(price) AS AveragePriceFROM Product pINNER JOIN retailer_product rpON rp.associatedproductid=p.idINNER JOIN price_history phON ph.retailer_productid=rp.idGROUP BY p.Product,ph.datetimeofchange[/code] |
 |
|
|
dgburton1
Starting Member
20 Posts |
Posted - 2008-06-30 : 00:28:38
|
| Thanks visakh16, sorry I actually posted the wrong link to the database schema - here's the correct one - http://www.boltfile.com/directdownload/db.pdf - any change you could re-do the query?many thanks :) |
 |
|
|
dgburton1
Starting Member
20 Posts |
Posted - 2008-06-30 : 00:35:36
|
| Just to give a bit more information, the results I'm looking for would look like:Pass in @CategoryId, then the results table would look something like:DateTime AveragePrice20/06/08 2.2222/06/08 2.2130/06/08 2.34Where AveragePrice is the average of all the Products, made from all RawProducts associated to that Product.thank you! :) |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-06-30 : 00:44:51
|
| [code]CREATE PROC GetAvgPriceBYDay@CategoryIdASSELECT ph.DateTime,AVG(price) AS AveragePriceFROM Products pINNER JOIN RawProducts rpON rp.productid=p.productidINNER JOIN RawProductPriceHistory phON ph.RawProductId=rp.RawProductIdWHERE p.CategoryId = @CategoryIdGROUP BY ph.DateTimeGO[/code] |
 |
|
|
dgburton1
Starting Member
20 Posts |
Posted - 2008-06-30 : 01:08:59
|
| Hi,I really appretiate your help so far! :)I am running the following query along the lines of what you suggested, using a static ProductId:SELECT p.Name, ph.DateTime, AVG(UnitPrice) AS AveragePriceFROM Products pINNER JOIN RawProducts rpON rp.ProductId=p.ProductIdINNER JOIN RawProductPriceHistory phON ph.RawProductId = rp.RawProductIdwhere p.ProductId='962FE285-75CA-4E18-ADE1-10E066DBE26B'GROUP BY p.Name, ph.DateTimeORDER BY DateTime DESCit gives these results:Name DateTime AveragePriceClick 2008-05-19 16:14:29.430 2.08Click 2008-04-26 15:14:32.437 1.99Click 2008-04-26 12:56:13.530 2.08Click 2008-03-20 12:37:17.847 1.99Click 2008-03-19 16:23:57.097 1.99Click 2008-03-17 14:58:29.830 2.19Click 2008-03-14 00:47:14.513 1.65However as you can see, for 2008-04-26 there are two entries - I assume this is because there were 2 entries in the RawProductPriceHistory for that date, how can I get the query to return just 1 row for each date? - in addition I'd like to strip the DateTime of it's Time value - i.e. just return the date.Once this query is working correctly I'd like to move on to do the same but passing in a CategoryId.Thanks again! :) |
 |
|
|
khtan
In (Som, Ni, Yak)
17689 Posts |
Posted - 2008-06-30 : 01:16:56
|
[code]SELECT p.Name, datetime = DATEADD(DAY, DATEDIFF(DAY, 0, ph.datetime), 0), AVG(UnitPrice) AS AveragePriceFROM Products p INNER JOIN RawProducts rp ON rp.ProductId=p.ProductId INNER JOIN RawProductPriceHistory ph ON ph.RawProductId = rp.RawProductIdWHERE p.ProductId='962FE285-75CA-4E18-ADE1-10E066DBE26B'GROUP BY p.Name, DATEADD(DAY, DATEDIFF(DAY, 0, ph.datetime), 0)ORDER BY datetime DESC[/code] KH[spoiler]Time is always against us[/spoiler] |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-06-30 : 02:14:12
|
quote: Originally posted by dgburton1 Hi,I really appretiate your help so far! :)I am running the following query along the lines of what you suggested, using a static ProductId:SELECT p.Name, ph.DateTime, AVG(UnitPrice) AS AveragePriceFROM Products pINNER JOIN RawProducts rpON rp.ProductId=p.ProductIdINNER JOIN RawProductPriceHistory phON ph.RawProductId = rp.RawProductIdwhere p.ProductId='962FE285-75CA-4E18-ADE1-10E066DBE26B'GROUP BY p.Name, ph.DateTimeORDER BY DateTime DESCit gives these results:Name DateTime AveragePriceClick 2008-05-19 16:14:29.430 2.08Click 2008-04-26 15:14:32.437 1.99Click 2008-04-26 12:56:13.530 2.08Click 2008-03-20 12:37:17.847 1.99Click 2008-03-19 16:23:57.097 1.99Click 2008-03-17 14:58:29.830 2.19Click 2008-03-14 00:47:14.513 1.65However as you can see, for 2008-04-26 there are two entries - I assume this is because there were 2 entries in the RawProductPriceHistory for that date, how can I get the query to return just 1 row for each date? - in addition I'd like to strip the DateTime of it's Time value - i.e. just return the date.Once this query is working correctly I'd like to move on to do the same but passing in a CategoryId.Thanks again! :)
Why have you included name field also in group by? as per your requirement i thought you needed average perdate. To eliminate time part use like thisSELECT p.Name, DATEADD(d,DATEDIFF(d,0,ph.DateTime),0) AS DateTime, AVG(UnitPrice) AS AveragePriceFROM Products pINNER JOIN RawProducts rpON rp.ProductId=p.ProductIdINNER JOIN RawProductPriceHistory phON ph.RawProductId = rp.RawProductIdwhere p.ProductId='962FE285-75CA-4E18-ADE1-10E066DBE26B'GROUP BY p.Name, DATEADD(d,DTAEDIFF(d,0,ph.DateTime),0)ORDER BY DateTime DESC |
 |
|
|
dgburton1
Starting Member
20 Posts |
Posted - 2008-06-30 : 08:58:37
|
| Thanks for your help so far! :)Please see the following SQL 2005 database backup as an example of the data:http://www.boltfile.com/directdownload/example.bakWhen I run the following query:SELECT p.Name, DATEADD(d,DATEDIFF(d,0,ph.DateTime),0) AS DateTime, AVG(UnitPrice) AS AveragePriceFROM Products pINNER JOIN RawProducts rpON rp.ProductId=p.ProductIdINNER JOIN RawProductPriceHistory phON ph.RawProductId = rp.RawProductIdwhere p.ProductId='962fe285-75ca-4e18-ade1-10e066dbe26b'GROUP BY p.Name, DATEADD(d,DATEDIFF(d,0,ph.DateTime),0)ORDER BY DateTime DESCI get these results:Click 2008-05-19 00:00:00.000 2.08Click 2008-04-26 00:00:00.000 2.035Click 2008-03-20 00:00:00.000 1.99Click 2008-03-19 00:00:00.000 1.99Click 2008-03-17 00:00:00.000 2.19Click 2008-03-14 00:00:00.000 1.65However this is not quite what is needed as it's just taking the average price for any price changes that happen to occur on the same date, what I need the query to do is return the average price for all RawProducts associated with that Product - so even if the price only changes in one Shop (each RawProduct is associated with one Shop) the average price for that date is calculated by adding up the prices of the last changes from all shops - I believe this logic is similar to that in this thread - http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=104888 Here's an example, I run the following simple query on the RawProductPriceHistory table to get all price changes related to rawproducts related to the specified ProductId:SELECT rpph.DateTime, rpph.UnitPrice, s.Name, rp.ProductIdFROM RawProductPriceHistory rpph, RawProducts rp, Shops sWHERE rpph.RawProductId = rp.RawProductId AND s.ShopId = rp.ShopIdAND rp.ProductId = '962fe285-75ca-4e18-ade1-10e066dbe26b'ORDER BY [DateTime] DESCWhich returns:2008-05-19 16:14:29.430 2.08 Sainsburys2008-04-26 15:14:32.437 1.99 Sainsburys2008-04-26 12:56:13.530 2.08 Tesco2008-03-20 12:37:17.847 1.99 Boots2008-03-19 16:23:57.097 1.99 Tesco2008-03-17 14:58:29.830 2.19 Superdrug2008-03-14 00:47:14.513 1.65 SainsburysThe query results I need are shown below - with the working out shown to hopefully illustrate it a bit better :)DateTime AveragePrice2008-05-19 (2.08 + 2.19 + 2.08 + 1.99) / 4 = 2.085 (Sainsburys price increased from 1.99 to 2.08)2008-04-26 (1.99 + 2.19 + 2.08 + 1.99) / 4 = 2.06 (Sainsburys price increased from 1.65 to 1.99)2008-03-20 (1.65 + 2.19 + 2.08 + 1.99) / 4 = 1.977 (Tesco price increased from 1.99 to 2.08, product now stocked at Boots)2008-03-19 (1.65 + 2.19 + 1.99) / 3 = 1.94 (product now stocked at Tesco)2008-03-17 (1.65 + 2.19) / 2 = 1.92 (product now stocked at Superdrug)2008-03-14 1.65 (product stocked at Sainsburys)I really appretiate all you assistance! :):) |
 |
|
|
dgburton1
Starting Member
20 Posts |
Posted - 2008-07-01 : 06:15:41
|
| Was anyone able to help with the above?Many thanks! :) |
 |
|
|
khtan
In (Som, Ni, Yak)
17689 Posts |
Posted - 2008-07-01 : 07:26:34
|
quote: Originally posted by dgburton1DateTime AveragePrice2008-05-19 (2.08 + 2.19 + 2.08 + 1.99) / 4 = 2.085 (Sainsburys price increased from 1.99 to 2.08)2008-04-26 (1.99 + 2.19 + 2.08 + 1.99) / 4 = 2.06 (Sainsburys price increased from 1.65 to 1.99)2008-03-20 (1.65 + 2.19 + 2.08 + 1.99) / 4 = 1.977 (Tesco price increased from 1.99 to 2.08, product now stocked at Boots)2008-03-19 (1.65 + 2.19 + 1.99) / 3 = 1.94 (product now stocked at Tesco)2008-03-17 (1.65 + 2.19) / 2 = 1.92 (product now stocked at Superdrug)2008-03-14 1.65 (product stocked at Sainsburys)
Can you explain more on the calculation ? Still don't quite understand it KH[spoiler]Time is always against us[/spoiler] |
 |
|
|
dgburton1
Starting Member
20 Posts |
Posted - 2008-07-01 : 11:24:10
|
| Hi khtan,Hopefully should be clearer if you download the backed-up database from the post above, but basically:For each product in the Products table I have multiple RawProducts from the RawProducts table associated. Basicaslly the each row in the RawProducts table represents an instance of the associated Product being stocked in a particular Shop. I have this data in the RawProductPriceHistory table, which contains a row for every time a price change is detected for a particular RawProduct in a Shop.Here is the data in the RawProductPriceHistory table:2008-05-19 16:14:29.430 2.08 Sainsburys2008-04-26 15:14:32.437 1.99 Sainsburys2008-04-26 12:56:13.530 2.08 Tesco2008-03-20 12:37:17.847 1.99 Boots2008-03-19 16:23:57.097 1.99 Tesco2008-03-17 14:58:29.830 2.19 Superdrug2008-03-14 00:47:14.513 1.65 SainsburysThe query I am after needs to return the average price of a Product over time, as you can see from the output of the RawProductPriceHistory table above the product becomes stocked in more shops over time (i.e. to start with on the 14/3/08 it's only detected at the Sainsburys shop.So the query needs to return the average price for the Product, at each point at which there is an entry in RawProductPriceHistory - however importantly this average price would be calculated by the current price for the Product at each Shop (i.e. each RawProduct) for that point in time - seeing as there is not an entry in RawProductPriceHistory every day (as the table only records when the price of a RawProduct in a Shop actually changes) I need the query to take the last known price change - to ensure that we get a correct average.This is what I have tried to explain in the example output (this is the output I would like the query to generate) below - the working out shown is taken from the output of the RawProductPriceHistory table above - showing how I need the average value to be calculated.DateTime AveragePrice2008-05-19 (2.08 + 2.19 + 2.08 + 1.99) / 4 = 2.085 (Sainsburys price increased from 1.99 to 2.08)2008-04-26 (1.99 + 2.19 + 2.08 + 1.99) / 4 = 2.06 (Sainsburys price increased from 1.65 to 1.99)2008-03-20 (1.65 + 2.19 + 2.08 + 1.99) / 4 = 1.977 (Tesco price increased from 1.99 to 2.08, product now stocked at Boots)2008-03-19 (1.65 + 2.19 + 1.99) / 3 = 1.94 (product now stocked at Tesco)2008-03-17 (1.65 + 2.19) / 2 = 1.92 (product now stocked at Superdrug)2008-03-14 1.65 (product stocked at Sainsburys)Thanks again for all your help! |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-07-01 : 13:26:03
|
| [code];With Price_CTE (Seq,Date,Price,Name,ProductId) AS(SELECT ROW_NUMBER() OVER(PARTITION BY rp.ProductId ORDER BY rpph.DateTime),DATEADD(d,DATEDIFF(d,0,rpph.DateTime),0), rpph.UnitPrice, s.Name, rp.ProductIdFROMRawProductPriceHistory rpphJOIN RawProducts rpON rpph.RawProductId = rp.RawProductId JOIN Shops sON s.ShopId = rp.ShopId)SELECT c1.Date,c2.AvgPriceFROM Price_CTE c1OUTER APPLY (SELECT SUM(Price)*1.0/COUNT(*) AS AvgPrice FROM Price_CTE t INNER JOIN (SELECT Name,MAX(Date) as maxdate FROM Price_CTE WHERE ProductId=c1.ProductId AND Seq<=c1.Seq)t1 ON t1.Name=t.Name AND t1.maxdate=t.Date WHERE t.ProductId=c1.ProductId AND t.Seq<=c1.Seq)c2WHERE c1.ProductId = '962fe285-75ca-4e18-ade1-10e066dbe26b'ORDER BY c1.Date DESC[/code] |
 |
|
|
dgburton1
Starting Member
20 Posts |
Posted - 2008-07-01 : 17:36:10
|
| Thanks for your quick response! - really appretiate all you help!However I get the following when the query is run:Msg 8120, Level 16, State 1, Line 2Column 'Price_CTE.Name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.many thanks :) |
 |
|
|
khtan
In (Som, Ni, Yak)
17689 Posts |
Posted - 2008-07-01 : 22:13:51
|
[code]DateTime AveragePrice2008-05-19 (2.08 + 2.19 + 2.08 + 1.99) / 4 = 2.085 (Sainsburys price increased from 1.99 to 2.08)[/code]you need to explain to us where did all the 4 value comes from ? 2.08 is from which record ? There are 2.08 and how about the other 2.08 which record is it taken from ?2.19 is from which record etc etcAnd why divide by 4 and some you divide by 2 and 3. You need to tell us your business logic. No way i can know by looking at your data alone. KH[spoiler]Time is always against us[/spoiler] |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-07-02 : 00:03:52
|
quote: Originally posted by dgburton1 Thanks for your quick response! - really appretiate all you help!However I get the following when the query is run:Msg 8120, Level 16, State 1, Line 2Column 'Price_CTE.Name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.many thanks :)
;With Price_CTE (Seq,Date,Price,Name,ProductId) AS(SELECT ROW_NUMBER() OVER(PARTITION BY rp.ProductId ORDER BY rpph.DateTime),DATEADD(d,DATEDIFF(d,0,rpph.DateTime),0), rpph.UnitPrice, s.Name, rp.ProductIdFROMRawProductPriceHistory rpphJOIN RawProducts rpON rpph.RawProductId = rp.RawProductId JOIN Shops sON s.ShopId = rp.ShopId)SELECT c1.Date,c2.AvgPriceFROM Price_CTE c1OUTER APPLY (SELECT SUM(Price)*1.0/COUNT(*) AS AvgPrice FROM Price_CTE t INNER JOIN (SELECT Name,MAX(Date) as maxdate FROM Price_CTE WHERE ProductId=c1.ProductId AND Seq<=c1.Seq GROUP BY Name)t1 ON t1.Name=t.Name AND t1.maxdate=t.Date WHERE t.ProductId=c1.ProductId AND t.Seq<=c1.Seq)c2WHERE c1.ProductId = '962fe285-75ca-4e18-ade1-10e066dbe26b'ORDER BY c1.Date DESC |
 |
|
|
dgburton1
Starting Member
20 Posts |
Posted - 2008-07-02 : 00:52:00
|
| Hi,Thanks for your help so far! Results of the query:2008-05-19 00:00:00.000 2.0852008-04-26 00:00:00.000 1.97752008-04-26 00:00:00.000 2.06252008-03-20 00:00:00.000 1.9552008-03-19 00:00:00.000 1.943333333333332008-03-17 00:00:00.000 1.922008-03-14 00:00:00.000 1.65When I run the query I get two values for the same date - 26/05/08 - could we make it so it only gives one row per date?In answer to your questions:DateTime AveragePrice2008-05-19 (2.08 + 2.19 + 2.08 + 1.99) / 4 = 2.085 (Sainsburys price increased from 1.99 to 2.08)you need to explain to us where did all the 4 value comes from ? 2.08 is from which record ? There are 2.08 and how about the other 2.08 which record is it taken from ?2.19 is from which record etc etcThe first 2.08 is from this record:2008-05-19 16:14:29.430 2.08 Sainsburysand the second is from this record:2008-04-26 12:56:13.530 2.08 TescoThe 2.19 is from this record:2008-03-17 14:58:29.830 2.19 SuperdrugI divide by 2, 3 then 4 etc because that is the number of Shops that stock the Product (i.e. that have a RawProductPriceHistory record) at that point in time, i.e:on 17/03/08 the product is only stocked in 2 shops, so I divide by 2:2008-03-17 (1.65 + 2.19) / 2 = 1.92 (product now stocked at Superdrug)...but by 26/06/2008 it's stocked at 4 shops - so I divide by 4:2008-04-26 (1.99 + 2.19 + 2.08 + 1.99) / 4 = 2.06 (Sainsburys price increased from 1.65 to 1.99)Could you let me know if it's possible to remove the duplicate 26/06/08 date record from the query results, and also strip out the time element of the datetime?many thanks! |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-07-02 : 01:40:17
|
| [code];With Price_CTE (Seq,Date,Price,Name,ProductId) AS(SELECT ROW_NUMBER() OVER(PARTITION BY rp.ProductId ORDER BY rpph.DateTime),DATEADD(d,DATEDIFF(d,0,rpph.DateTime),0), rpph.UnitPrice, s.Name, rp.ProductIdFROMRawProductPriceHistory rpphJOIN RawProducts rpON rpph.RawProductId = rp.RawProductId JOIN Shops sON s.ShopId = rp.ShopId)SELECT c1.Date,MAX(c2.AvgPrice)FROM Price_CTE c1OUTER APPLY (SELECT SUM(Price)*1.0/COUNT(*) AS AvgPrice FROM Price_CTE t INNER JOIN (SELECT Name,MAX(Date) as maxdate FROM Price_CTE WHERE ProductId=c1.ProductId AND Seq<=c1.Seq GROUP BY Name)t1 ON t1.Name=t.Name AND t1.maxdate=t.Date WHERE t.ProductId=c1.ProductId AND t.Seq<=c1.Seq)c2WHERE c1.ProductId = '962fe285-75ca-4e18-ade1-10e066dbe26b'GROUP BY c1.DateORDER BY c1.Date DESC[/code] |
 |
|
|
dgburton1
Starting Member
20 Posts |
Posted - 2008-07-02 : 02:47:55
|
| Thanks - that looks like it does what we need! :)Two questions,Could we extend the query so it takes in as parameters:@Date1 <- the start date@Date2 <- the end date @ProductIds <- comma seperated list of ProductId's to get data from - at the moment it only takes data from 1 ProductId, but in reality we'll need it to retrieve from multiple ProductIds.The query should retrieve data between @Date1 and @Date2.Also - the query seems to run quite slowly - just running the query above takes nearly 40 seconds, is there any way to optimize/speed up the execution?Thanks again for all your help - really appretiate it! |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-07-02 : 03:39:28
|
quote: Originally posted by dgburton1 Thanks - that looks like it does what we need! :)Two questions,Could we extend the query so it takes in as parameters:@Date1 <- the start date@Date2 <- the end date @ProductIds <- comma seperated list of ProductId's to get data from - at the moment it only takes data from 1 ProductId, but in reality we'll need it to retrieve from multiple ProductIds.The query should retrieve data between @Date1 and @Date2.Also - the query seems to run quite slowly - just running the query above takes nearly 40 seconds, is there any way to optimize/speed up the execution?Thanks again for all your help - really appretiate it!
Add the parameters to WHERE conditionWHERE ','+@ProductIds + ',' LIKE '%,'+ c1.ProductId +',%'AND c1.Date BETWEEN @Date1 AND @Date2 |
 |
|
|
dgburton1
Starting Member
20 Posts |
Posted - 2008-07-02 : 04:53:44
|
| thanks :) do you have any suggestions on improving the query execution speed?thanks again! |
 |
|
|
Next Page
|
|
|
|
|