Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Count of items to create trend line data

Author  Topic 

anilofar
Starting Member

9 Posts

Posted - 2011-01-18 : 18:40:05
I have following two tables


DECLARE @Item TABLE ( Id int)
INSERT INTO @Item VALUES (101)
INSERT INTO @Item VALUES (102)
INSERT INTO @Item VALUES (103)

DECLARE @ItemInfo TABLE (Id int, State char, DateCreated DateTime)
INSERT INTO @TaskInfo VALUES (101, 'A', '2010-12-20')
INSERT INTO @TaskInfo VALUES (101, 'A', '2010-12-21')
INSERT INTO @TaskInfo VALUES (101, 'B', '2010-12-23')
INSERT INTO @TaskInfo VALUES (102, 'A', '2010-12-20')
INSERT INTO @TaskInfo VALUES (102, 'B', '2010-12-21')
INSERT INTO @TaskInfo VALUES (102, 'B', '2010-12-22')
INSERT INTO @TaskInfo VALUES (103, 'B', '2010-12-20')
INSERT INTO @TaskInfo VALUES (103, 'A', '2010-12-23')
INSERT INTO @TaskInfo VALUES (103, 'C', '2010-12-24')


What I need to find is the count of items for each state from 2010-12-20 to 2010-12-24

So the result table should look something like


DECLARE @Result TABLE (Date DateTime, StateA char, Count int)
INSERT INTO @Result VALUES ('2010-12-20', 'A', 2)
INSERT INTO @Result VALUES ('2010-12-20', 'B', 1)
INSERT INTO @Result VALUES ('2010-12-20', 'C', 0)
INSERT INTO @Result VALUES ('2010-12-21', 'A', 1)
INSERT INTO @Result VALUES ('2010-12-21', 'B', 2)
INSERT INTO @Result VALUES ('2010-12-21', 'C', 0)
INSERT INTO @Result VALUES ('2010-12-22', 'A', 1)
INSERT INTO @Result VALUES ('2010-12-22', 'B', 2)
INSERT INTO @Result VALUES ('2010-12-22', 'C', 0)
INSERT INTO @Result VALUES ('2010-12-23', 'A', 1)
INSERT INTO @Result VALUES ('2010-12-23', 'B', 2)
INSERT INTO @Result VALUES ('2010-12-23', 'C', 0)
INSERT INTO @Result VALUES ('2010-12-24', 'A', 0)
INSERT INTO @Result VALUES ('2010-12-24', 'B', 2)
INSERT INTO @Result VALUES ('2010-12-24', 'C', 1)


Please note that for Item 101 there is no info for the 22nd and 24th in @ItemInfo table. But the state for 101 on 22nd should be same as 21st and state on 24th should be same as 23rd.

So, for any missing days, the state info for an item should be the same as day before.

Right, now I am doing a while loop for each day (@current_Date) from 2010-12-20 to 2010-12-24 as outlined below. But, this is REALLY slow when I need to calculate for 3 months on about 4 million rows. Is there another way to do this?


SELECT @current_Date,State,COUNT(Id)
FROM
(
SELECT B.Id
,ISNULL(
(
SELECT TOP 1 A.State
FROM @ItemInfo [A]
WHERE A.Id = B.Id AND A.DateCreated <= @current_Date
),'') [State]
FROM @Item [B]
)[TX]
GROUP BY State


Any help would be greatly appreciated. Thanks!

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-01-18 : 18:50:11
Something like this?
DECLARE @TaskInfo TABLE (Id int, State char, DateCreated DateTime)
INSERT INTO @TaskInfo VALUES (101, 'A', '2010-12-20')
INSERT INTO @TaskInfo VALUES (101, 'A', '2010-12-21')
INSERT INTO @TaskInfo VALUES (101, 'B', '2010-12-23')
INSERT INTO @TaskInfo VALUES (102, 'A', '2010-12-20')
INSERT INTO @TaskInfo VALUES (102, 'B', '2010-12-21')
INSERT INTO @TaskInfo VALUES (102, 'B', '2010-12-22')
INSERT INTO @TaskInfo VALUES (103, 'B', '2010-12-20')
INSERT INTO @TaskInfo VALUES (103, 'A', '2010-12-23')
INSERT INTO @TaskInfo VALUES (103, 'C', '2010-12-24')



SELECT dates.DateCreated,
states.State,
COUNT(t.Id) AS Count
FROM (
SELECT DISTINCT
DateCreated
FROM @TaskInfo
) AS dates
CROSS JOIN (
SELECT DISTINCT
State
FROM @TaskInfo
) AS states
LEFT JOIN @TaskInfo AS t ON t.DateCreated = dates.DateCreated
AND t.State = states.State
GROUP BY dates.DateCreated,
states.State
ORDER BY dates.DateCreated,
states.State



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

anilofar
Starting Member

9 Posts

Posted - 2011-01-18 : 19:04:12
Thanks Peso.

However, that doesn't give me the correct count. Please see above (@Result) Table

Your query gives:

2010-12-20 00:00:00.000 A 2
2010-12-20 00:00:00.000 B 1
2010-12-20 00:00:00.000 C 0
2010-12-21 00:00:00.000 A 1
2010-12-21 00:00:00.000 B 1
2010-12-21 00:00:00.000 C 0
2010-12-22 00:00:00.000 A 0
2010-12-22 00:00:00.000 B 1
2010-12-22 00:00:00.000 C 0
2010-12-23 00:00:00.000 A 1
2010-12-23 00:00:00.000 B 1
2010-12-23 00:00:00.000 C 0
2010-12-24 00:00:00.000 A 0
2010-12-24 00:00:00.000 B 0
2010-12-24 00:00:00.000 C 1

For instance, on the 21st I should have
2010-12-21 00:00:00.000 A 1
2010-12-21 00:00:00.000 B 2
2010-12-21 00:00:00.000 C 0

B has count of 2 here because I also need to count an item's state if it is missing for a certain day. This item's state will be same as previous day.

ie: On the 21st, Items 102 and 103 were in state B.

I hope I have explained it clearly.
Go to Top of Page

jcelko
Esteemed SQL Purist

547 Posts

Posted - 2011-01-18 : 19:05:13
Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. If you know how, follow ISO-11179 data element naming conventions and formatting rules. Temporal data should use ISO-8601 formats. Code should be in Standard SQL as much as possible and not local dialect.

CREATE TABLE Vague_Items
(vague_id INTEGER NOT NULL PRIMARY KEY);

INSERT INTO Vague_Items VALUES (101), (102), (103);

CREATE TABLE Item_History
(vague_id INTEGER NOT NULL,
REFERENCES Vague_Items (vague_id),
creation_date DATE NOT NULL,
foobar_state CHAR(1) NOT NULL
CHECK(State IN ('A', 'B', 'C')),
PRIMARY KEY (vague_id, creation_date)); -- guessing here

INSERT INTO Item_History (vague_id, foobar_state, creation_date)
VALUES (101, 'A', '2010-12-20'),
(101, 'A', '2010-12-21'),
(101, 'B', '2010-12-23'),
(102, 'A', '2010-12-20'),
(102, 'B', '2010-12-21'),
(102, 'B', '2010-12-22'),
(103, 'B', '2010-12-20'),
(103, 'A', '2010-12-23'),
(103, 'C', '2010-12-24');
>> What I need to find is the count of items for each state from 2010-12-20 to 2010-12-24 <<

SELECT creation_date,
SUM (CASE WHEN foobar_state = 'A' THEN 1 ELSE 0 END AS a_cnt,
SUM (CASE WHEN foobar_state = 'A' THEN 1 ELSE 0 END AS b_cnt,
SUM (CASE WHEN foobar_state = 'A' THEN 1 ELSE 0 END AS c_cnt
FROM Item_History
WHERE creation_date BETWEEN '2010-12-20' AND '2010-12-24'
GROUP BY creation_date;


--CELKO--
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-01-18 : 19:31:04
I have a couple of suggestions.
But I need to know the ratio of missing records vs total number of records.


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

anilofar
Starting Member

9 Posts

Posted - 2011-01-18 : 20:10:19
I can't really give an exact number here since some items are updated almost daily whereas other items may not be updated for days.

But, on average for a given week, an item maybe updated two or three times a week.

Additional info, that may be helpful. In one instance, I am working with about 200000 distinct items and this will have close to approx 3 million rows of data for say 3 months.

I hope this helps.
Go to Top of Page

dineshrajan_it
Posting Yak Master

217 Posts

Posted - 2011-01-20 : 05:27:19
Hi,


DECLARE @Item TABLE ( Id int)
INSERT INTO @Item VALUES (101)
INSERT INTO @Item VALUES (102)
INSERT INTO @Item VALUES (103)

DECLARE @ItemInfo TABLE (Id int, State char, DateCreated DateTime)
INSERT INTO @ItemInfo VALUES (101, 'A', '2010-12-20')
INSERT INTO @ItemInfo VALUES (101, 'A', '2010-12-21')
INSERT INTO @ItemInfo VALUES (101, 'B', '2010-12-23')
INSERT INTO @ItemInfo VALUES (102, 'A', '2010-12-20')
INSERT INTO @ItemInfo VALUES (102, 'B', '2010-12-21')
INSERT INTO @ItemInfo VALUES (102, 'B', '2010-12-22')
INSERT INTO @ItemInfo VALUES (103, 'B', '2010-12-20')
INSERT INTO @ItemInfo VALUES (103, 'A', '2010-12-23')
INSERT INTO @ItemInfo VALUES (103, 'C', '2010-12-24')

select t.*
from
(
select [state] = isnull(info1.State,info2.state)
,[DateCreated] = isnull(info1.DateCreated,info2.DateCreated)
,sum(case when info1.State IS null then 0 else 1 end) as 'count'
--,info2.*
from
@ItemInfo info1
right outer join
(
select distinct datecreated,t.state
from @ItemInfo
cross apply
(
select 'state'='A'
union
select 'B'
union
select 'C'
)t
)info2 on info1.datecreated=info2.datecreated and info1.state=info2.state
group by info1.datecreated,info1.state,info2.datecreated,info2.state
) t
order by t.DateCreated,t.state

Iam a slow walker but i never walk back
Go to Top of Page

anilofar
Starting Member

9 Posts

Posted - 2011-01-20 : 14:27:20
Thanks for the reply.

However, that gives me the same results as what Peso's query does.

Looking back at the code that I posted. Looks to me like the correlated subquery inside the while loop is the one that is slowing things down. Especially when running this on 300000+ items for 3 months+.

Is there another way to write the following query, possibly using joins that will improve performance?


DECLARE @count INT = 1;
WHILE @count > 0
BEGIN
-- @DateTable is used as an anchor to find all the
-- dates that is to be used in the trendline chart
SELECT @current_Date = [Date] FROM @DateTable ORDER BY [Date]

SELECT @current_Date,State,COUNT(Id)
FROM
(
SELECT B.Id
,ISNULL(
(
SELECT TOP 1 A.State
FROM @ItemInfo [A]
WHERE A.Id = B.Id AND A.DateCreated <= @current_Date
),'') [State]
FROM @Item [B]
)[TX]
GROUP BY State

DELETE FROM @DateTable WHERE [Date] = @current_Date;
SELECT @count = COUNT(*) FROM @DateTable;
END


Go to Top of Page
   

- Advertisement -