Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Verification against another row

Author  Topic 

mugen2005
Starting Member

16 Posts

Posted - 2011-12-28 : 13:20:00
I’m trying to group together activities in a sequential order.
The Row column is incremental, the GroupIP is populated for the example, but that is the field I need to write in sql.

Row Activity groupID
1 begin 1
2 datax 1
3 datay 1
4 end 1
5 begin 5
6 end 5
7 begin 7
8 datay 7
9 end 7

So what it will do is the first complete (beginning to end) activity as 1 since that was the row it started on. The second complete activity will be 5, and the third will be 7. I’m doing this because of all the different activities that we use and need to sum durations of each activity belonging in the same group.

It looks like an inner join on itself is what’s needed, but I can’t seem to figure out how to get it to check just from beginning to end on the activity.

Any help would be greatly appreciated.
Thank you

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-12-28 : 14:00:06
One way to do it is as follows:
SELECT 
Row,
Activity,
DENSE_RANK() OVER ( ORDER BY groupID ) AS groupID
FROM
(
SELECT
t1.*,
row-ROW_NUMBER() OVER( PARTITION by t2.[end] ORDER BY row) groupID
FROM
YourTable t1
CROSS APPLY (SELECT MIN(row) AS [end] FROM YourTable t2 WHERE t2.row >= t1.row AND t2.activity = 'end') t2
)S;
Go to Top of Page

mugen2005
Starting Member

16 Posts

Posted - 2011-12-28 : 15:18:42
quote:
Originally posted by sunitabeck

One way to do it is as follows:
SELECT 
Row,
Activity,
DENSE_RANK() OVER ( ORDER BY groupID ) AS groupID
FROM
(
SELECT
t1.*,
row-ROW_NUMBER() OVER( PARTITION by t2.[end] ORDER BY row) groupID
FROM
YourTable t1
CROSS APPLY (SELECT MIN(row) AS [end] FROM YourTable t2 WHERE t2.row >= t1.row AND t2.activity = 'end') t2
)S;





Dense_rank, who would of thought? This is working well but giving the groupID starting at 1 every time. I need to get it to reflect the row number its on. Thank you for your help though. Its definitely helped me get off on the right foot.
Go to Top of Page

Vinnie881
Master Smack Fu Yak Hacker

1231 Posts

Posted - 2011-12-28 : 17:07:53
/*This will be a very fast query for you, just make sure you have a clustered index on the table to keep the order.
*/

declare @tmp table(id int primary key,activity varchar(20),groupid int)
insert into @tmp(id,activity)
select 1,'begin' union all
select 2,'datax' union all
select 3, 'datay' union all
select 4, 'end' union all
select 5, 'begin' union all
select 6, 'end' union all
select 7, 'begin' union all
select 8, 'datay' union all
select 9, 'end'

declare @groupid int,@anchor int
set @groupid = 0

update a
set @groupid = groupid = case when activity = 'begin' then id else @groupid end
,@anchor = id
from @tmp a

select * from @tmp

/*OUTPUT
id activity groupid
1 begin 1
2 datax 1
3 datay 1
4 end 1
5 begin 5
6 end 5
7 begin 7
8 datay 7
9 end 7
*/


Success is 10% Intelligence, 70% Determination, and 22% Stupidity.
\_/ _/ _/\_/ _/\_/ _/ _/- 881
Go to Top of Page

lazerath
Constraint Violating Yak Guru

343 Posts

Posted - 2011-12-28 : 17:11:25
-- initialize sample data
with cteYourTableExample
(Row, Activity, yourGroupID)
AS
(
select 1 ,'begin', 1 union all
select 2 ,'datax', 1 union all
select 3 ,'datay', 1 union all
select 4 ,'end', 1 union all
select 5 ,'begin', 5 union all
select 6 ,'end', 5 union all
select 7 ,'begin', 7 union all
select 8 ,'datay', 7 union all
select 9 ,'end', 7
)
-- Example Solution - CROSS APPLY
-- be sure to have a NONCLUSTERED INDEX on (Activity ASC, ROW DESC) for
-- maximum performance; ideally a filtered index if it's only for this.
select *
from cteYourTableExample c
cross apply (
-----------------
select top 1 d.Row as calcGroupID
from cteYourTableExample d
where d.Row <= c.Row
and d.Activity ='begin'
order by d.Row desc
-----------------
) z;

/* -- Results
Row Activity yourGroupID calcGroupID
1 begin 1 1
2 datax 1 1
3 datay 1 1
4 end 1 1
5 begin 5 5
6 end 5 5
7 begin 7 7
8 datay 7 7
9 end 7 7
--*/

Go to Top of Page

mugen2005
Starting Member

16 Posts

Posted - 2011-12-29 : 08:51:37
Lazerath and Vinnie Thank you SO much. I have used both methods and they work great. I was trying to make this more difficult then it needed to be, thanks again.
Go to Top of Page

mugen2005
Starting Member

16 Posts

Posted - 2011-12-29 : 09:33:35
1 more question.

How would you be able to specify a stop point. Say you wanted to stop at DataY and skip over END and pick back up at begining?

Go to Top of Page

Vinnie881
Master Smack Fu Yak Hacker

1231 Posts

Posted - 2011-12-29 : 13:00:04
If it's a large record base, check the stats and execution time, the query I provided should well out perform others.

As for your question, I am not sure if I am following you, here is a modified version of the query with a "STOP POINT":


declare @tmp table(id int primary key,activity varchar(20),groupid int)
insert into @tmp(id,activity)
select 1,'begin' union all
select 2,'datax' union all
select 3, 'datay' union all
select 4, 'end' union all
select 5, 'begin' union all
select 6, 'end' union all
select 7, 'begin' union all
select 8, 'datay' union all
select 9, 'end'

declare @groupid int,@anchor int
set @groupid = 0

update a
set @groupid = groupid = case when activity = 'begin' then id else @groupid end
,@anchor = id
from @tmp a
where id <=6

select * from @tmp

/*OUTPUT
id activity groupid
1 begin 1
2 datax 1
3 datay 1
4 end 1
5 begin 5
6 end 5
7 begin null
8 datay null
9 end null
*/


Please explain more what you mean by a stop point if this is not it.


Success is 10% Intelligence, 70% Determination, and 22% Stupidity.
\_/ _/ _/\_/ _/\_/ _/ _/- 881
Go to Top of Page

lazerath
Constraint Violating Yak Guru

343 Posts

Posted - 2012-01-03 : 13:22:37
quote:
Originally posted by Vinnie881

If it's a large record base, check the stats and execution time, the query I provided should well out perform others.



You'd be surprised. My testing shows the CROSS APPLY solution far outpaces the update method on my test system. YMMV, but here's the code so you can run it yourself:


GO

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Activity]') AND type in (N'U'))
DROP TABLE [dbo].[Activity]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[Activity](
Row [int] NOT NULL,
[Activity] [varchar](5) NOT NULL,
CONSTRAINT [PK_Activity] PRIMARY KEY CLUSTERED
(
Row ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) )

GO

SET ANSI_PADDING OFF
GO



WITH Numbers
AS
(
SELECT spt.number
FROM master..spt_values
AS spt
WHERE spt.[type]
= 'P'
)
-- initialize sample data
, cteYourTableExample
(Row, Activity, yourGroupID)
AS
(
select 1 ,'begin', 1 union all
select 2 ,'datax', 1 union all
select 3 ,'datay', 1 union all
select 4 ,'end', 1 union all
select 5 ,'begin', 5 union all
select 6 ,'end', 5 union all
select 7 ,'begin', 7 union all
select 8 ,'datay', 7 union all
select 9 ,'end', 7
)

insert into dbo.Activity
select number * 9 + ROW as Row, Activity
from cteYourTableExample
cross join Numbers
order by number, ROW


declare @idx int = 6;
While (@idx > 0)
begin
insert dbo.activity
select Row + (select MAX(Row) from dbo.activity), activity
from dbo.activity
set @idx += -1;
end
go

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE NONCLUSTERED INDEX idx_activity_1
ON dbo.activity
(Row DESC)
WHERE Activity = 'begin'
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

IF OBJECT_ID('tempdb..#t') is not null drop table #t;
GO
DECLARE @dt datetime = getdate();

-- CROSS APPLY test
select *
into #t
from dbo.Activity c
cross apply (
-----------------
select top 1 d.Row as calcGroupID
from dbo.Activity d
where d.Row <= c.Row
and d.Activity ='begin'
order by d.Row desc
-----------------
) z;

select 'cross apply' as method, DATEDIFF(ms,@dt,getdate()) as duration;
GO
IF OBJECT_ID('tempdb..#t') is not null drop table #t;
GO
DECLARE @dt datetime = getdate();

/* -- Results
method duration
cross apply 3213
--*/

declare @tmp table(id int primary key, activity varchar(20), groupid int)
insert @tmp (id,activity) select ROW, Activity from dbo.Activity

declare @groupid int = 0, @anchor int;

update A
set @groupid = groupid = case when activity = 'begin' then id else @groupid end
,@anchor = id
from @tmp a;

select *
into #t
from @tmp

select '@tmp + update' as method, DATEDIFF(ms,@dt,getdate()) as duration;

/* -- Results
method duration
@tmp + update 12190
--*/

GO
IF OBJECT_ID('tempdb..#t') is not null drop table #t;



Feel free to adjust the loop variable @idx to control the number of rows. In my testing, the cross apply out performed the tmp + update method in every scenario. This would likely change in favor of the update method if you skip the table variable @tmp and update a column on the original table.

However, my understanding is that the UPDATE method works due to the assumption that SQL processes the data in order of the clustered index. I am not sure that assumption always holds true. For instance, would this behavior work on a partitioned table? Does it work when the filegroup used by the table has multiple files spread across different volumes? I have not researched this point, so if anyone knows for sure one way or another please chime in and share your knowledge.
Go to Top of Page
   

- Advertisement -