Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
mugen2005
Starting Member
16 Posts |
Posted - 2011-12-28 : 13:20:00
|
| I’m trying to group together activities in a sequential order. The Row column is incremental, the GroupIP is populated for the example, but that is the field I need to write in sql.Row Activity groupID 1 begin 12 datax 13 datay 14 end 15 begin 56 end 57 begin 78 datay 79 end 7So what it will do is the first complete (beginning to end) activity as 1 since that was the row it started on. The second complete activity will be 5, and the third will be 7. I’m doing this because of all the different activities that we use and need to sum durations of each activity belonging in the same group.It looks like an inner join on itself is what’s needed, but I can’t seem to figure out how to get it to check just from beginning to end on the activity. Any help would be greatly appreciated. Thank you |
|
|
sunitabeck
Master Smack Fu Yak Hacker
5155 Posts |
Posted - 2011-12-28 : 14:00:06
|
One way to do it is as follows:SELECT Row, Activity, DENSE_RANK() OVER ( ORDER BY groupID ) AS groupIDFROM( SELECT t1.*, row-ROW_NUMBER() OVER( PARTITION by t2.[end] ORDER BY row) groupID FROM YourTable t1 CROSS APPLY (SELECT MIN(row) AS [end] FROM YourTable t2 WHERE t2.row >= t1.row AND t2.activity = 'end') t2)S; |
 |
|
|
mugen2005
Starting Member
16 Posts |
Posted - 2011-12-28 : 15:18:42
|
quote: Originally posted by sunitabeck One way to do it is as follows:SELECT Row, Activity, DENSE_RANK() OVER ( ORDER BY groupID ) AS groupIDFROM( SELECT t1.*, row-ROW_NUMBER() OVER( PARTITION by t2.[end] ORDER BY row) groupID FROM YourTable t1 CROSS APPLY (SELECT MIN(row) AS [end] FROM YourTable t2 WHERE t2.row >= t1.row AND t2.activity = 'end') t2)S;
Dense_rank, who would of thought? This is working well but giving the groupID starting at 1 every time. I need to get it to reflect the row number its on. Thank you for your help though. Its definitely helped me get off on the right foot. |
 |
|
|
Vinnie881
Master Smack Fu Yak Hacker
1231 Posts |
Posted - 2011-12-28 : 17:07:53
|
/*This will be a very fast query for you, just make sure you have a clustered index on the table to keep the order.*/declare @tmp table(id int primary key,activity varchar(20),groupid int)insert into @tmp(id,activity)select 1,'begin' union allselect 2,'datax' union allselect 3, 'datay' union allselect 4, 'end' union allselect 5, 'begin' union allselect 6, 'end' union allselect 7, 'begin' union allselect 8, 'datay' union allselect 9, 'end' declare @groupid int,@anchor intset @groupid = 0update aset @groupid = groupid = case when activity = 'begin' then id else @groupid end,@anchor = idfrom @tmp aselect * from @tmp/*OUTPUTid activity groupid1 begin 12 datax 13 datay 14 end 15 begin 56 end 57 begin 78 datay 79 end 7*/ Success is 10% Intelligence, 70% Determination, and 22% Stupidity.\_/ _/ _/\_/ _/\_/ _/ _/- 881 |
 |
|
|
lazerath
Constraint Violating Yak Guru
343 Posts |
Posted - 2011-12-28 : 17:11:25
|
| -- initialize sample datawith cteYourTableExample(Row, Activity, yourGroupID)AS(select 1 ,'begin', 1 union allselect 2 ,'datax', 1 union allselect 3 ,'datay', 1 union allselect 4 ,'end', 1 union allselect 5 ,'begin', 5 union allselect 6 ,'end', 5 union allselect 7 ,'begin', 7 union allselect 8 ,'datay', 7 union allselect 9 ,'end', 7)-- Example Solution - CROSS APPLY-- be sure to have a NONCLUSTERED INDEX on (Activity ASC, ROW DESC) for -- maximum performance; ideally a filtered index if it's only for this.select *from cteYourTableExample ccross apply (-----------------select top 1 d.Row as calcGroupIDfrom cteYourTableExample d where d.Row <= c.Row and d.Activity ='begin' order by d.Row desc----------------- ) z;/* -- ResultsRow Activity yourGroupID calcGroupID1 begin 1 12 datax 1 13 datay 1 14 end 1 15 begin 5 56 end 5 57 begin 7 78 datay 7 79 end 7 7--*/ |
 |
|
|
mugen2005
Starting Member
16 Posts |
Posted - 2011-12-29 : 08:51:37
|
| Lazerath and Vinnie Thank you SO much. I have used both methods and they work great. I was trying to make this more difficult then it needed to be, thanks again. |
 |
|
|
mugen2005
Starting Member
16 Posts |
Posted - 2011-12-29 : 09:33:35
|
| 1 more question. How would you be able to specify a stop point. Say you wanted to stop at DataY and skip over END and pick back up at begining? |
 |
|
|
Vinnie881
Master Smack Fu Yak Hacker
1231 Posts |
Posted - 2011-12-29 : 13:00:04
|
If it's a large record base, check the stats and execution time, the query I provided should well out perform others.As for your question, I am not sure if I am following you, here is a modified version of the query with a "STOP POINT":declare @tmp table(id int primary key,activity varchar(20),groupid int)insert into @tmp(id,activity)select 1,'begin' union allselect 2,'datax' union allselect 3, 'datay' union allselect 4, 'end' union allselect 5, 'begin' union allselect 6, 'end' union allselect 7, 'begin' union allselect 8, 'datay' union allselect 9, 'end'declare @groupid int,@anchor intset @groupid = 0update aset @groupid = groupid = case when activity = 'begin' then id else @groupid end,@anchor = idfrom @tmp awhere id <=6select * from @tmp/*OUTPUTid activity groupid1 begin 12 datax 13 datay 14 end 15 begin 56 end 57 begin null8 datay null9 end null*/ Please explain more what you mean by a stop point if this is not it. Success is 10% Intelligence, 70% Determination, and 22% Stupidity.\_/ _/ _/\_/ _/\_/ _/ _/- 881 |
 |
|
|
lazerath
Constraint Violating Yak Guru
343 Posts |
Posted - 2012-01-03 : 13:22:37
|
quote: Originally posted by Vinnie881 If it's a large record base, check the stats and execution time, the query I provided should well out perform others.
You'd be surprised. My testing shows the CROSS APPLY solution far outpaces the update method on my test system. YMMV, but here's the code so you can run it yourself:GOIF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Activity]') AND type in (N'U'))DROP TABLE [dbo].[Activity]GOSET ANSI_NULLS ONGOSET QUOTED_IDENTIFIER ONGOSET ANSI_PADDING ONGOCREATE TABLE [dbo].[Activity]( Row [int] NOT NULL, [Activity] [varchar](5) NOT NULL, CONSTRAINT [PK_Activity] PRIMARY KEY CLUSTERED ( Row ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) )GOSET ANSI_PADDING OFFGOWITH NumbersAS(SELECT spt.numberFROM master..spt_valuesAS sptWHERE spt.[type]= 'P')-- initialize sample data, cteYourTableExample(Row, Activity, yourGroupID)AS(select 1 ,'begin', 1 union allselect 2 ,'datax', 1 union allselect 3 ,'datay', 1 union allselect 4 ,'end', 1 union allselect 5 ,'begin', 5 union allselect 6 ,'end', 5 union allselect 7 ,'begin', 7 union allselect 8 ,'datay', 7 union allselect 9 ,'end', 7)insert into dbo.Activityselect number * 9 + ROW as Row, Activityfrom cteYourTableExamplecross join Numbersorder by number, ROW declare @idx int = 6;While (@idx > 0)begin insert dbo.activity select Row + (select MAX(Row) from dbo.activity), activity from dbo.activity set @idx += -1;endgo SET QUOTED_IDENTIFIER ONGOSET ANSI_PADDING ONGO CREATE NONCLUSTERED INDEX idx_activity_1ON dbo.activity (Row DESC)WHERE Activity = 'begin'WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)GOIF OBJECT_ID('tempdb..#t') is not null drop table #t;GODECLARE @dt datetime = getdate();-- CROSS APPLY testselect *into #tfrom dbo.Activity ccross apply (-----------------select top 1 d.Row as calcGroupIDfrom dbo.Activity d where d.Row <= c.Row and d.Activity ='begin' order by d.Row desc----------------- ) z;select 'cross apply' as method, DATEDIFF(ms,@dt,getdate()) as duration;GOIF OBJECT_ID('tempdb..#t') is not null drop table #t;GODECLARE @dt datetime = getdate();/* -- Resultsmethod durationcross apply 3213--*/declare @tmp table(id int primary key, activity varchar(20), groupid int)insert @tmp (id,activity) select ROW, Activity from dbo.Activitydeclare @groupid int = 0, @anchor int;update Aset @groupid = groupid = case when activity = 'begin' then id else @groupid end,@anchor = idfrom @tmp a;select * into #tfrom @tmpselect '@tmp + update' as method, DATEDIFF(ms,@dt,getdate()) as duration;/* -- Resultsmethod duration@tmp + update 12190--*/GOIF OBJECT_ID('tempdb..#t') is not null drop table #t;Feel free to adjust the loop variable @idx to control the number of rows. In my testing, the cross apply out performed the tmp + update method in every scenario. This would likely change in favor of the update method if you skip the table variable @tmp and update a column on the original table.However, my understanding is that the UPDATE method works due to the assumption that SQL processes the data in order of the clustered index. I am not sure that assumption always holds true. For instance, would this behavior work on a partitioned table? Does it work when the filegroup used by the table has multiple files spread across different volumes? I have not researched this point, so if anyone knows for sure one way or another please chime in and share your knowledge. |
 |
|
|
|
|
|
|
|