Combine(roll-up) rows - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

Combine(roll-up) rows

Author

Topic

jamezw1
Starting Member

5 Posts

Posted - 2007-07-03 : 17:46:44

I need to roll-up rows (approx. 20 million) in a table, combining rows with continuous dates for the same ID. Ex.

ID Strt_dt End_Dt
1 1/1/2007 1/31/07
1 2/1/2007 2/28/2007
1 3/6/2007 3/30/2007
1 4/1/2007 4/30/2007
1 5/1/2007 5/31/2007
2 1/1/2007 1/31/07
2 2/1/2007 2/28/2007
2 3/1/2007 3/20/2007
2 4/1/2007 4/30/2007
2 5/1/2007 5/31/2007

Needs to be reduced to:
ID Strt_dt End_Dt
1 1/1/2007 2/28/2007
1 3/6/2007 5/31/2007
2 1/1/2007 3/20/2007
2 4/1/2007 5/31/2007

Any help would be GREATLY appreciated.

sathiya
Starting Member

4 Posts

Posted - 2007-07-04 : 05:57:15

select id,min(stdate), max(enddate) from @sample
group by id

Pl check whether this query is useful for u

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-07-04 : 06:23:54

Jamezw1, your expected output is wrong because there are 31 days in march.
March 30 and April 1 is not continuous. This is the output based on the sample data above

id	startdt		enddt
1	2007-01-01	2007-02-28
1	2007-03-06	2007-03-30
1	2007-04-01	2007-05-31
2	2007-01-01	2007-03-20
2	2007-04-01	2007-05-31

Sathiya, your suggestion does not work. have you tried it?

This is my suggestion. Try it and please report back the time it takes to run on 20 million records.

create table #sample (id int, startdt datetime, enddt datetime)

insert	#sample
select	1, '1/1/2007', '1/31/07' union all
select	1, '2/1/2007', '2/28/2007' union all
select	1, '3/6/2007', '3/30/2007' union all
select	1, '4/1/2007', '4/30/2007' union all
select	1, '5/1/2007', '5/31/2007' union all
select	2, '1/1/2007', '1/31/07' union all
select	2, '2/1/2007', '2/28/2007' union all
select	2, '3/1/2007', '3/20/2007' union all
select	2, '4/1/2007', '4/30/2007' union all
select	2, '5/1/2007', '5/31/2007'

alter table #sample add grp int
go
create clustered index ix_sample on #sample (id, startdt)
go

declare @last datetime,
	@grp int,
	@id int

update	#sample
set	@grp = grp = case when startdt <= @last + 1 and id = @id then @grp else isnull(@grp, 0) + 1 end,
	@id = id,
	@last = enddt

select		min(id) AS id,
		min(startdt) AS startdt,
		max(enddt) as enddt
from		#sample
group by	grp
order by	grp

alter table #sample drop column grp

Peter Larsson
Helsingborg, Sweden

jamezw1
Starting Member

5 Posts

Posted - 2007-07-05 : 14:28:18

Peso that code is so cool!

It ran in about 6 minutes.

Thank you!!!

Du e ju hel kool kille.
ja:vligt schysst. Tackar sa. mycket!!!
(My friend here is from Sweden)

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-07-05 : 15:07:31

I'll bet that creating the clustered index takes at least half the time...

Tackar!

Peter Larsson
Helsingborg, Sweden

jamezw1
Starting Member

5 Posts

Posted - 2007-07-05 : 18:50:59

I do have one question about the code though. What is the purpose of the Min(id)?

Peso, thanks again!

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-07-05 : 19:05:05

It needs to be an aggregate function because i am grouping by column grp, not the id.
MIN could just as easy be MAX or even AVG.

Peter Larsson
Helsingborg, Sweden

jamezw1
Starting Member

5 Posts

Posted - 2007-07-09 : 19:07:27

Ok, now that I got it working with all of the other parameters, I was informed that we cannot use update statements because they fill up the transaction log. Soooo, any chance you know how to work around the transaction log or do you know how to do this without using an update?

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-07-10 : 02:21:59

You could try a CURSOR.

Peter Larsson
Helsingborg, Sweden

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-07-10 : 02:41:59

Try this

--create sample data
create table #sample (id int, startdt datetime, enddt datetime)

insert	#sample
select	1, '1/1/2007', '1/31/07' union all
select	1, '2/1/2007', '2/28/2007' union all
select	1, '3/6/2007', '3/30/2007' union all
select	1, '4/1/2007', '4/30/2007' union all
select	1, '5/1/2007', '5/31/2007' union all
select	2, '1/1/2007', '1/31/07' union all
select	2, '2/1/2007', '2/28/2007' union all
select	2, '3/1/2007', '3/20/2007' union all
select	2, '4/1/2007', '4/30/2007' union all
select	2, '5/1/2007', '5/31/2007'

-- create staging area
create table #stage (id int, startdt datetime, enddt datetime)

-- do the work
declare @lastid int,
	@newid int,
	@laststart datetime,
	@newstart datetime,
	@lastend datetime,
	@newend datetime

declare cur cursor for select id, startdt, enddt from #sample order by id, startdt

OPEN cur

FETCH NEXT FROM cur 
INTO @lastid, @laststart, @lastend

while @@fetch_status = 0
	begin
		FETCH NEXT FROM cur 
		INTO @newid, @newstart, @newend
		
		if @newid <> @lastid or @lastend < @newstart - 1
			begin
				insert	#stage
				values (@lastid, @laststart, @lastend)
				
				select	@laststart = @newstart,
					@lastend = @newend,
					@lastid = @newid
			end
		else
			select	@lastend = @newend,
				@lastid = @newid
	end

insert	#stage
values (@lastid, @laststart, @lastend)
		
CLOSE cur
DEALLOCATE cur

select	*
from	#stage

drop table #stage

Peter Larsson
Helsingborg, Sweden

jamezw1
Starting Member

5 Posts

Posted - 2007-07-12 : 18:37:02

Ended up doing this in about 5 steps, turned out we can't use cursors either.

Either way, thanks again Peso!

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-07-13 : 03:28:21

Can't use cursor? Why?

Peter Larsson
Helsingborg, Sweden

johnsql
Posting Yak Master

161 Posts

Posted - 2007-10-22 : 10:05:47

quote:

Originally posted by Peso

Jamezw1, your expected output is wrong because there are 31 days in march.
March 30 and April 1 is not continuous. This is the output based on the sample data above

id	startdt		enddt
1	2007-01-01	2007-02-28
1	2007-03-06	2007-03-30
1	2007-04-01	2007-05-31
2	2007-01-01	2007-03-20
2	2007-04-01	2007-05-31

Sathiya, your suggestion does not work. have you tried it?

This is my suggestion. Try it and please report back the time it takes to run on 20 million records.

create table #sample (id int, startdt datetime, enddt datetime)

insert	#sample
select	1, '1/1/2007', '1/31/07' union all
select	1, '2/1/2007', '2/28/2007' union all
select	1, '3/6/2007', '3/30/2007' union all
select	1, '4/1/2007', '4/30/2007' union all
select	1, '5/1/2007', '5/31/2007' union all
select	2, '1/1/2007', '1/31/07' union all
select	2, '2/1/2007', '2/28/2007' union all
select	2, '3/1/2007', '3/20/2007' union all
select	2, '4/1/2007', '4/30/2007' union all
select	2, '5/1/2007', '5/31/2007'

alter table #sample add grp int
go
create clustered index ix_sample on #sample (id, startdt)
go

declare @last datetime,
	@grp int,
	@id int

update	#sample
set	@grp = grp = case when startdt <= @last + 1 and id = @id then @grp else isnull(@grp, 0) + 1 end,
	@id = id,
	@last = enddt

select		min(id) AS id,
		min(startdt) AS startdt,
		max(enddt) as enddt
from		#sample
group by	grp
order by	grp

alter table #sample drop column grp

Peter Larsson
Helsingborg, Sweden

I have a question here. Why do you make clustered index for tabel #sample over only 2 columns "id", "startdt" but NOT all columns of that table?

Thanks.
johnsql

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-10-22 : 10:15:03

There is no need to make a bigger index than necessary.

E 12°55'05.25"
N 56°04'39.16"

johnsql
Posting Yak Master

161 Posts

Posted - 2007-10-22 : 11:58:14

quote:
Originally posted by Peso

There is no need to make a bigger index than necessary.

E 12°55'05.25"
N 56°04'39.16"

Thanks for your reply. Can you tell me what is the purpose of using such index on the temp table? performance and other things else?

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-10-23 : 01:43:42

It's just a trick to ensure the hack "set @grp = grp = case ... else ... end" to function properly.

E 12°55'05.25"
N 56°04'39.16"

TomasDelSol
Starting Member

1 Post

Posted - 2008-03-26 : 10:55:20

I've been struggling with this problem and this is the only place on the internet that it's been addressed.

However, is there a way to do this using pure SQL and the analytic functions?

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-03-26 : 11:08:27

Yes, you can use correlated subqueries to accomplish this.
As always it depends on the amount of data because you in reality do a "triangular join" ( n*(n+1)/2 ) lookups.

E 12°55'05.25"
N 56°04'39.16"

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources