Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Combine(roll-up) rows

Author  Topic 

jamezw1
Starting Member

5 Posts

Posted - 2007-07-03 : 17:46:44
I need to roll-up rows (approx. 20 million) in a table, combining rows with continuous dates for the same ID. Ex.

ID Strt_dt End_Dt
1 1/1/2007 1/31/07
1 2/1/2007 2/28/2007
1 3/6/2007 3/30/2007
1 4/1/2007 4/30/2007
1 5/1/2007 5/31/2007
2 1/1/2007 1/31/07
2 2/1/2007 2/28/2007
2 3/1/2007 3/20/2007
2 4/1/2007 4/30/2007
2 5/1/2007 5/31/2007

Needs to be reduced to:
ID Strt_dt End_Dt
1 1/1/2007 2/28/2007
1 3/6/2007 5/31/2007
2 1/1/2007 3/20/2007
2 4/1/2007 5/31/2007


Any help would be GREATLY appreciated.

sathiya
Starting Member

4 Posts

Posted - 2007-07-04 : 05:57:15
select id,min(stdate), max(enddate) from @sample
group by id

Pl check whether this query is useful for u
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-07-04 : 06:23:54
Jamezw1, your expected output is wrong because there are 31 days in march.
March 30 and April 1 is not continuous. This is the output based on the sample data above
id	startdt		enddt
1 2007-01-01 2007-02-28
1 2007-03-06 2007-03-30
1 2007-04-01 2007-05-31
2 2007-01-01 2007-03-20
2 2007-04-01 2007-05-31

Sathiya, your suggestion does not work. have you tried it?

This is my suggestion. Try it and please report back the time it takes to run on 20 million records.
create table #sample (id int, startdt datetime, enddt datetime)

insert #sample
select 1, '1/1/2007', '1/31/07' union all
select 1, '2/1/2007', '2/28/2007' union all
select 1, '3/6/2007', '3/30/2007' union all
select 1, '4/1/2007', '4/30/2007' union all
select 1, '5/1/2007', '5/31/2007' union all
select 2, '1/1/2007', '1/31/07' union all
select 2, '2/1/2007', '2/28/2007' union all
select 2, '3/1/2007', '3/20/2007' union all
select 2, '4/1/2007', '4/30/2007' union all
select 2, '5/1/2007', '5/31/2007'

alter table #sample add grp int
go
create clustered index ix_sample on #sample (id, startdt)
go

declare @last datetime,
@grp int,
@id int

update #sample
set @grp = grp = case when startdt <= @last + 1 and id = @id then @grp else isnull(@grp, 0) + 1 end,
@id = id,
@last = enddt

select min(id) AS id,
min(startdt) AS startdt,
max(enddt) as enddt
from #sample
group by grp
order by grp

alter table #sample drop column grp

Peter Larsson
Helsingborg, Sweden
Go to Top of Page

jamezw1
Starting Member

5 Posts

Posted - 2007-07-05 : 14:28:18
Peso that code is so cool!

It ran in about 6 minutes.

Thank you!!!



Du e ju hel kool kille.
ja:vligt schysst. Tackar sa. mycket!!!
(My friend here is from Sweden)
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-07-05 : 15:07:31
I'll bet that creating the clustered index takes at least half the time...

Tackar!


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

jamezw1
Starting Member

5 Posts

Posted - 2007-07-05 : 18:50:59
I do have one question about the code though. What is the purpose of the Min(id)?

Peso, thanks again!
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-07-05 : 19:05:05
It needs to be an aggregate function because i am grouping by column grp, not the id.
MIN could just as easy be MAX or even AVG.


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

jamezw1
Starting Member

5 Posts

Posted - 2007-07-09 : 19:07:27
Ok, now that I got it working with all of the other parameters, I was informed that we cannot use update statements because they fill up the transaction log. Soooo, any chance you know how to work around the transaction log or do you know how to do this without using an update?
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-07-10 : 02:21:59
You could try a CURSOR.


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-07-10 : 02:41:59
Try this
--create sample data
create table #sample (id int, startdt datetime, enddt datetime)

insert #sample
select 1, '1/1/2007', '1/31/07' union all
select 1, '2/1/2007', '2/28/2007' union all
select 1, '3/6/2007', '3/30/2007' union all
select 1, '4/1/2007', '4/30/2007' union all
select 1, '5/1/2007', '5/31/2007' union all
select 2, '1/1/2007', '1/31/07' union all
select 2, '2/1/2007', '2/28/2007' union all
select 2, '3/1/2007', '3/20/2007' union all
select 2, '4/1/2007', '4/30/2007' union all
select 2, '5/1/2007', '5/31/2007'

-- create staging area
create table #stage (id int, startdt datetime, enddt datetime)

-- do the work
declare @lastid int,
@newid int,
@laststart datetime,
@newstart datetime,
@lastend datetime,
@newend datetime

declare cur cursor for select id, startdt, enddt from #sample order by id, startdt

OPEN cur

FETCH NEXT FROM cur
INTO @lastid, @laststart, @lastend

while @@fetch_status = 0
begin
FETCH NEXT FROM cur
INTO @newid, @newstart, @newend

if @newid <> @lastid or @lastend < @newstart - 1
begin
insert #stage
values (@lastid, @laststart, @lastend)

select @laststart = @newstart,
@lastend = @newend,
@lastid = @newid
end
else
select @lastend = @newend,
@lastid = @newid
end

insert #stage
values (@lastid, @laststart, @lastend)

CLOSE cur
DEALLOCATE cur

select *
from #stage

drop table #stage


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

jamezw1
Starting Member

5 Posts

Posted - 2007-07-12 : 18:37:02
Ended up doing this in about 5 steps, turned out we can't use cursors either.

Either way, thanks again Peso!
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-07-13 : 03:28:21
Can't use cursor? Why?


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

johnsql
Posting Yak Master

161 Posts

Posted - 2007-10-22 : 10:05:47
quote:
Originally posted by Peso

Jamezw1, your expected output is wrong because there are 31 days in march.
March 30 and April 1 is not continuous. This is the output based on the sample data above
id	startdt		enddt
1 2007-01-01 2007-02-28
1 2007-03-06 2007-03-30
1 2007-04-01 2007-05-31
2 2007-01-01 2007-03-20
2 2007-04-01 2007-05-31

Sathiya, your suggestion does not work. have you tried it?

This is my suggestion. Try it and please report back the time it takes to run on 20 million records.
create table #sample (id int, startdt datetime, enddt datetime)

insert #sample
select 1, '1/1/2007', '1/31/07' union all
select 1, '2/1/2007', '2/28/2007' union all
select 1, '3/6/2007', '3/30/2007' union all
select 1, '4/1/2007', '4/30/2007' union all
select 1, '5/1/2007', '5/31/2007' union all
select 2, '1/1/2007', '1/31/07' union all
select 2, '2/1/2007', '2/28/2007' union all
select 2, '3/1/2007', '3/20/2007' union all
select 2, '4/1/2007', '4/30/2007' union all
select 2, '5/1/2007', '5/31/2007'

alter table #sample add grp int
go
create clustered index ix_sample on #sample (id, startdt)
go

declare @last datetime,
@grp int,
@id int

update #sample
set @grp = grp = case when startdt <= @last + 1 and id = @id then @grp else isnull(@grp, 0) + 1 end,
@id = id,
@last = enddt

select min(id) AS id,
min(startdt) AS startdt,
max(enddt) as enddt
from #sample
group by grp
order by grp

alter table #sample drop column grp

Peter Larsson
Helsingborg, Sweden



I have a question here. Why do you make clustered index for tabel #sample over only 2 columns "id", "startdt" but NOT all columns of that table?

Thanks.
johnsql

Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-10-22 : 10:15:03
There is no need to make a bigger index than necessary.



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

johnsql
Posting Yak Master

161 Posts

Posted - 2007-10-22 : 11:58:14
quote:
Originally posted by Peso

There is no need to make a bigger index than necessary.



E 12°55'05.25"
N 56°04'39.16"




Thanks for your reply. Can you tell me what is the purpose of using such index on the temp table? performance and other things else?
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-10-23 : 01:43:42
It's just a trick to ensure the hack "set @grp = grp = case ... else ... end" to function properly.



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

TomasDelSol
Starting Member

1 Post

Posted - 2008-03-26 : 10:55:20
I've been struggling with this problem and this is the only place on the internet that it's been addressed.

However, is there a way to do this using pure SQL and the analytic functions?
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-03-26 : 11:08:27
Yes, you can use correlated subqueries to accomplish this.
As always it depends on the amount of data because you in reality do a "triangular join" ( n*(n+1)/2 ) lookups.


E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page
   

- Advertisement -