Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
jamezw1
Starting Member
5 Posts |
Posted - 2007-07-03 : 17:46:44
|
I need to roll-up rows (approx. 20 million) in a table, combining rows with continuous dates for the same ID. Ex.ID Strt_dt End_Dt1 1/1/2007 1/31/071 2/1/2007 2/28/20071 3/6/2007 3/30/20071 4/1/2007 4/30/20071 5/1/2007 5/31/20072 1/1/2007 1/31/072 2/1/2007 2/28/20072 3/1/2007 3/20/20072 4/1/2007 4/30/20072 5/1/2007 5/31/2007Needs to be reduced to:ID Strt_dt End_Dt1 1/1/2007 2/28/20071 3/6/2007 5/31/20072 1/1/2007 3/20/20072 4/1/2007 5/31/2007Any help would be GREATLY appreciated. |
|
sathiya
Starting Member
4 Posts |
Posted - 2007-07-04 : 05:57:15
|
select id,min(stdate), max(enddate) from @samplegroup by idPl check whether this query is useful for u |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-07-04 : 06:23:54
|
Jamezw1, your expected output is wrong because there are 31 days in march.March 30 and April 1 is not continuous. This is the output based on the sample data aboveid startdt enddt1 2007-01-01 2007-02-281 2007-03-06 2007-03-301 2007-04-01 2007-05-312 2007-01-01 2007-03-202 2007-04-01 2007-05-31 Sathiya, your suggestion does not work. have you tried it?This is my suggestion. Try it and please report back the time it takes to run on 20 million records.create table #sample (id int, startdt datetime, enddt datetime)insert #sampleselect 1, '1/1/2007', '1/31/07' union allselect 1, '2/1/2007', '2/28/2007' union allselect 1, '3/6/2007', '3/30/2007' union allselect 1, '4/1/2007', '4/30/2007' union allselect 1, '5/1/2007', '5/31/2007' union allselect 2, '1/1/2007', '1/31/07' union allselect 2, '2/1/2007', '2/28/2007' union allselect 2, '3/1/2007', '3/20/2007' union allselect 2, '4/1/2007', '4/30/2007' union allselect 2, '5/1/2007', '5/31/2007'alter table #sample add grp intgocreate clustered index ix_sample on #sample (id, startdt)godeclare @last datetime, @grp int, @id intupdate #sampleset @grp = grp = case when startdt <= @last + 1 and id = @id then @grp else isnull(@grp, 0) + 1 end, @id = id, @last = enddtselect min(id) AS id, min(startdt) AS startdt, max(enddt) as enddtfrom #samplegroup by grporder by grpalter table #sample drop column grp Peter LarssonHelsingborg, Sweden |
|
|
jamezw1
Starting Member
5 Posts |
Posted - 2007-07-05 : 14:28:18
|
Peso that code is so cool!It ran in about 6 minutes.Thank you!!!Du e ju hel kool kille.ja:vligt schysst. Tackar sa. mycket!!!(My friend here is from Sweden) |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-07-05 : 15:07:31
|
I'll bet that creating the clustered index takes at least half the time...Tackar!Peter LarssonHelsingborg, Sweden |
|
|
jamezw1
Starting Member
5 Posts |
Posted - 2007-07-05 : 18:50:59
|
I do have one question about the code though. What is the purpose of the Min(id)?Peso, thanks again! |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-07-05 : 19:05:05
|
It needs to be an aggregate function because i am grouping by column grp, not the id.MIN could just as easy be MAX or even AVG.Peter LarssonHelsingborg, Sweden |
|
|
jamezw1
Starting Member
5 Posts |
Posted - 2007-07-09 : 19:07:27
|
Ok, now that I got it working with all of the other parameters, I was informed that we cannot use update statements because they fill up the transaction log. Soooo, any chance you know how to work around the transaction log or do you know how to do this without using an update? |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-07-10 : 02:21:59
|
You could try a CURSOR.Peter LarssonHelsingborg, Sweden |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-07-10 : 02:41:59
|
Try this--create sample datacreate table #sample (id int, startdt datetime, enddt datetime)insert #sampleselect 1, '1/1/2007', '1/31/07' union allselect 1, '2/1/2007', '2/28/2007' union allselect 1, '3/6/2007', '3/30/2007' union allselect 1, '4/1/2007', '4/30/2007' union allselect 1, '5/1/2007', '5/31/2007' union allselect 2, '1/1/2007', '1/31/07' union allselect 2, '2/1/2007', '2/28/2007' union allselect 2, '3/1/2007', '3/20/2007' union allselect 2, '4/1/2007', '4/30/2007' union allselect 2, '5/1/2007', '5/31/2007'-- create staging areacreate table #stage (id int, startdt datetime, enddt datetime)-- do the workdeclare @lastid int, @newid int, @laststart datetime, @newstart datetime, @lastend datetime, @newend datetimedeclare cur cursor for select id, startdt, enddt from #sample order by id, startdtOPEN curFETCH NEXT FROM cur INTO @lastid, @laststart, @lastendwhile @@fetch_status = 0 begin FETCH NEXT FROM cur INTO @newid, @newstart, @newend if @newid <> @lastid or @lastend < @newstart - 1 begin insert #stage values (@lastid, @laststart, @lastend) select @laststart = @newstart, @lastend = @newend, @lastid = @newid end else select @lastend = @newend, @lastid = @newid endinsert #stagevalues (@lastid, @laststart, @lastend) CLOSE curDEALLOCATE curselect *from #stagedrop table #stage Peter LarssonHelsingborg, Sweden |
|
|
jamezw1
Starting Member
5 Posts |
Posted - 2007-07-12 : 18:37:02
|
Ended up doing this in about 5 steps, turned out we can't use cursors either. Either way, thanks again Peso! |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-07-13 : 03:28:21
|
Can't use cursor? Why?Peter LarssonHelsingborg, Sweden |
|
|
johnsql
Posting Yak Master
161 Posts |
Posted - 2007-10-22 : 10:05:47
|
quote: Originally posted by Peso Jamezw1, your expected output is wrong because there are 31 days in march.March 30 and April 1 is not continuous. This is the output based on the sample data aboveid startdt enddt1 2007-01-01 2007-02-281 2007-03-06 2007-03-301 2007-04-01 2007-05-312 2007-01-01 2007-03-202 2007-04-01 2007-05-31 Sathiya, your suggestion does not work. have you tried it?This is my suggestion. Try it and please report back the time it takes to run on 20 million records.create table #sample (id int, startdt datetime, enddt datetime)insert #sampleselect 1, '1/1/2007', '1/31/07' union allselect 1, '2/1/2007', '2/28/2007' union allselect 1, '3/6/2007', '3/30/2007' union allselect 1, '4/1/2007', '4/30/2007' union allselect 1, '5/1/2007', '5/31/2007' union allselect 2, '1/1/2007', '1/31/07' union allselect 2, '2/1/2007', '2/28/2007' union allselect 2, '3/1/2007', '3/20/2007' union allselect 2, '4/1/2007', '4/30/2007' union allselect 2, '5/1/2007', '5/31/2007'alter table #sample add grp intgocreate clustered index ix_sample on #sample (id, startdt)godeclare @last datetime, @grp int, @id intupdate #sampleset @grp = grp = case when startdt <= @last + 1 and id = @id then @grp else isnull(@grp, 0) + 1 end, @id = id, @last = enddtselect min(id) AS id, min(startdt) AS startdt, max(enddt) as enddtfrom #samplegroup by grporder by grpalter table #sample drop column grp Peter LarssonHelsingborg, Sweden
I have a question here. Why do you make clustered index for tabel #sample over only 2 columns "id", "startdt" but NOT all columns of that table?Thanks.johnsql |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-10-22 : 10:15:03
|
There is no need to make a bigger index than necessary. E 12°55'05.25"N 56°04'39.16" |
|
|
johnsql
Posting Yak Master
161 Posts |
Posted - 2007-10-22 : 11:58:14
|
quote: Originally posted by Peso There is no need to make a bigger index than necessary. E 12°55'05.25"N 56°04'39.16"
Thanks for your reply. Can you tell me what is the purpose of using such index on the temp table? performance and other things else? |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-10-23 : 01:43:42
|
It's just a trick to ensure the hack "set @grp = grp = case ... else ... end" to function properly. E 12°55'05.25"N 56°04'39.16" |
|
|
TomasDelSol
Starting Member
1 Post |
Posted - 2008-03-26 : 10:55:20
|
I've been struggling with this problem and this is the only place on the internet that it's been addressed. However, is there a way to do this using pure SQL and the analytic functions? |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2008-03-26 : 11:08:27
|
Yes, you can use correlated subqueries to accomplish this.As always it depends on the amount of data because you in reality do a "triangular join" ( n*(n+1)/2 ) lookups. E 12°55'05.25"N 56°04'39.16" |
|
|
|
|
|
|
|