Subtract a row from next row - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

Subtract a row from next row

Author

Topic

dbonneau
Yak Posting Veteran

50 Posts

Posted - 2009-04-14 : 00:29:35

Hi,
I would like to indicate on Flag field so I can group it by "A" and "B".
Please look at my sample dataset below.

Price Diff Flag
841.75
841.75 0
842 0.25 A
842 0 A
842 0 A
842 0 A
842 0 A
842 0 A
842 0 A
842 0 A
842 0 A
842.25 0.25 A
842 -0.25 B
842 0 B
842 0 B
842 0 B
842 0 B
842.25 0.25 A
842 -0.25 B
842 0 B
842 0 B
842 0 B
842 0 B
842 0 B
842 0 B
842 0 B

First, I would like to take the difference between a Row and next Row from Price Field, and output the values on Diff field.
And then indicate it as "A" or "B" on Flag field.

For example,take (row2-row1),(row3-row2),(row4-row3),(row5-row4), and so forth from price field, and indicate it as "A" or "B" on Flag field.

The following are for Flag condition
If Diff is positive value, Flag it as "A".
OR If current Diff is 0 and previous non-zero value is positive, indicate it as "A".

If Diff is negative value, Flag it as "B".
OR If current Diff value is 0 and previous non-zero value is negative, indicate it as "B".

I didn't indicate "A" or "B on the second row on Flag field becuase I don't have any non-zero value above to use to indicate for second row.

Thank you so much !

sakets_2000
Master Smack Fu Yak Hacker

1472 Posts

Posted - 2009-04-14 : 12:31:05

how do you find the next row ? Is there a datetime or identity field for this ?

dbonneau
Yak Posting Veteran

50 Posts

Posted - 2009-04-14 : 12:39:58

quote:
Originally posted by sakets_2000

how do you find the next row ? Is there a datetime or identity field for this ?

Do I need identity field to find the next row ?
If that is a case, can we add numbers field to identify it ?

Thanks!

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2009-04-14 : 12:45:16

It doesn't have to be an identity but you do need some indication in the table data to define the sequence of the rows. It could be a date column. If you just add an identity column now there is not guarantee that the values (generated for existing data) will correspond to the sequence they were added to the table.

Be One with the Optimizer
TG

dbonneau
Yak Posting Veteran

50 Posts

Posted - 2009-04-14 : 13:33:03

quote:
Originally posted by TG

It doesn't have to be an identity but you do need some indication in the table data to define the sequence of the rows. It could be a date column. If you just add an identity column now there is not guarantee that the values (generated for existing data) will correspond to the sequence they were added to the table.

Be One with the Optimizer
TG

Date Time Price
4/9/2009 8:00 841.75
4/9/2009 8:00 841.75
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842.25
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842.25
This is orignial dataset. But The date and Time are not unique values since date and time just represent the price change in that interval.
What if I use row_number() to assign number ?

select row_number () over (order by date, time) as sqnum

Thanks!

IceCreamWizard
Starting Member

11 Posts

Posted - 2009-04-14 : 13:49:42

Admittedly, here's a non-elegant solution and it uses assumptions of the data (as pointed out by previous posters).

My solution assumes a sequential column exists in the order the data is inserted. My guess is that data is always inserted and never deleted (sort of like S&P 500 data :). If so, a trigger can be used to ensure this assumption (see below***).

Also, performance of this solution is, shall we say, not necessarily good. It requires on the order of n + n^2 passes, at worst. If you have tens of thousands of rows, it may take some time. I'm sure someone who reads this can take my initial attempt here and improve it.


-- Script for the table my solutions runs on.
-- The ID column must be sequential in the order inserted
CREATE TABLE [dbo].[FlagPlusMinus_v2] (
	[ID] int IDENTITY(1, 1) NOT NULL,
	[Price] float NULL,
	[Diff] float NULL,
	[Flag] nchar(1) NULL
)

-- Reset data for testing
update Sandbox.dbo.FlagPlusMinus_v2
set Diff = null, Flag = null
where ID > 1
-- Assign the first row with defaults
update Sandbox.dbo.FlagPlusMinus_v2
set Diff = 0.0, Flag = '-'
where ID = 1

-- ********* Solutions begins **********
-- This first pass updates the Diff column. It also updates
-- the Flag column where Diff > 0 or Diff < 0.
update Sandbox.dbo.FlagPlusMinus_v2
 set Diff = results.rDiff,
 	Flag = results.rFlag
from
	(select
		t1.ID rID,
		(t1.Price - t2.Price) 'rDiff',
		rFlag = case
			when (t1.Price - t2.Price) > 0 then	'A'
			when (t1.Price - t2.Price) < 0 then	'B'
			else null
		end
	from
		FlagPlusMinus_v2 t1,
		FlagPlusMinus_v2 t2
	where
		t1.ID = (t2.ID + 1)
	) results
where
	ID = results.rID

-- This is where it gets ugly.
-- Loops through data until all Flag values assigned.
-- What happens is that the row's Flag column, following a row where its
--    Flag is not null, is updated with the previous Flag value.
while (select count(ID) from FlagPlusMinus_v2 where Flag is null) > 0 begin
	update Sandbox.dbo.FlagPlusMinus_v2
	set Flag = results.rFlag
	from
		(select
			t1.ID rId,
			rFlag = t2.Flag
		from
			FlagPlusMinus_v2 t1,
			FlagPlusMinus_v2 t2
		where
			t1.ID = (t2.ID + 1)
			and t1.Diff = 0.0
			and t1.Flag is null
		) results
	where
		ID = results.rID
		and Flag is null
end

To understand this, step through the inside of the loop several times and see how the data is updated. If there are
few instances of consecutive "Diff = 0", then this loop will perform moderately (and hopefully, acceptably) well.

I hope this helps or at least puts on the right path (no guarantees :).
Good Luck,
Matt

*** If using a newly created table, I would think that an Identity column would be sufficient. Otherwise, use a separate table with sequence number and a trigger to update a sequential column would be needed.

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2009-04-14 : 13:54:55

quote:
Originally posted by dbonneau
What if I use row_number() to assign number ?

select row_number () over (order by date, time) as sqnum

Thanks!

Then you still don't have guaranteed consistency from one call to the next as to the sequence since your over clause does not contain unique values. Do your rows get inserted one by one or several at a time? if one by one then identity column may be the solution - at least going forward. You will still have the problem if of properly sequencing the existing data.

Be One with the Optimizer
TG

dbonneau
Yak Posting Veteran

50 Posts

Posted - 2009-04-14 : 14:31:10

quote:
Originally posted by TG

quote:
Originally posted by dbonneau
What if I use row_number() to assign number ?

select row_number () over (order by date, time) as sqnum

Thanks!

Do your rows get inserted one by one or several at a time? if of properly sequencing the existing data.

I use bulk insert function to load data into a table from txt file.

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2009-04-14 : 14:53:24

So you know that the txt file is always in the physical order you want?
And there is nothing in the source data to reflect that sequence?

I'm not sure if bulk insert will always load the data in the same order as the source file. I suspect that it inserts in batches which would not guarantee the order - Anyone else know if that is the case?

bottom line is that unless you can come up with a way to have a unique column or combination of columns there is no way to know the actual sequence for these rows:

4/9/2009 8:00 841.75
4/9/2009 8:00 842
4/9/2009 8:00 842.25
4/9/2009 8:00 842
4/9/2009 8:00 842.25

Be One with the Optimizer
TG

sakets_2000
Master Smack Fu Yak Hacker

1472 Posts

Posted - 2009-04-14 : 15:11:05

Yes. Your best bet is having that identity field come from the source. To TG's point, I think bulk insert will keep the order, but I haven't seen this documented anywhere.

sakets_2000
Master Smack Fu Yak Hacker

1472 Posts

Posted - 2009-04-14 : 15:15:18

dbonneau, I remember seeing in your original post an identity column. You later edited it and removed that column. It had values like a1,a2,a3...
Do you have that??

dbonneau
Yak Posting Veteran

50 Posts

Posted - 2009-04-14 : 15:42:19

quote:
Originally posted by sakets_2000

dbonneau, I remember seeing in your original post an identity column. You later edited it and removed that column. It had values like a1,a2,a3...
Do you have that??

Yes, I had that. I am sorry to confuse you. I just added that field so people can understand it better (a2 -a1), (a3-a2)...

The real dataset looks like this.

Date Time Price
4/9/2009 8:00 841.75
4/9/2009 8:00 841.75
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842
4/9/2009 8:00 842

Thanks !

dbonneau
Yak Posting Veteran

50 Posts

Posted - 2009-04-14 : 15:44:05

quote:
Originally posted by sakets_2000

Yes. Your best bet is having that identity field come from the source. To TG's point, I think bulk insert will keep the order, but I haven't seen this documented anywhere.

Then Why can't I just add sequence number using Row_Number ()?

Thanks !

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2009-04-14 : 15:49:15

Because you are not guaranteed to get consistent results one call to the next and you definitely aren't guaranteed to be in the same order as the source file. According to sakets_2000 there is a good chance that with an identity column you will be able to order by the original sequence of the text file. And since the unique value will be stored with the data you are guaranteed to get consistent results one call to the next.

Be One with the Optimizer
TG

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources