| Author |
Topic |
|
dbonneau
Yak Posting Veteran
50 Posts |
Posted - 2009-04-14 : 00:29:35
|
| Hi,I would like to indicate on Flag field so I can group it by "A" and "B".Please look at my sample dataset below. Price Diff Flag 841.75 841.75 0 842 0.25 A 842 0 A 842 0 A 842 0 A 842 0 A 842 0 A 842 0 A 842 0 A 842 0 A 842.25 0.25 A 842 -0.25 B 842 0 B 842 0 B 842 0 B 842 0 B 842.25 0.25 A 842 -0.25 B 842 0 B 842 0 B 842 0 B 842 0 B 842 0 B 842 0 B 842 0 BFirst, I would like to take the difference between a Row and next Row from Price Field, and output the values on Diff field.And then indicate it as "A" or "B" on Flag field.For example,take (row2-row1),(row3-row2),(row4-row3),(row5-row4), and so forth from price field, and indicate it as "A" or "B" on Flag field.The following are for Flag conditionIf Diff is positive value, Flag it as "A".OR If current Diff is 0 and previous non-zero value is positive, indicate it as "A".If Diff is negative value, Flag it as "B". OR If current Diff value is 0 and previous non-zero value is negative, indicate it as "B". I didn't indicate "A" or "B on the second row on Flag field becuase I don't have any non-zero value above to use to indicate for second row.Thank you so much ! |
|
|
sakets_2000
Master Smack Fu Yak Hacker
1472 Posts |
Posted - 2009-04-14 : 12:31:05
|
| how do you find the next row ? Is there a datetime or identity field for this ? |
 |
|
|
dbonneau
Yak Posting Veteran
50 Posts |
Posted - 2009-04-14 : 12:39:58
|
quote: Originally posted by sakets_2000 how do you find the next row ? Is there a datetime or identity field for this ?
Do I need identity field to find the next row ?If that is a case, can we add numbers field to identify it ?Thanks! |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-04-14 : 12:45:16
|
| It doesn't have to be an identity but you do need some indication in the table data to define the sequence of the rows. It could be a date column. If you just add an identity column now there is not guarantee that the values (generated for existing data) will correspond to the sequence they were added to the table.Be One with the OptimizerTG |
 |
|
|
dbonneau
Yak Posting Veteran
50 Posts |
Posted - 2009-04-14 : 13:33:03
|
quote: Originally posted by TG It doesn't have to be an identity but you do need some indication in the table data to define the sequence of the rows. It could be a date column. If you just add an identity column now there is not guarantee that the values (generated for existing data) will correspond to the sequence they were added to the table.Be One with the OptimizerTG
Date Time Price4/9/2009 8:00 841.754/9/2009 8:00 841.754/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 842.254/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 842.25This is orignial dataset. But The date and Time are not unique values since date and time just represent the price change in that interval.What if I use row_number() to assign number ?select row_number () over (order by date, time) as sqnum Thanks! |
 |
|
|
IceCreamWizard
Starting Member
11 Posts |
Posted - 2009-04-14 : 13:49:42
|
Admittedly, here's a non-elegant solution and it uses assumptions of the data (as pointed out by previous posters).My solution assumes a sequential column exists in the order the data is inserted. My guess is that data is always inserted and never deleted (sort of like S&P 500 data :). If so, a trigger can be used to ensure this assumption (see below***).Also, performance of this solution is, shall we say, not necessarily good. It requires on the order of n + n^2 passes, at worst. If you have tens of thousands of rows, it may take some time. I'm sure someone who reads this can take my initial attempt here and improve it.-- Script for the table my solutions runs on.-- The ID column must be sequential in the order insertedCREATE TABLE [dbo].[FlagPlusMinus_v2] ( [ID] int IDENTITY(1, 1) NOT NULL, [Price] float NULL, [Diff] float NULL, [Flag] nchar(1) NULL)-- Reset data for testingupdate Sandbox.dbo.FlagPlusMinus_v2set Diff = null, Flag = nullwhere ID > 1-- Assign the first row with defaultsupdate Sandbox.dbo.FlagPlusMinus_v2set Diff = 0.0, Flag = '-'where ID = 1-- ********* Solutions begins **********-- This first pass updates the Diff column. It also updates-- the Flag column where Diff > 0 or Diff < 0.update Sandbox.dbo.FlagPlusMinus_v2 set Diff = results.rDiff, Flag = results.rFlagfrom (select t1.ID rID, (t1.Price - t2.Price) 'rDiff', rFlag = case when (t1.Price - t2.Price) > 0 then 'A' when (t1.Price - t2.Price) < 0 then 'B' else null end from FlagPlusMinus_v2 t1, FlagPlusMinus_v2 t2 where t1.ID = (t2.ID + 1) ) resultswhere ID = results.rID-- This is where it gets ugly.-- Loops through data until all Flag values assigned.-- What happens is that the row's Flag column, following a row where its-- Flag is not null, is updated with the previous Flag value.while (select count(ID) from FlagPlusMinus_v2 where Flag is null) > 0 begin update Sandbox.dbo.FlagPlusMinus_v2 set Flag = results.rFlag from (select t1.ID rId, rFlag = t2.Flag from FlagPlusMinus_v2 t1, FlagPlusMinus_v2 t2 where t1.ID = (t2.ID + 1) and t1.Diff = 0.0 and t1.Flag is null ) results where ID = results.rID and Flag is nullend To understand this, step through the inside of the loop several times and see how the data is updated. If there arefew instances of consecutive "Diff = 0", then this loop will perform moderately (and hopefully, acceptably) well.I hope this helps or at least puts on the right path (no guarantees :).Good Luck,Matt*** If using a newly created table, I would think that an Identity column would be sufficient. Otherwise, use a separate table with sequence number and a trigger to update a sequential column would be needed. |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-04-14 : 13:54:55
|
quote: Originally posted by dbonneauWhat if I use row_number() to assign number ?select row_number () over (order by date, time) as sqnum Thanks!
Then you still don't have guaranteed consistency from one call to the next as to the sequence since your over clause does not contain unique values. Do your rows get inserted one by one or several at a time? if one by one then identity column may be the solution - at least going forward. You will still have the problem if of properly sequencing the existing data.Be One with the OptimizerTG |
 |
|
|
dbonneau
Yak Posting Veteran
50 Posts |
Posted - 2009-04-14 : 14:31:10
|
quote: Originally posted by TG
quote: Originally posted by dbonneauWhat if I use row_number() to assign number ?select row_number () over (order by date, time) as sqnum Thanks!
Do your rows get inserted one by one or several at a time? if of properly sequencing the existing data.
I use bulk insert function to load data into a table from txt file. |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-04-14 : 14:53:24
|
| So you know that the txt file is always in the physical order you want? And there is nothing in the source data to reflect that sequence? I'm not sure if bulk insert will always load the data in the same order as the source file. I suspect that it inserts in batches which would not guarantee the order - Anyone else know if that is the case?bottom line is that unless you can come up with a way to have a unique column or combination of columns there is no way to know the actual sequence for these rows:4/9/2009 8:00 841.754/9/2009 8:00 8424/9/2009 8:00 842.254/9/2009 8:00 8424/9/2009 8:00 842.25Be One with the OptimizerTG |
 |
|
|
sakets_2000
Master Smack Fu Yak Hacker
1472 Posts |
Posted - 2009-04-14 : 15:11:05
|
| Yes. Your best bet is having that identity field come from the source. To TG's point, I think bulk insert will keep the order, but I haven't seen this documented anywhere. |
 |
|
|
sakets_2000
Master Smack Fu Yak Hacker
1472 Posts |
Posted - 2009-04-14 : 15:15:18
|
| dbonneau, I remember seeing in your original post an identity column. You later edited it and removed that column. It had values like a1,a2,a3...Do you have that?? |
 |
|
|
dbonneau
Yak Posting Veteran
50 Posts |
Posted - 2009-04-14 : 15:42:19
|
quote: Originally posted by sakets_2000 dbonneau, I remember seeing in your original post an identity column. You later edited it and removed that column. It had values like a1,a2,a3...Do you have that??
Yes, I had that. I am sorry to confuse you. I just added that field so people can understand it better (a2 -a1), (a3-a2)...The real dataset looks like this.Date Time Price4/9/2009 8:00 841.754/9/2009 8:00 841.754/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 8424/9/2009 8:00 842Thanks ! |
 |
|
|
dbonneau
Yak Posting Veteran
50 Posts |
Posted - 2009-04-14 : 15:44:05
|
quote: Originally posted by sakets_2000 Yes. Your best bet is having that identity field come from the source. To TG's point, I think bulk insert will keep the order, but I haven't seen this documented anywhere.
Then Why can't I just add sequence number using Row_Number ()?Thanks ! |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2009-04-14 : 15:49:15
|
| Because you are not guaranteed to get consistent results one call to the next and you definitely aren't guaranteed to be in the same order as the source file. According to sakets_2000 there is a good chance that with an identity column you will be able to order by the original sequence of the text file. And since the unique value will be stored with the data you are guaranteed to get consistent results one call to the next.Be One with the OptimizerTG |
 |
|
|
|