| Author |
Topic |
|
johey
Starting Member
10 Posts |
Posted - 2007-04-18 : 14:49:19
|
| Hi everyone,I'm having trouble with the row_number function. I know how it works, but it doesn't behave exactly as I want.. :-)I use something like this:row_number() over (partition by T1.person,T1.code,T1.val order by T1.person,T1.code,T1.yr,T1.mnth,T1.val) rownumObject is to track historical changes in the data. I need to know when "val" changes for a certain person and code. I want the first val for each change. It almost works, but causes a problem if a val changes to another val and then later back to a former one. The rownum keeps counting, starting from the last time the value was found. Here an example to clearify:person yr mnth code val rownumX 2006 1 100 14.985.900 1X 2006 2 100 14.985.900 2X 2006 3 100 14.985.900 3X 2006 4 100 14.985.900 4X 2006 5 100 14.985.900 5X 2006 6 100 14.985.900 6X 2006 7 100 14.985.900 7X 2006 8 100 13.864.700 1X 2006 9 100 14.985.900 8X 2006 10 100 14.985.900 9X 2006 11 100 14.985.900 10X 2006 12 100 9.953.500 1As you can see, it goes wrong at 9/2006: I don't want to see 8 in the rownum column, but 1. Of course, this is by design, but how can I avoid this?Anyone?!Thanks a million in advance,johey |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2007-04-18 : 15:21:04
|
Val should not be included in your partitioning or ordering, just the other columns:row_number() over (partition by T1.person,T1.code,T1.val order by T1.person,T1.code,T1.yr,T1.mnth,T1.val) rownumIn addition, what is the primary key of this table? If there is not a unique constraint on person/code/year/month then you will not get consistent data.- Jeffhttp://weblogs.sqlteam.com/JeffS |
 |
|
|
johey
Starting Member
10 Posts |
Posted - 2007-04-19 : 02:54:31
|
| Jeff,Thanks for your answer. I've tried your suggestion, but it doesn't do the trick. Actually it's worse. It gives results as:person yr mnth code val rownumX 2006 1 100 14.985.900 1X 2006 2 100 14.985.900 2X 2006 3 100 14.985.900 3X 2006 4 100 14.985.900 4X 2006 5 100 14.985.900 5X 2006 6 100 14.985.900 6X 2006 7 100 14.985.900 7X 2006 8 100 13.864.700 8X 2006 9 100 14.985.900 9X 2006 10 100 14.985.900 10X 2006 11 100 14.985.900 11X 2006 12 100 9.953.500 12So it just counts the months and codes without noticing the change in val.That unique constraint is handled, thanks.I now think I cannot use row_number for this purpose. I'll try to outer join the resultset to itself on the combination of person, code and the (possibly) following month.Regards,johey |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-04-19 : 03:16:28
|
A dirty CURSOR solution, but it works.-- Prepare sample dataDECLARE @Sample TABLE (Person CHAR(1), Yr INT, Mnth INT, Code INT, Val VARCHAR(10), Num INT)INSERT @Sample (Person, Yr, Mnth, Code, Val)SELECT 'X', 2006, 1, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 7, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 2, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 5, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 3, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 4, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 8, 100, '13.864.700' UNION ALLSELECT 'X', 2006, 6, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 11, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 9, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 10, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 12, 100, '9.953.500'DECLARE @LastVal VARCHAR(10), @CurrVal VARCHAR(10), @Num INT, @Mnth INTDECLARE curNum CURSOR FORWARD_ONLY READ_ONLY FOR SELECT Mnth, Val FROM @Sample ORDER BY MnthOPEN curNumFETCH NEXTFROM curNumINTO @Mnth, @CurrValWHILE @@FETCH_STATUS = 0 BEGIN IF @LastVal < @CurrVal OR @LastVal IS NULL SET @Num = 1 ELSE SET @Num = @Num +1 UPDATE @Sample SET Num = @Num WHERE Mnth = @Mnth SET @LastVal = @CurrVal FETCH NEXT FROM curNum INTO @Mnth, @CurrVal ENDCLOSE curNumDEALLOCATE curNumselect * From @sampleorder by mnth Peter LarssonHelsingborg, Sweden |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2007-04-19 : 08:51:33
|
Ah! I think I understand now. I *think* this returns what you are after, no cursorsm but I am not sure how efficient it will be:DECLARE @Sample TABLE (Person CHAR(1), Yr INT, Mnth INT, Code INT, Val VARCHAR(10), Num INT)INSERT @Sample (Person, Yr, Mnth, Code, Val)SELECT 'X', 2006, 1, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 7, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 2, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 5, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 3, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 4, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 8, 100, '13.864.700' UNION ALLSELECT 'X', 2006, 6, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 11, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 9, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 10, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 12, 100, '9.953.500';with SampleData as( select *, row_number() over (partition by T1.person,T1.code order by T1.person,T1.code,T1.yr,T1.mnth) rownum from @sample t1) select a.Person, a.Yr, a.Mnth, a.Code, a.Valfrom SampleData aleft outer join SampleData b on (a.Person = b.Person and a.Code = b.Code and a.rownum = b.rownum + 1)where b.person is null OR b.val != a.val We basically generate a row num for each Person/Code when ordered by year/month, and then we compare each rownum in that partition with the one just before it. If the vals do not match (or there is no previous one), we return that row.By the way -- none of these solutions are conclusive with your sample data, since your data doesn't include more than person or more than 1 code. Whenever you want a solution that works "per person" or "per code" or something like that, always try to include more than 1 person or code in your data, otherwise there is no way to know for sure if it will work when applied to data with multiple values in those fields. It is very important to use a true representative set of data when testing and writing this algorithms, and choosing the wrong sample data could result in the algorithm looking like it works perfectly when testing, but when applied to your real-world case it will not work at all.- Jeffhttp://weblogs.sqlteam.com/JeffS |
 |
|
|
johey
Starting Member
10 Posts |
Posted - 2007-04-19 : 12:16:43
|
| Jeff and Peso,Thanks for all your efforts. Meanwhile I've figured out an alternative solution. It's a little more complicated than both of your suggestions because I also need to know the last date a value occured (to know till when it's valid). My approach:- I outer join the resultset from above to itself on the combination of person, code and the (possibly) following month (in order to know if the current value is changed in the following month)- I outer join the resultset a second time to itself on the combination of person, code and the (possibly) previous month (in order to know if the current value is changed in the previous month)- This gives a resultset which I filter from the unchanged records. The resultset now contains starting records and also ending records. I put this resultset in a cte.- Next I get the starting records from the cte in one subquery and the ending records in another one.- Finally I join both results on person, code and val plus the date of the beginning record has to preceed that from the ending record. In the same step I put a group by statement to get the minimum ending record date (to prevent the "gap" error as explained in my initial post)I'll spare you the real query! ;-)Thanks again,johey |
 |
|
|
sqlalways
Starting Member
3 Posts |
Posted - 2007-04-20 : 02:59:03
|
| select row_number() over (partition by a order by a) as rownumber,<field1>,field2,field3.. from (select null as a ,<field1>,<filed2>,..... from <your table name> ) x |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-04-20 : 07:04:36
|
| Huh?Peter LarssonHelsingborg, Sweden |
 |
|
|
herothecat
Starting Member
19 Posts |
Posted - 2007-04-20 : 22:40:44
|
johey,OK, this is going to look pretty complicated, and it's even more complicated to explain. I hope you have as much fun figuring it out as I did!! I also know that this isn't using the rownumber function, but it works and you've given me something to really look into. I'll let you know if I find a way to do this with that function.But this code works very nicely if all you need to do is return the LAST DATE and VALUES when a change occurs. If you need the all other info as well, then just left join this on the person, code, year and date.I combined the year and month to ease the calcs a bit.
DECLARE @Sample TABLE (Person CHAR(1), Yr INT, Mnth INT, Code INT, Val VARCHAR(10))INSERT @Sample (Person, Yr, Mnth, Code, Val)SELECT 'X', 2006, 1, 100, '11.985.900' UNION ALLSELECT 'X', 2006, 2, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 3, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 4, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 5, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 6, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 7, 100, '13.864.700' UNION ALLSELECT 'X', 2006, 8, 100, '14.985.900' UNION ALLSELECT 'X', 2006, 9, 100, '14.985.900' UNION ALLSELECT 'm', 2007, 1, 100, '9.953.500' UNION ALLSELECT 'm', 2007, 2, 100, '14.985.900' UNION ALLSELECT 'm', 2007, 3, 100, '13.864.700' UNION ALLSELECT 'Y', 2006, 10, 100, '14.985.900' UNION ALLSELECT 'Y', 2006, 11, 100, '14.985.900' UNION ALLSELECT 'Y', 2007, 4, 100, '14.985.900' UNION ALLSELECT 'Y', 2007, 5, 100, '14.985.900' UNION ALLSELECT 'Y', 2007, 6, 100, '14.985.900' UNION ALLSELECT 'Y', 2007, 7, 200, '14.985.900' UNION ALLSELECT 'Y', 2007, 8, 200, '14.985.900' UNION ALLSELECT 'Y', 2007, 9, 200, '900' UNION ALLSELECT 'Y', 2007, 10, 200, '14.985.900' UNION ALLSELECT 'Y', 2007, 11, 200, '14.985.900' UNION ALLSELECT 'Y', 2007, 12, 300, '9.953.500' UNION ALL SELECT 'a', 2008, 1, 100, '11.985.900' UNION ALLSELECT 'a', 2008, 2, 100, '14.985.900' UNION ALLSELECT 'a', 2008, 3, 100, '13.864.700' UNION ALLSELECT 'a', 2008, 4, 100, '14.985.900' UNION ALLSELECT 'a', 2008, 5, 100, '14.985.900' UNION ALLSELECT 'a', 2008, 6, 100, '14.985.900' UNION ALLSELECT 'a', 2008, 7, 100, '13.864.700' UNION ALLSELECT 'a', 2008, 8, 100, '14.985.900' UNION ALLSELECT 'a', 2008, 9, 100, '14.985.900'SELECT Person,Code,theDate,ValFROM (SELECT DISTINCT a.person, a.Code ,cast(cast(a.Mnth AS varchar(2)) + '/01/' + casT(a.Yr AS VARCHAR(4)) AS DATETIME) AS theDate ,a.val -- get the month/yr (or date) that is before the current person/code ,(SELECT cast(cast(min(Mnth) AS varchar(2)) + '/01/' + casT(min(Yr) AS VARCHAR(4)) AS DATETIME) AS nextMnth FROM @sample AS b WHERE cast(cast(a.Mnth AS varchar(2)) + '/01/' + casT(a.Yr AS VARCHAR(4)) AS DATETIME) < cast(cast(b.Mnth AS varchar(2)) + '/01/' + casT(b.Yr AS VARCHAR(4)) AS DATETIME) AND a.Person = b.person AND a.Code = b.Code) AS nextDate -- get the month/yr (or date) that is before the current record and the value is different ,(SELECT cast(cast(min(Mnth) AS varchar(2)) + '/01/' + casT(min(Yr) AS VARCHAR(4)) AS DATETIME) AS prvMnth FROM @sample AS b WHERE cast(cast(a.Mnth AS varchar(2)) + '/01/' + casT(a.Yr AS VARCHAR(4)) AS DATETIME) < cast(cast(b.Mnth AS varchar(2)) + '/01/' + casT(b.Yr AS VARCHAR(4)) AS DATETIME) AND a.Person = b.person AND a.Code = b.Code AND b.Val <> a.Val) AS lastDateB4change FROM @sample AS a) AS tmpWHERE lastDateB4change <= nextDate OR nextDate IS NULL ORDER BY theDate,Person,Code,Val As always, I never say my way is the best!Try it and let me know what you find using your actual data. Specifically the performance.Why push the envelope when you can just open it? |
 |
|
|
herothecat
Starting Member
19 Posts |
Posted - 2007-04-20 : 22:45:24
|
| Oh, Peter,My thoughts exactly! HUH?Why push the envelope when you can just open it? |
 |
|
|
johey
Starting Member
10 Posts |
Posted - 2007-04-23 : 09:32:34
|
| Herothecat,Thanks for your reply and hard work. As said, I've already found a solution. I've tested your query though. As I'm kind of busy (understatement) I stopped waiting for results after 4 minutes (although your solution looks like a good one). I should have mentioned that the base table contains 12 million records.. ;-) My query only takes 2 minutes, so I'm very happy with it. Thanks again,johey |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-04-23 : 09:39:16
|
| If your query takes "only 2 minutes" on 12 million records, I would like to see it and learn from you.Also, run the "dirty CURSOR" method to compare times.Peter LarssonHelsingborg, Sweden |
 |
|
|
herothecat
Starting Member
19 Posts |
Posted - 2007-04-23 : 17:48:53
|
| I'd also like to see your query.And run mine as well, I'm curious to see the performance differences of the three.Like Peter said we can learn from you....:-)Why push the envelope when you can just open it? |
 |
|
|
johey
Starting Member
10 Posts |
Posted - 2007-04-26 : 03:08:14
|
| Herocat's query takes almost 3 hours. But it doesn't supply me yet with all needed data. Moreover: the data I gave as example already is a result from a subquery of mine. So I can honestly state that my query is the fastest option. :-DHere it is:WITH CteD AS( -- get date helper select M.Yr,M.Mnth,M.StartDate,M.EndDate from AxConversionTemp.Helper.DateYearMonth M where M.StartDate >= @idStartDate and M.EndDate <= @idStopDate ),CteE AS( -- get empl helper select E.AceEmplId,E.EmplId from AxConversionTemp.Helper.ConvEmployee E),CteJ AS( -- get jur helper select J.Company,J.JurNr from AxConversionTemp.Helper.ConvJurNr J),CteL AS( select PT1.EMPLID,PT1.COMPANY ,OBT1.OB_DT ,OBT1.codeAce,OBT1.S from ( -- Personnel data select PCB.EMPLID,PCB.COMPANY,PCB.END_DT ContractEndDt,PE.ancien_COLR AncienDt from Info.pers.ps_contract_bel PCB inner join Info.pers.ps_employment PE on PE.EMPLID = PCB.emplId where PCB.emplId + ' ' + convert(varchar,PCB.effdt) in ( select PCBX.emplId + ' ' + convert(varchar,max(PCBX.effdt)) from Info.pers.ps_contract_bel PCBX group by PCBX.emplID ) and PE.emplId + ' ' + convert(varchar,PE.EMPL_RCD) in ( select T.emplId + ' ' + convert(varchar,max(T.EMPL_RCD)) from Info.pers.ps_employment T group by T.emplID ) and PCB.CONTR_DUR_BEL in ('01','02') ) PT1 inner join ( select OBT2.emplId,SDM.StartDate OB_DT ,OBT2.codeAce ,OBT2.S from CteD SDM inner join ( -- codes and values summed per converted code/person/month select POI.emplId,POI.OB_A_YY_DT Yr,POI.OB_A_MM_DT Mnth ,CL.codeAce ,SUM(POI.OB_AMT_9_4) S from AxConversionTemp.Helper.Convcode CL inner join Info.pers.ps_ob_inre POI on CL.codeInfoHr = POI.OB_CODE where CL.codeInfoHr <> '300' --historiek via PS_JOB and CL.DestinyTable in ('LOM','BLM') group by POI.EMPLID,POI.OB_A_YY_DT,POI.OB_A_MM_DT,CL.codeAce ) OBT2 on OBT2.Yr = SDM.Yr and OBT2.Mnth = SDM.Mnth ) OBT1 on PT1.emplId = OBT1.emplId),CteL1 AS( -- only start and end records; no unchanged records in between SELECT L.EMPLID,L.COMPANY,L.OB_DT,L.codeAce,L.S ,LPrev.OB_DT PrevOB_DT,LNext.OB_DT NextOB_DT FROM CteL L LEFT OUTER JOIN CteL LPrev ON L.EmplID = LPrev.EmplID AND L.codeAce = LPrev.codeAce AND L.OB_DT = dateadd(M,1,LPrev.OB_DT) AND L.S = LPrev.S LEFT OUTER JOIN CteL LNext ON L.EmplID = LNext.EmplID AND L.codeAce = LNext.codeAce AND L.OB_DT = dateadd(M,-1,LNext.OB_DT) AND L.S = LNext.S WHERE (LPrev.OB_DT IS NULL OR LNext.OB_DT IS NULL)),CteL2 AS( -- get end date for each start record SELECT LFrom.COMPANY,LFrom.EmplID,LFrom.OB_DT,LFrom.codeAce,LFrom.S,LTill.OB_Dt OB_DtTill FROM ( -- all start records SELECT LFrom.EMPLID,LFrom.COMPANY,LFrom.OB_DT,LFrom.codeAce,LFrom.S FROM CteL1 LFrom WHERE LFrom.PrevOB_DT IS NULL ) LFrom left outer join (-- all end records SELECT LTill.EMPLID,LTill.COMPANY,LTill.OB_DT,LTill.codeAce,LTill.S FROM CteL1 LTill WHERE LTill.PrevOB_DT IS NOT NULL AND LTill.NextOB_DT IS NULL ) LTill ON LFrom.EmplID = LTill.EmplID AND LFrom.codeAce = LTill.codeAce AND LFrom.OB_DT < LTill.OB_DT AND LFrom.S = LTill.S)INSERT INTO AxConversionTemp.PERS.AceHist ([jur] ,[wgnr] ,[wnnr] ,[volgnr] ,[begindatum] ,[einddatum] ,[code] ,[aantal] ,[bedragPerEenheid] ,[percentage] ,[bedrag]) select J.JurNr,@lsDestwgnr,E.AceEmplID,@lnDestvolgnr ,L3.OB_DT,D.EndDate,L3.codeAce ,@lnDestaantal,@lnDestbedragPerEenheid,@lsConstAceUitbetalingPct ,L3.S from ( -- find first (=lowest) end date per code/person/date SELECT L2.COMPANY,L2.EmplID,L2.OB_DT,L2.codeAce,L2.S,IsNull(IsNull(MIN(L2.OB_DtTill),L2.OB_DT),@ldConstMaxSysdate) OB_DtTill FROM CteL2 L2 GROUP BY L2.COMPANY,L2.EmplID,L2.OB_DT,L2.codeAce,L2.S ) L3 INNER JOIN CteD D on L3.OB_DtTill = D.StartDate INNER JOIN CteE E on L3.EmplId = E.EmplId INNER JOIN CteJ J on L3.COMPANY = J.Company;Sorry, no time to explain. I think though my query isn't that complicated after all.Unfortunately I really don't have time to try the cursor method.Thanks again for all your suggestions.Regards,johey |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-04-26 : 03:16:12
|
quote: Originally posted by johey Unfortunately I really don't have time to try the cursor method.Thanks again for all your suggestions.
You're welcome.Peter LarssonHelsingborg, Sweden |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-04-26 : 04:01:38
|
quote: Originally posted by johey Moreover: the data I gave as example already is a result from a subquery of mine. So I can honestly state that my query is the fastest option.
No wonder since you are the only one with access to all information and all you give us is some small subset. And you expect us to come up with something clever based on carefully shared sample data?Peter LarssonHelsingborg, Sweden |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-04-26 : 05:35:03
|
Good news!This technique takes less than 3 seconds for 1.2 million records!CREATE UNIQUE CLUSTERED INDEX IX_Samples ON SourceTable (Person, Yr, Mnth)DECLARE @LastVal VARCHAR(10), @Num INT, @Mnth INTSET @LastVal = REPLICATE(CHAR(255), 10)UPDATE SourceTableSET @Num = Num = CASE WHEN Val < @LastVal THEN 1 ELSE @Num + 1 END, @LastVal = Val Beat that!Peter LarssonHelsingborg, Sweden |
 |
|
|
johey
Starting Member
10 Posts |
Posted - 2007-04-26 : 06:28:14
|
quote: Originally posted by Peso
quote: Originally posted by johey Moreover: the data I gave as example already is a result from a subquery of mine. So I can honestly state that my query is the fastest option.
No wonder since you are the only one with access to all information and all you give us is some small subset. And you expect us to come up with something clever based on carefully shared sample data?
Very true, my mistake. I did just ask a solution for the rownum-problem though. After my post where I explained my alternative solution I didn't expect any more suggestions because my problem was already solved. Sorry if I was unclear about that. |
 |
|
|
herothecat
Starting Member
19 Posts |
Posted - 2007-04-26 : 08:50:13
|
| 3 Hours....:-(Why push the envelope when you can just open it? |
 |
|
|
johey
Starting Member
10 Posts |
Posted - 2007-04-26 : 09:45:39
|
| Don't feel bad about it, my first attempt wasn't finished after 12h! |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-04-26 : 16:11:50
|
| Then 3 seconds is really bad?Peter LarssonHelsingborg, Sweden |
 |
|
|
Next Page
|
|
|