how force row_number continue numbering?? - SQL Server Forums

Home | Weblogs | Forums

Site Sponsored By: SQLDSC - SQL Server Desired State Configuration

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

how force row_number continue numbering??

Author

Topic

johey
Starting Member

10 Posts

Posted - 2007-04-18 : 14:49:19

Hi everyone,

I'm having trouble with the row_number function. I know how it works, but it doesn't behave exactly as I want.. :-)

I use something like this:
row_number() over (partition by T1.person,T1.code,T1.val order by T1.person,T1.code,T1.yr,T1.mnth,T1.val) rownum

Object is to track historical changes in the data. I need to know when "val" changes for a certain person and code. I want the first val for each change. It almost works, but causes a problem if a val changes to another val and then later back to a former one. The rownum keeps counting, starting from the last time the value was found. Here an example to clearify:

person yr mnth code val rownum
X 2006 1 100 14.985.900 1
X 2006 2 100 14.985.900 2
X 2006 3 100 14.985.900 3
X 2006 4 100 14.985.900 4
X 2006 5 100 14.985.900 5
X 2006 6 100 14.985.900 6
X 2006 7 100 14.985.900 7
X 2006 8 100 13.864.700 1
X 2006 9 100 14.985.900 8
X 2006 10 100 14.985.900 9
X 2006 11 100 14.985.900 10
X 2006 12 100 9.953.500 1

As you can see, it goes wrong at 9/2006: I don't want to see 8 in the rownum column, but 1. Of course, this is by design, but how can I avoid this?

Anyone?!

Thanks a million in advance,
johey

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-04-18 : 15:21:04

Val should not be included in your partitioning or ordering, just the other columns:

row_number() over (partition by T1.person,T1.code~~,T1.val~~ order by T1.person,T1.code,T1.yr,T1.mnth~~,T1.val~~) rownum

In addition, what is the primary key of this table? If there is not a unique constraint on person/code/year/month then you will not get consistent data.

- Jeff
http://weblogs.sqlteam.com/JeffS

johey
Starting Member

10 Posts

Posted - 2007-04-19 : 02:54:31

Jeff,

Thanks for your answer. I've tried your suggestion, but it doesn't do the trick. Actually it's worse. It gives results as:

person yr mnth code val rownum
X 2006 1 100 14.985.900 1
X 2006 2 100 14.985.900 2
X 2006 3 100 14.985.900 3
X 2006 4 100 14.985.900 4
X 2006 5 100 14.985.900 5
X 2006 6 100 14.985.900 6
X 2006 7 100 14.985.900 7
X 2006 8 100 13.864.700 8
X 2006 9 100 14.985.900 9
X 2006 10 100 14.985.900 10
X 2006 11 100 14.985.900 11
X 2006 12 100 9.953.500 12

So it just counts the months and codes without noticing the change in val.

That unique constraint is handled, thanks.

I now think I cannot use row_number for this purpose. I'll try to outer join the resultset to itself on the combination of person, code and the (possibly) following month.

Regards,
johey

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-04-19 : 03:16:28

A dirty CURSOR solution, but it works.

-- Prepare sample data
DECLARE @Sample TABLE (Person CHAR(1), Yr INT, Mnth INT, Code INT, Val VARCHAR(10), Num INT)

INSERT	@Sample (Person, Yr, Mnth, Code, Val)
SELECT	'X', 2006, 1, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 7, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 2, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 5, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 3, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 4, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 8, 100, '13.864.700' UNION ALL
SELECT	'X', 2006, 6, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 11, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 9, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 10, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 12, 100, '9.953.500'

DECLARE	@LastVal VARCHAR(10),
	@CurrVal VARCHAR(10),
	@Num INT,
	@Mnth INT

DECLARE	curNum CURSOR	FORWARD_ONLY READ_ONLY FOR
				SELECT		Mnth,
						Val
				FROM		@Sample
				ORDER BY	Mnth

OPEN curNum

FETCH	NEXT
FROM	curNum
INTO	@Mnth,
	@CurrVal

WHILE @@FETCH_STATUS = 0
	BEGIN
		IF @LastVal < @CurrVal OR @LastVal IS NULL
			SET @Num = 1
		ELSE
			SET @Num = @Num +1

		UPDATE	@Sample
		SET	Num = @Num
		WHERE	Mnth = @Mnth

		SET @LastVal = @CurrVal

		FETCH	NEXT
		FROM	curNum
		INTO	@Mnth,
			@CurrVal
	END

CLOSE curNum
DEALLOCATE curNum

select * From @sample
order by mnth

Peter Larsson
Helsingborg, Sweden

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-04-19 : 08:51:33

Ah! I think I understand now.

I *think* this returns what you are after, no cursorsm but I am not sure how efficient it will be:


DECLARE @Sample TABLE (Person CHAR(1), Yr INT, Mnth INT, Code INT, Val VARCHAR(10), Num INT)

INSERT	@Sample (Person, Yr, Mnth, Code, Val)
SELECT	'X', 2006, 1, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 7, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 2, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 5, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 3, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 4, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 8, 100, '13.864.700' UNION ALL
SELECT	'X', 2006, 6, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 11, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 9, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 10, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 12, 100, '9.953.500';


with SampleData as
(
 select *, row_number() over (partition by T1.person,T1.code order by T1.person,T1.code,T1.yr,T1.mnth) rownum
 from @sample t1
) 
select 
  a.Person, a.Yr, a.Mnth, a.Code, a.Val
from
  SampleData a
left outer join
  SampleData b on (a.Person = b.Person and a.Code = b.Code and a.rownum = b.rownum + 1)
where
  b.person is null OR b.val != a.val

We basically generate a row num for each Person/Code when ordered by year/month, and then we compare each rownum in that partition with the one just before it. If the vals do not match (or there is no previous one), we return that row.

By the way -- none of these solutions are conclusive with your sample data, since your data doesn't include more than person or more than 1 code. Whenever you want a solution that works "per person" or "per code" or something like that, always try to include more than 1 person or code in your data, otherwise there is no way to know for sure if it will work when applied to data with multiple values in those fields. It is very important to use a true representative set of data when testing and writing this algorithms, and choosing the wrong sample data could result in the algorithm looking like it works perfectly when testing, but when applied to your real-world case it will not work at all.

- Jeff
http://weblogs.sqlteam.com/JeffS

johey
Starting Member

10 Posts

Posted - 2007-04-19 : 12:16:43

Jeff and Peso,

Thanks for all your efforts. Meanwhile I've figured out an alternative solution. It's a little more complicated than both of your suggestions because I also need to know the last date a value occured (to know till when it's valid).

My approach:
- I outer join the resultset from above to itself on the combination of person, code and the (possibly) following month (in order to know if the current value is changed in the following month)
- I outer join the resultset a second time to itself on the combination of person, code and the (possibly) previous month (in order to know if the current value is changed in the previous month)
- This gives a resultset which I filter from the unchanged records. The resultset now contains starting records and also ending records. I put this resultset in a cte.
- Next I get the starting records from the cte in one subquery and the ending records in another one.
- Finally I join both results on person, code and val plus the date of the beginning record has to preceed that from the ending record. In the same step I put a group by statement to get the minimum ending record date (to prevent the "gap" error as explained in my initial post)

I'll spare you the real query! ;-)

Thanks again,
johey

sqlalways
Starting Member

3 Posts

Posted - 2007-04-20 : 02:59:03

select row_number() over (partition by a order by a) as rownumber,<field1>,field2,field3.. from (select null as a ,<field1>,<filed2>,..... from <your table name> ) x

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-04-20 : 07:04:36

Huh?

Peter Larsson
Helsingborg, Sweden

herothecat
Starting Member

19 Posts

Posted - 2007-04-20 : 22:40:44

johey,

OK, this is going to look pretty complicated, and it's even more complicated to explain. I hope you have as much fun figuring it out as I did!!

I also know that this isn't using the rownumber function, but it works and you've given me something to really look into. I'll let you know if I find a way to do this with that function.

But this code works very nicely if all you need to do is return the LAST DATE and VALUES when a change occurs. If you need the all other info as well, then just left join this on the person, code, year and date.

I combined the year and month to ease the calcs a bit.


DECLARE @Sample TABLE (Person CHAR(1), Yr INT, Mnth INT, Code INT, Val VARCHAR(10))

INSERT	@Sample (Person, Yr, Mnth, Code, Val)
SELECT	'X', 2006, 1, 100, '11.985.900' UNION ALL
SELECT	'X', 2006, 2, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 3, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 4, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 5, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 6, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 7, 100, '13.864.700' UNION ALL
SELECT	'X', 2006, 8, 100, '14.985.900' UNION ALL
SELECT	'X', 2006, 9, 100, '14.985.900' UNION ALL
SELECT	'm', 2007, 1, 100, '9.953.500'  UNION ALL
SELECT	'm', 2007, 2, 100, '14.985.900' UNION ALL
SELECT	'm', 2007, 3, 100, '13.864.700' UNION ALL
SELECT	'Y', 2006, 10, 100, '14.985.900' UNION ALL
SELECT	'Y', 2006, 11, 100, '14.985.900' UNION ALL
SELECT	'Y', 2007, 4, 100, '14.985.900' UNION ALL
SELECT	'Y', 2007, 5, 100, '14.985.900' UNION ALL
SELECT	'Y', 2007, 6, 100, '14.985.900' UNION ALL
SELECT	'Y', 2007, 7, 200, '14.985.900' UNION ALL
SELECT	'Y', 2007, 8, 200, '14.985.900' UNION ALL
SELECT	'Y', 2007, 9, 200, '900' UNION ALL
SELECT	'Y', 2007, 10, 200, '14.985.900' UNION ALL
SELECT	'Y', 2007, 11, 200, '14.985.900' UNION ALL
SELECT	'Y', 2007, 12, 300, '9.953.500' UNION ALL 
SELECT	'a', 2008, 1, 100, '11.985.900' UNION ALL
SELECT	'a', 2008, 2, 100, '14.985.900' UNION ALL
SELECT	'a', 2008, 3, 100, '13.864.700' UNION ALL
SELECT	'a', 2008, 4, 100, '14.985.900' UNION ALL
SELECT	'a', 2008, 5, 100, '14.985.900' UNION ALL
SELECT	'a', 2008, 6, 100, '14.985.900' UNION ALL
SELECT	'a', 2008, 7, 100, '13.864.700' UNION ALL
SELECT	'a', 2008, 8, 100, '14.985.900' UNION ALL
SELECT	'a', 2008, 9, 100, '14.985.900'

SELECT Person,Code,theDate,Val
FROM 
    (SELECT DISTINCT a.person, a.Code
            ,cast(cast(a.Mnth AS varchar(2)) + '/01/' + casT(a.Yr AS VARCHAR(4)) AS DATETIME)
                AS theDate
            ,a.val
        -- get the month/yr (or date) that is before the current person/code
            ,(SELECT
                cast(cast(min(Mnth) AS varchar(2)) + '/01/' + casT(min(Yr) AS VARCHAR(4)) AS DATETIME)
                    AS nextMnth
              FROM @sample AS b
              WHERE cast(cast(a.Mnth AS varchar(2)) + '/01/' + casT(a.Yr AS VARCHAR(4)) AS DATETIME)
                    <
                   cast(cast(b.Mnth AS varchar(2)) + '/01/' + casT(b.Yr AS VARCHAR(4)) AS DATETIME)
                   AND a.Person = b.person 
                   AND a.Code = b.Code) AS nextDate
        -- get the month/yr (or date) that is before the current record and the value is different
            ,(SELECT 
                cast(cast(min(Mnth) AS varchar(2)) + '/01/' + casT(min(Yr) AS VARCHAR(4)) AS DATETIME)
                    AS prvMnth
              FROM @sample AS b 
              WHERE cast(cast(a.Mnth AS varchar(2)) + '/01/' + casT(a.Yr AS VARCHAR(4)) AS DATETIME)
                    <
                   cast(cast(b.Mnth AS varchar(2)) + '/01/' + casT(b.Yr AS VARCHAR(4)) AS DATETIME)
                   AND a.Person = b.person 
                   AND a.Code = b.Code
                   AND b.Val <> a.Val) AS lastDateB4change
    FROM @sample AS a) AS tmp
WHERE 
    lastDateB4change <= nextDate OR nextDate IS NULL 
ORDER BY
    theDate,Person,Code,Val

As always, I never say my way is the best!

Try it and let me know what you find using your actual data. Specifically the performance.

Why push the envelope when you can just open it?

herothecat
Starting Member

19 Posts

Posted - 2007-04-20 : 22:45:24

Oh, Peter,

My thoughts exactly! HUH?

Why push the envelope when you can just open it?

johey
Starting Member

10 Posts

Posted - 2007-04-23 : 09:32:34

Herothecat,
Thanks for your reply and hard work. As said, I've already found a solution. I've tested your query though. As I'm kind of busy (understatement) I stopped waiting for results after 4 minutes (although your solution looks like a good one). I should have mentioned that the base table contains 12 million records.. ;-) My query only takes 2 minutes, so I'm very happy with it.
Thanks again,
johey

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-04-23 : 09:39:16

If your query takes "only 2 minutes" on 12 million records, I would like to see it and learn from you.
Also, run the "dirty CURSOR" method to compare times.

Peter Larsson
Helsingborg, Sweden

herothecat
Starting Member

19 Posts

Posted - 2007-04-23 : 17:48:53

I'd also like to see your query.
And run mine as well, I'm curious to see the performance differences of the three.

Like Peter said we can learn from you....

:-)

Why push the envelope when you can just open it?

johey
Starting Member

10 Posts

Posted - 2007-04-26 : 03:08:14

Herocat's query takes almost 3 hours. But it doesn't supply me yet with all needed data. Moreover: the data I gave as example already is a result from a subquery of mine. So I can honestly state that my query is the fastest option. :-D

Here it is:

WITH
CteD AS
( -- get date helper
select M.Yr,M.Mnth,M.StartDate,M.EndDate
from AxConversionTemp.Helper.DateYearMonth M
where M.StartDate >= @idStartDate
and M.EndDate <= @idStopDate
)
,CteE AS
( -- get empl helper
select E.AceEmplId,E.EmplId
from AxConversionTemp.Helper.ConvEmployee E
)
,CteJ AS
( -- get jur helper
select J.Company,J.JurNr
from AxConversionTemp.Helper.ConvJurNr J
)
,CteL AS
(
select PT1.EMPLID,PT1.COMPANY
,OBT1.OB_DT
,OBT1.codeAce,OBT1.S
from
(
-- Personnel data
select PCB.EMPLID,PCB.COMPANY,PCB.END_DT ContractEndDt,PE.ancien_COLR AncienDt
from Info.pers.ps_contract_bel PCB
inner join Info.pers.ps_employment PE
on PE.EMPLID = PCB.emplId
where PCB.emplId + ' ' + convert(varchar,PCB.effdt) in
(
select PCBX.emplId + ' ' + convert(varchar,max(PCBX.effdt))
from Info.pers.ps_contract_bel PCBX
group by PCBX.emplID
)
and PE.emplId + ' ' + convert(varchar,PE.EMPL_RCD) in
(
select T.emplId + ' ' + convert(varchar,max(T.EMPL_RCD))
from Info.pers.ps_employment T
group by T.emplID
)
and PCB.CONTR_DUR_BEL in ('01','02')
) PT1
inner join (
select OBT2.emplId,SDM.StartDate OB_DT
,OBT2.codeAce
,OBT2.S
from CteD SDM
inner join
(
-- codes and values summed per converted code/person/month
select POI.emplId,POI.OB_A_YY_DT Yr,POI.OB_A_MM_DT Mnth
,CL.codeAce
,SUM(POI.OB_AMT_9_4) S
from AxConversionTemp.Helper.Convcode CL
inner join Info.pers.ps_ob_inre POI
on CL.codeInfoHr = POI.OB_CODE
where CL.codeInfoHr <> '300' --historiek via PS_JOB
and CL.DestinyTable in ('LOM','BLM')
group by POI.EMPLID,POI.OB_A_YY_DT,POI.OB_A_MM_DT,CL.codeAce
) OBT2
on OBT2.Yr = SDM.Yr and OBT2.Mnth = SDM.Mnth
) OBT1
on PT1.emplId = OBT1.emplId
)
,CteL1 AS
( -- only start and end records; no unchanged records in between
SELECT L.EMPLID,L.COMPANY,L.OB_DT,L.codeAce,L.S
,LPrev.OB_DT PrevOB_DT,LNext.OB_DT NextOB_DT
FROM CteL L
LEFT OUTER JOIN CteL LPrev
ON L.EmplID = LPrev.EmplID
AND L.codeAce = LPrev.codeAce
AND L.OB_DT = dateadd(M,1,LPrev.OB_DT)
AND L.S = LPrev.S
LEFT OUTER JOIN CteL LNext
ON L.EmplID = LNext.EmplID
AND L.codeAce = LNext.codeAce
AND L.OB_DT = dateadd(M,-1,LNext.OB_DT)
AND L.S = LNext.S
WHERE (LPrev.OB_DT IS NULL OR LNext.OB_DT IS NULL)
)
,CteL2 AS
(
-- get end date for each start record
SELECT LFrom.COMPANY,LFrom.EmplID,LFrom.OB_DT,LFrom.codeAce,LFrom.S,LTill.OB_Dt OB_DtTill
FROM
( -- all start records
SELECT LFrom.EMPLID,LFrom.COMPANY,LFrom.OB_DT,LFrom.codeAce,LFrom.S
FROM CteL1 LFrom
WHERE LFrom.PrevOB_DT IS NULL
) LFrom
left outer join
(-- all end records
SELECT LTill.EMPLID,LTill.COMPANY,LTill.OB_DT,LTill.codeAce,LTill.S
FROM CteL1 LTill
WHERE LTill.PrevOB_DT IS NOT NULL
AND LTill.NextOB_DT IS NULL
) LTill
ON LFrom.EmplID = LTill.EmplID
AND LFrom.codeAce = LTill.codeAce
AND LFrom.OB_DT < LTill.OB_DT
AND LFrom.S = LTill.S
)
INSERT INTO AxConversionTemp.PERS.AceHist
([jur]
,[wgnr]
,[wnnr]
,[volgnr]
,[begindatum]
,[einddatum]
,[code]
,[aantal]
,[bedragPerEenheid]
,[percentage]
,[bedrag])
select J.JurNr,@lsDestwgnr,E.AceEmplID,@lnDestvolgnr
,L3.OB_DT,D.EndDate,L3.codeAce
,@lnDestaantal,@lnDestbedragPerEenheid,@lsConstAceUitbetalingPct
,L3.S
from
(
-- find first (=lowest) end date per code/person/date
SELECT L2.COMPANY,L2.EmplID,L2.OB_DT,L2.codeAce,L2.S,IsNull(IsNull(MIN(L2.OB_DtTill),L2.OB_DT),@ldConstMaxSysdate) OB_DtTill
FROM CteL2 L2
GROUP BY L2.COMPANY,L2.EmplID,L2.OB_DT,L2.codeAce,L2.S
) L3
INNER JOIN CteD D
on L3.OB_DtTill = D.StartDate
INNER JOIN CteE E
on L3.EmplId = E.EmplId
INNER JOIN CteJ J
on L3.COMPANY = J.Company;

Sorry, no time to explain. I think though my query isn't that complicated after all.

Unfortunately I really don't have time to try the cursor method.

Thanks again for all your suggestions.

Regards,
johey

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-04-26 : 03:16:12

quote:
Originally posted by johey

Unfortunately I really don't have time to try the cursor method.

Thanks again for all your suggestions.

You're welcome.

Peter Larsson
Helsingborg, Sweden

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-04-26 : 04:01:38

quote:
Originally posted by johey

Moreover: the data I gave as example already is a result from a subquery of mine. So I can honestly state that my query is the fastest option.

No wonder since you are the only one with access to all information and all you give us is some small subset.
And you expect us to come up with something clever based on carefully shared sample data?

Peter Larsson
Helsingborg, Sweden

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-04-26 : 05:35:03

Good news!
This technique takes less than 3 seconds for 1.2 million records!

CREATE UNIQUE CLUSTERED INDEX IX_Samples ON SourceTable (Person, Yr, Mnth)

DECLARE	@LastVal VARCHAR(10),
	@Num INT,
	@Mnth INT

SET	@LastVal = REPLICATE(CHAR(255), 10)

UPDATE	SourceTable
SET	@Num = Num =	CASE
				WHEN Val < @LastVal THEN 1
				ELSE @Num + 1
			END,
	@LastVal = Val

Beat that!

Peter Larsson
Helsingborg, Sweden

johey
Starting Member

10 Posts

Posted - 2007-04-26 : 06:28:14

quote:
Originally posted by Peso

quote:
Originally posted by johey

Moreover: the data I gave as example already is a result from a subquery of mine. So I can honestly state that my query is the fastest option.
No wonder since you are the only one with access to all information and all you give us is some small subset.
And you expect us to come up with something clever based on carefully shared sample data?

Very true, my mistake. I did just ask a solution for the rownum-problem though. After my post where I explained my alternative solution I didn't expect any more suggestions because my problem was already solved. Sorry if I was unclear about that.

herothecat
Starting Member

19 Posts

Posted - 2007-04-26 : 08:50:13

3 Hours....

:-(

Why push the envelope when you can just open it?

johey
Starting Member

10 Posts

Posted - 2007-04-26 : 09:45:39

Don't feel bad about it, my first attempt wasn't finished after 12h!

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-04-26 : 16:11:50

Then 3 seconds is really bad?

Peter Larsson
Helsingborg, Sweden

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources