Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Adding Rows for Missing Years

Author  Topic 

Yida12
Starting Member

6 Posts

Posted - 2011-07-20 : 08:35:25
Hi Everyone,

I have a table that has revenues for firms per year like so:

FIRM YEAR REVENUE
FirmA 1991 125
FirmA 1992 130
FirmA 1993 160
FirmB 1991 150
FirmB 1992 175
FirmC 1992 130
FirmC 1993 120

The timeline is 1991-1993 inclusive.

I want to add to the table so that if a firm does not have revenue for a year, SQL will try to use a revenue from the closest year with revenue.

So FirmB 1993 would have revenue of 175, and FirmC 1991 would have revenue of 130.

Modified Table would look like:

FIRM YEAR REVENUE
FirmA 1991 125
FirmA 1992 130
FirmA 1993 160
FirmB 1991 150
FirmB 1992 175
FirmB 1993 175
FirmC 1991 130
FirmC 1992 130
FirmC 1993 120

If you need more detail, please let me know. Thank you for the help.

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2011-07-20 : 08:44:36
[code]
SELECT p.* INTO #MissingYrs
FROM
(
SELECT t.FIRM,y.Yr
FROM (SELECT DISTINCT FIRM FROM table)t
CROSS JOIN (SELECT 1991 AS Yr UNION ALL
SELECT 1992 UNION ALL
SELECT 1993)y
)p
LEFT JOIN Table t1
ON t1.FIRM = p.FIRM
AND t1.YEAR = p.Yr
WHERE t1.FIRM IS NULL

INSERT INTO table
SELECT m.FIRM,m.Yr,COALESCE(y1.REVENUE,y2.REVENUE)
FROM #MissingYrs m
OUTER APPLY (SELECT TOP 1 REVENUE
FROM Table
WHERE FIRM = m.FIRM
AND YEAR >m.Yr
ORDER BY YEAR ASC) y1
OUTER APPLY (SELECT TOP 1 REVENUE
FROM Table
WHERE FIRM = m.FIRM
AND YEAR < m.Yr
ORDER BY YEAR DESC) y2

DROP TABLE #MissingYrs
[/code]

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-07-20 : 08:49:11
[code]
Declare @t table (
Firm varchar(10),
Yr int,
Rev int
)

Insert Into @t Select 'FirmA', 1991, 125
Insert Into @t Select 'FirmA', 1992, 130
Insert Into @t Select 'FirmA', 1993, 160
Insert Into @t Select 'FirmB', 1991, 150
Insert Into @t Select 'FirmB', 1992, 175
Insert Into @t Select 'FirmC', 1992, 130
Insert Into @t Select 'FirmC', 1993, 120



Select *
From @t


Declare @rs int, --Range Start
@re int --Range End

Set @rs = 1991
Set @re = 1993

Select
A.Firm,
A.Yr,
A.Rev
From
(
Select
Z.*,
Y.Rev,
r = ROW_NUMBER() Over(Partition By Z.Firm, Z.Yr Order By abs(Z.Yr-Y.Yr), Z.Yr-Y.Yr)
From
(
Select *
From (Select Distinct Firm From @t) A
Cross Join (Select Yr=number From master..spt_values Where number between @rs and @re) B
) Z
Left Join @t Y
On Z.Firm = Y.Firm
) A
Where A.r = 1
Order By A.Firm, A.Yr

[/code]

Corey

I Has Returned!!
Go to Top of Page

Yida12
Starting Member

6 Posts

Posted - 2011-07-20 : 10:24:58
Thanks for the help guys.
Do your codes work for a table with some firms not having any revenue data for any years? In this case, I just want to skip the firm entirely. Also, I am applying the code to thousands of firms with years 2004-2010.
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-07-20 : 10:49:07
Oh... I realized I was making mine more difficult than necessary:

EDIT: Forgot to say... this should work fine on thousands of firms, and I added in a Firm that did not fall inside the Yr range... and it is excluded from the results.


Drop Table #t
Create Table #t (
Firm varchar(10),
Yr int,
Rev int,
Primary Key (Firm, Yr)
)

Insert Into #t Select 'FirmA', 1991, 125
Insert Into #t Select 'FirmA', 1992, 130
Insert Into #t Select 'FirmA', 1993, 160
Insert Into #t Select 'FirmB', 1991, 150
Insert Into #t Select 'FirmB', 1992, 175
Insert Into #t Select 'FirmC', 1992, 130
Insert Into #t Select 'FirmC', 1993, 120
Insert Into #t Select 'FirmD', 1990, 100 -- Test Firm with no Rev for 91-93

-- Make a bunch of copies
Insert Into #t
Select Firm + CONVERT(varchar,number), Yr, Rev
From #t A, (Select number From master..spt_values Where type = 'P') B


Select *
From #t

Declare @rs int, --Range Start
@re int --Range End

Set @rs = 1991
Set @re = 1993


Select
Firm,
Yr,
Rev
From
(
Select
A.Firm,
B.Yr,
A.Rev,
r = ROW_NUMBER() Over(Partition By A.Firm, B.Yr Order By abs(A.Yr - B.Yr))
From #t A
Cross Join (Select Yr=number From master..spt_values Where number between @rs and @re) B
Where A.Yr between @rs and @re
) Z
Where r = 1



Corey

I Has Returned!!
Go to Top of Page

Yida12
Starting Member

6 Posts

Posted - 2011-07-20 : 11:15:07
Seventhnight,

Could you contact me via an email? I want to send you a cleaned sample of the data. I tried implementing your code but am getting errors.
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-07-20 : 12:35:30
I don't have access to my regular email here... sorry.

Why don't you post what you have tried here, along with the errors you are getting.

Also, you could try to recreate sample data that you think reproduces the issue...

Corey

I Has Returned!!
Go to Top of Page

Yida12
Starting Member

6 Posts

Posted - 2011-07-22 : 14:18:33
So for my actual table, I need to update the table so that each FIRM has a revenue for every year (2004-2010).

If there is a tie in the years, pick the year that is the later one.

Here is my code:


-- table variable
declare @data as table (
FIRM nvarchar(100), YEAR int, REVENUE float)
)

-- sample data
insert into @data(FIRM, YEAR, REVENUE)
Select a.FIRM, a.YEAR, a.REVENUE from [excel table] as a

-- SQL
select sq.FIRM, sq.yr, coalesce(d.revenue, closest_revenue.closest_revenue) as revenue

from (
-- generate a result set that matches firms to years
select distinct d.FIRM, yrs.yr
from @data AS d
cross apply (values(2004),(2005),(2006), (2007), (2008), (2009), (2010)) yrs(yr)
) AS sq
left outer join @data d on sq.FIRM=d.FIRM and sq.YEAR=d.YEAR

outer apply (
-- look up the closest revenue
-- how do I deal with ties?
select top 1 d_closest.revenue as closest_revenue
from @data as d_closest
where sq.FIRM=d.FIRM
order by abs(sq.yr - d_closest.DATAYEAR)
) AS closest_revenue

order by sq.ORG_KEY, sq.SEGMENT_KEY, sq.yr


The code runs instantly before "left outer join @data d on sq.FIRM=d.FIRM and sq.YEAR=d.YEAR", but this step takes so much time that I just stop it. Any suggestions on this and on how to deal with ties?
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2011-07-23 : 03:57:40
how many rows does @data contain?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

Yida12
Starting Member

6 Posts

Posted - 2011-07-25 : 15:57:48
@data contains tens of thousands of rows.
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-07-25 : 16:52:17
Maybe a temp table would be better (can be indexed). Also, I'd assign an Id to a 'Firm':


-- temp table
Create Table #firms table (
FirmId int identity(1,1)
FIRM nvarchar(100),
Primary Key (FirmId)
)

Create Table #data table (
FirmId int,
FIRM nvarchar(100),
Yr int,
REVENUE float
)

-- sample data
insert into #firms (FIRM)
Select distinct a.FIRM from [excel table] as a Order By A.Firm


insert into #data(FIRM, Yr, REVENUE)
Select
FirmId = (Select FirmId From #firms Where Firm = A.Firm),
a.Firm,
a.Yr,
a.REVENUE
from [excel table] as a


Select
Z.FirmId,
Y.Firm,
Z.Yr,
Z.Rev
From
(
Select
A.FirmId,
B.Yr,
A.Rev,
r = ROW_NUMBER() Over(Partition By A.FirmId, B.Yr Order By abs(A.Yr - B.Yr))
From #data A
Cross Join (values(2004),(2005),(2006), (2007), (2008), (2009), (2010)) B(Yr)
Where A.Yr between 2004 and 2010
) Z
Inner Join #firms Y
On Z.FirmId = Y.FirmId
Where r = 1
Order By Z.FirmId, Z.Yr


Seeing this as far as output on my sample data:

(18441 row(s) affected)
Table '#t_____000000006850'. Scan count 1, logical reads 69, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 125 ms, elapsed time = 446 ms.


Corey

I Has Returned!!
Go to Top of Page

Yida12
Starting Member

6 Posts

Posted - 2011-07-26 : 09:40:03
So using a temp table is faster than using a table variable? And also what is the reason for the ID on firms?
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2011-07-26 : 13:07:23
it can be based on amount of data and indexes.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-07-26 : 16:17:03
By generating the Id in the temp table, and making it a primary key, I get an indexed integer. Even though the rest of the query isn't really using the #firms temp table much, it is much easier(more efficient) to work with the integer than the nvarchar(100)... you could put an index on the #data temp table and it might run even better. I didn't test that though.

Corey

I Has Returned!!
Go to Top of Page
   

- Advertisement -