Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Flatten wide sparse columns into single row

Author  Topic 

nathans
Aged Yak Warrior

938 Posts

Posted - 2011-03-31 : 14:55:53
I am working with a very wide table (500+ sparse columns) and am trying to flatten the sparse rows into a single row. File this design under "eav gone bad" but the design is not mine nor up for refactoring :)

I have a very verbose and unwieldy CTE solution that is functioning (i codegen the cte from the table), and it serves well to illustrate the problem and desired result; but, I am posting here for any other ideas. My though is to append the datediff into the value of each sparse column, then use max to get the "most recent" cell value, and finally remove the appended sort.

Thanks!


declare @Sparse table
( Id int,
dt datetime,
s_1 varchar(max) sparse null,
s_2 varchar(max) sparse null,
-- ...
s_500 varchar(max) sparse null
)
insert into @Sparse (Id, dt, s_1, s_2, s_500)
values (1, getdate(), 'John', null, null),
(1, getdate()-1, 'J', null, 'Kennedy'),
(1, getdate()-2, null, 'F', 'Unknown'),
--
(2, getdate(), 'James', 'Earl', null),
(2, getdate()-1, 'James', null, null),
(2, getdate()-2, null, 'n/a', 'Jones')

-- flatten dataset for each Id, using most recent value from each sparse column
-- desired:
/*
id s_1 s_2 s_500
----- ------ ------ -----
1 John F Kennedy
2 James Earl Jones

*/

;with
c_Stage (id, diff, s_1, s_2, s_500)
as ( select id,
cast(datediff(ms, dt, getdate()) as varchar(max)),
cast(s_1 as varchar(max)),
cast(s_2 as varchar(max)),
cast(s_500 as varchar(max))
from @Sparse
),
c_Sort (id, sort, s_1, s_2, s_500)
as ( select id,
cast(row_number() over(order by replicate('0',len(diff)) desc, diff desc) as varchar(max)),
s_1,
s_2,
s_500
from c_Stage
),
c_Fin (id, s_1, s_2, s_500)
as ( select id,
sort+'.'+s_1,
sort+'.'+s_2,
sort+'.'+s_500
from c_Sort
)
select id,
[s_1] = stuff(max(s_1), 1, charindex('.', max(s_1), 0), ''),
[s_2] = stuff(max(s_2), 1, charindex('.', max(s_2), 0), ''),
[s_500] = stuff(max(s_500), 1, charindex('.', max(s_500), 0), '')
from c_Fin
group
by id;

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-03-31 : 15:50:00
Have you tried a nested UNPIVOT/PIVOT?
;WITH cteSource(ID, RecID, theValue, theCol)
AS (
SELECT u.ID,
ROW_NUMBER() OVER (PARTITION BY u.ID, u.theCol ORDER BY u.DT DESC) AS RecID,
u.theValue,
u.theCol
FROM @Sparse AS s
UNPIVOT (
theValue
FOR theCol IN (s.s_1, s.s_2, s.s_500)
) AS u
)
SELECT p.ID,
p.s_1,
p.s_2,
p.s_500
FROM cteSource AS s
PIVOT (
MAX(s.theValue)
FOR s.theCol IN ([s_1], [s_2], [s_500])
) AS p
WHERE p.RecID = 1


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

nathans
Aged Yak Warrior

938 Posts

Posted - 2011-03-31 : 18:08:10
Yes, though testing against the huge (and contrived) dataset of 10k rows per Id it performed about as well as the nasty string manipulation I did. Fastest I was able to get was to take the column_set xml from each row up into c# via CLR and merge them.

The pivot is by far the most readable though. And for the real world dataset of ~50 rows per Id it performs just fine.

Thanks peso!
Go to Top of Page

nathans
Aged Yak Warrior

938 Posts

Posted - 2011-03-31 : 18:41:24
Scratch that... I took the pivot query to a diff server where it could leverage multi cpus, and it runs much faster! Query plan shows its taking advantage of parallelism. So, pivot looks like definitely the way to go.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-04-01 : 00:31:42
So you have a COLUMN SET column too?



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

nathans
Aged Yak Warrior

938 Posts

Posted - 2011-04-01 : 01:04:21
Yea, sorry I shouldve included in the example setup. In my CLR attempt the biggest hurdle was making sure I had an xml node for every column, even those that only had null values in sparse column. Those obviously dont have a node in the column_set.

Go to Top of Page
   

- Advertisement -