Flatten wide sparse columns into single row

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Transact-SQL (2008)

Flatten wide sparse columns into single row

Author

Topic

nathans
Aged Yak Warrior

938 Posts

Posted - 2011-03-31 : 14:55:53

I am working with a very wide table (500+ sparse columns) and am trying to flatten the sparse rows into a single row. File this design under "eav gone bad" but the design is not mine nor up for refactoring :)

I have a very verbose and unwieldy CTE solution that is functioning (i codegen the cte from the table), and it serves well to illustrate the problem and desired result; but, I am posting here for any other ideas. My though is to append the datediff into the value of each sparse column, then use max to get the "most recent" cell value, and finally remove the appended sort.

Thanks!


declare @Sparse table 
	(	Id int,
		dt datetime,
		s_1 varchar(max) sparse null, 
		s_2 varchar(max) sparse null, 
		-- ...
		s_500 varchar(max) sparse null
	)
insert into @Sparse (Id, dt, s_1, s_2, s_500)
	values	(1, getdate(), 'John', null, null), 
		(1, getdate()-1, 'J', null, 'Kennedy'),
		(1, getdate()-2, null, 'F', 'Unknown'),
		--
		(2, getdate(), 'James', 'Earl', null), 
		(2, getdate()-1, 'James', null, null),
		(2, getdate()-2, null, 'n/a', 'Jones')

-- flatten dataset for each Id, using most recent value from each sparse column
-- desired:
/*
id    s_1    s_2    s_500
----- ------ ------ -----
1     John   F      Kennedy
2     James  Earl   Jones

*/

;with	
	c_Stage (id, diff, s_1, s_2, s_500)
	as	(	select	id, 
				cast(datediff(ms, dt, getdate()) as varchar(max)),
				cast(s_1 as varchar(max)), 
				cast(s_2 as varchar(max)), 
				cast(s_500 as varchar(max))
			from	@Sparse
		),
	 c_Sort (id, sort, s_1, s_2, s_500) 
	 as	(	select	id, 
				cast(row_number() over(order by replicate('0',len(diff)) desc, diff desc) as varchar(max)),
				s_1, 
				s_2, 
				s_500
			from	c_Stage
	 ),
	 c_Fin (id, s_1, s_2, s_500) 
	as	(	select	id, 
				sort+'.'+s_1,
				sort+'.'+s_2,
				sort+'.'+s_500
			from	c_Sort
		)
select	id, 
	[s_1] = stuff(max(s_1), 1, charindex('.', max(s_1), 0), ''),
	[s_2] = stuff(max(s_2), 1, charindex('.', max(s_2), 0), ''),
	[s_500] = stuff(max(s_500), 1, charindex('.', max(s_500), 0), '')
from	c_Fin
group
by	id;

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-03-31 : 15:50:00

Have you tried a nested UNPIVOT/PIVOT?

;WITH cteSource(ID, RecID, theValue, theCol)
AS (
	SELECT		u.ID,
			ROW_NUMBER() OVER (PARTITION BY u.ID, u.theCol ORDER BY u.DT DESC) AS RecID,
			u.theValue,
			u.theCol
	FROM		@Sparse AS s
	UNPIVOT		(
				theValue
				FOR theCol IN (s.s_1, s.s_2, s.s_500)
			) AS u
)
SELECT	p.ID,
	p.s_1,
	p.s_2,
	p.s_500
FROM	cteSource AS s
PIVOT	(
		MAX(s.theValue)
		FOR s.theCol IN ([s_1], [s_2], [s_500])
	) AS p
WHERE	p.RecID = 1

N 56°04'39.26"
E 12°55'05.63"

nathans
Aged Yak Warrior

938 Posts

Posted - 2011-03-31 : 18:08:10

Yes, though testing against the huge (and contrived) dataset of 10k rows per Id it performed about as well as the nasty string manipulation I did. Fastest I was able to get was to take the column_set xml from each row up into c# via CLR and merge them.

The pivot is by far the most readable though. And for the real world dataset of ~50 rows per Id it performs just fine.

Thanks peso!

nathans
Aged Yak Warrior

938 Posts

Posted - 2011-03-31 : 18:41:24

Scratch that... I took the pivot query to a diff server where it could leverage multi cpus, and it runs much faster! Query plan shows its taking advantage of parallelism. So, pivot looks like definitely the way to go.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-04-01 : 00:31:42

So you have a COLUMN SET column too?

N 56°04'39.26"
E 12°55'05.63"

nathans
Aged Yak Warrior

938 Posts

Posted - 2011-04-01 : 01:04:21

Yea, sorry I shouldve included in the example setup. In my CLR attempt the biggest hurdle was making sure I had an xml node for every column, even those that only had null values in sparse column. Those obviously dont have a node in the column_set.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources