Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Normalizing MV data (split function).

Author  Topic 

mhavens
Starting Member

3 Posts

Posted - 2006-08-18 : 14:20:10
Hi -

I have data that is stored like this:

ID AUTHORS BirthYR DeathYR
123 A\B\C... 1901\1902\1903... 1951\1952\1953...


I'd like to normalize it. I have SQL2005 at my disposal, but I can't get my brain wrapped around the various UDFs that split the individual columns & the Pivot functions. The data would look like this afterwards:

ID AUTHORS BirthYR DeathYR
123 A 1901 1951
123 B 1902 1952
123 C 1903 1953
.
.



I hate reinventing the wheel. What's the best approach I can take and some links with good examples. I don't need to aggregate any data, just split it up & normalize it. Thanks in advance.

Mark Havens

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2006-08-18 : 15:06:45
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648



Go with the flow & have fun! Else fight the flow
blog thingie: http://weblogs.sqlteam.com/mladenp
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2006-08-19 : 06:36:32
I think the main problem is doing it in a way that zips* across the rows rather than producing the cartesian product and is both sufficiently generic and performant.

Using the split function in the thread mladen's pointed to, it's something like this for your example:


CREATE FUNCTION dbo.Split (@sep char(1), @s varchar(512))
RETURNS table
AS
RETURN (
WITH Pieces(pn, start, stop) AS (
SELECT 1, 1, CHARINDEX(@sep, @s)
UNION ALL
SELECT pn + 1, stop + 1, CHARINDEX(@sep, @s, stop + 1)
FROM Pieces
WHERE stop > 0
)
SELECT pn,
SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS s
FROM Pieces
)
GO



SELECT ID, A.s, B.s, D.s
FROM TableName
CROSS APPLY dbo.Split('\', AUTHORS) AS A
CROSS APPLY dbo.Split('\', BirthYR) AS B
CROSS APPLY dbo.Split('\', DeathYR) AS D
WHERE A.pn = B.pn
AND A.pn = D.pn

This looks pretty tidy to me, and it has an understandable behaviour where the piece lengths for AUTHORS, BirthYR and DeathYR differ: stop when the shortest one runs out! It's probably not the quickest way: mainly because it's likely to have to produce (partial) products before it can apply the WHERE conditions. But perhaps that doesn't matter if you're only doing it once.

An alternate approach is to count the pieces in one of the MV columns (or even take the minimum, but that gets messy), use a tally table/function and a Piece function. That way you know you're getting the same piece for each column without having to make it filter the result of a join. Like this:


SELECT ID,
dbo.Piece(AUTHORS, '\', N.n),
dbo.Piece(BirthYR, '\', N.n),
dbo.Piece(DeathYR, '\', N.n)
FROM dbo.IkeySensitivityRaw
INNER JOIN Numbers AS N
ON N.n BETWEEN 1 AND LEN(AUTHORS) - LEN(REPLACE(AUTHORS, '\', '')) + 1


* sorry, it's a functional programming term!
Go to Top of Page
   

- Advertisement -