Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Splitting a CSV column - using PARSENAME

Author  Topic 

Kristen
Test

22859 Posts

Posted - 2011-01-03 : 05:02:19
In http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=154730 the O/P was asking about splitting a column and PARSENAME was suggested as a possible solution.

I see PARESENAME recommended for this type of thing quite often. I have to admit I've never tried it, but it seems like a "kludge" to me. Is it really faster do you know? (given that there will be 3 sets of REPLACE function I presume?)

May not be an issue, but it would also bother me that PARESENAME will silently remove any wrapping "[" ... "]", also embedded "." (and possibly other such characters too)

Pity MS don't just provide an inbuilt splitter - native code is going to be much faster than the complex stuff we have to throw together ...

My preference is an in-situ update which uses fewer CHARINDEX calls, and does not nest them, so "scales" better if there are many columns to be split:


DECLARE @MyData TABLE
(
MyDelimitedData varchar(100),
MyCol1 varchar(20),
MyCol2 varchar(20),
MyCol3 varchar(20)
)

INSERT INTO @MyData
(
MyDelimitedData
)
SELECT '1|E001|Test1' UNION ALL
SELECT '2|E002|Test2' UNION ALL
SELECT '3|E003|Test3' UNION ALL
SELECT '4|E004|Test4'
-- Data with is poorly compatible with PARSENAME:
UNION ALL
SELECT '5.1|E005.5|Test5.5' UNION ALL
SELECT '6|[E006]|[Test6]' UNION ALL
SELECT '7|"E007"|"Test7"' UNION ALL
SELECT '8|E008|Test8|Part8-4' UNION ALL
SELECT '9|E009|Test9|Part9-4|Part9-5'


SELECT SUBSTRING(MyDelimitedData,1,Charindex('|',MyDelimitedData)-1) AS MyCol1
, SUBSTRING(MyDelimitedData,Charindex('|',MyDelimitedData)+1,
(Charindex('|',MyDelimitedData,CHARINDEX('|',MyDelimitedData)+1))-(Charindex('|',MyDelimitedData)+1)) AS MyCol2
, Reverse(SubString(REVERSE(MyDelimitedData),1,Charindex('|',REVERSE(MyDelimitedData))-1)) AS MyCol3
FROM @MyData

SELECT [MyCol1] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 3),
[MyCol2] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 2),
[MyCol3] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 1)
FROM @MyData

DECLARE @I1 int,
@I2 int,
@I3 int
--
UPDATE U
SET
@I1 = CHARINDEX('|', MyDelimitedData + '|')
, [MyCol1] = LEFT(MyDelimitedData, @I1-1)
, @I2 = NullIf(CHARINDEX('|', MyDelimitedData + '|', @I1+1), 0)
, [MyCol2] = SUBSTRING(MyDelimitedData, @I1+1, @I2-@I1-1)
, @I3 = NullIf(CHARINDEX('|', MyDelimitedData + '|', @I2+1), 0)
, [MyCol3] = SUBSTRING(MyDelimitedData, @I2+1, @I3-@I2-1)
FROM @MyData AS U
--
SELECT MyCol1, MyCol2, MyCol3, MyDelimitedData
FROM @MyData

Results (comparing "correct" splitting with PARSENAME):

MyCol1 MyCol2 MyCol3 MyDelimitedData
------ ------ ------- ---------------
1 E001 Test1 1|E001|Test1
2 E002 Test2 2|E002|Test2
3 E003 Test3 3|E003|Test3
4 E004 Test4 4|E004|Test4
5.1 E005.5 Test5.5 5.1|E005.5|Test5.5
6 [E006] [Test6] 6|[E006]|[Test6]
7 "E007" "Test7" 7|"E007"|"Test7"
8 E008 Test8 8|E008|Test8|Part8-4
9 E009 Test9 9|E009|Test9|Part9-4|Part9-5

MyCol1 MyCol2 MyCol3
------ ------ ------
1 E001 Test1
2 E002 Test2
3 E003 Test3
4 E004 Test4
NULL NULL NULL
6 E006 Test6
7 E007 Test7
E008 Test8 Part8-4
NULL NULL NULL

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-01-03 : 08:09:40
quote:
Originally posted by Kristen


I see PARESENAME recommended for this type of thing quite often. I have to admit I've never tried it, but it seems like a "kludge" to me. Is it really faster do you know? (given that there will be 3 sets of REPLACE function I presume?)


I sort of share Kristen's feelings towards parsename - I recall reading somewhere that parsename was really meant for parsing the four-part naming convention (and consequently, it has a limitation of a maximum of four tokens that can be split).
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2011-01-03 : 11:43:27
"I recall reading somewhere that parsename was really meant for parsing the four-part naming convention "

That's exactly what it was for! I have changed the example to add some data that is PARSENAME "unfriendly". For me that would be cause-enough not to use it (unless someone convinces me otherwise) just on the off-chance that such data got into my data-stream.

In particular an additional element mucking up the splitting because the splitting is from-RIGHT, rather than from-LEFT as would more normally be expected, and any inappropriate data causing all parts to return NULL.
Go to Top of Page

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2011-01-04 : 02:18:33
Parsename should be used cleverly in cases like the one specified in http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=154730
OP wants to split three part and update them to three different columns. Parsename can do it what is needed. So the usage of parsename depends on the nature of the data. I always suggest this function to split maximum of four parts. I would not suggest it for the example data you posted.

Madhivanan

Failing to plan is Planning to fail
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2011-01-04 : 03:53:09
"I would not suggest it for the example data you posted"

Thanks Madhi. I always worry that even "good" data can contain "bad" data (accidentally, through unintended program error, etc.)

So maybe the answer is:

SELECT [MyCol1] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 3),
[MyCol2] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 2),
[MyCol3] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 1)
FROM @MyData
WHERE MyDelimitedData LIKE '%|%|%'
AND MyDelimitedData NOT LIKE '%|%|%|%'
AND MyDelimitedData NOT LIKE '%.%'


to ensure that there only rows with 3 parts are included (and then some means is needed to report, by exception, on any data that does not have the correct pattern
Go to Top of Page

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2011-01-04 : 04:32:48
Yes. You can filter like that. However, if a dot is part of data and delimeter is something else, you may need to handle it differently as fifth row has only three parts


SELECT [MyCol1] = REPLACE(PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 3),'~','.'),
[MyCol2] = REPLACE(PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 2),'~','.'),
[MyCol3] = REPLACE(PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 1),'~','.')
FROM (select replace(MyDelimitedData,'.','~') as MyDelimitedData from @MyData) as t
WHERE MyDelimitedData LIKE '%|%|%'
AND MyDelimitedData NOT LIKE '%|%|%|%'


Madhivanan

Failing to plan is Planning to fail
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2011-01-04 : 05:03:27
"if a dot is part of data and delimeter is something else, you may need to handle it differently"

Yup. But this is, really, my original point - as at this point I don;t see why I would use PARSENAME (which is intended for a completely different job of course ...)

So I'm back where I started - I think PARSENAME is the wrong tool for this job (notwithstanding that I think that PIPE-delimited data "stinks" in any case )

FWIW we have people who register on Client sites with email addresses as "||||||@somedomain.com" - they know that pipe-delimited is a favourite for passing data around (best me why though ...) and are reducing their spam accordingly ...
Go to Top of Page

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2011-01-04 : 05:30:23
<<
So I'm back where I started
>>


Ok. If there are unknown number of delimted parts, use this
http://beyondrelational.com/blogs/madhivanan/archive/2008/09/11/splitting-delimited-data-to-columns-set-based-approach.aspx

Madhivanan

Failing to plan is Planning to fail
Go to Top of Page
   

- Advertisement -