Splitting a CSV column - using PARSENAME

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Transact-SQL (2008)

Splitting a CSV column - using PARSENAME

Author

Topic

Kristen
Test

22859 Posts

Posted - 2011-01-03 : 05:02:19

In http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=154730 the O/P was asking about splitting a column and PARSENAME was suggested as a possible solution.

I see PARESENAME recommended for this type of thing quite often. I have to admit I've never tried it, but it seems like a "kludge" to me. Is it really faster do you know? (given that there will be 3 sets of REPLACE function I presume?)

May not be an issue, but it would also bother me that PARESENAME will silently remove any wrapping "[" ... "]", also embedded "." (and possibly other such characters too)

Pity MS don't just provide an inbuilt splitter - native code is going to be much faster than the complex stuff we have to throw together ...

My preference is an in-situ update which uses fewer CHARINDEX calls, and does not nest them, so "scales" better if there are many columns to be split:



DECLARE @MyData TABLE
(
	MyDelimitedData	varchar(100),
	MyCol1	varchar(20),
	MyCol2	varchar(20),
	MyCol3	varchar(20)
)

INSERT INTO @MyData
(
	MyDelimitedData
)
SELECT	'1|E001|Test1' UNION ALL
SELECT	'2|E002|Test2' UNION ALL
SELECT	'3|E003|Test3' UNION ALL
SELECT	'4|E004|Test4'
-- Data with is poorly compatible with PARSENAME:
UNION ALL
SELECT	'5.1|E005.5|Test5.5' UNION ALL
SELECT	'6|[E006]|[Test6]' UNION ALL
SELECT	'7|"E007"|"Test7"' UNION ALL
SELECT	'8|E008|Test8|Part8-4' UNION ALL
SELECT	'9|E009|Test9|Part9-4|Part9-5'

SELECT	SUBSTRING(MyDelimitedData,1,Charindex('|',MyDelimitedData)-1) AS MyCol1
	, SUBSTRING(MyDelimitedData,Charindex('|',MyDelimitedData)+1,
		(Charindex('|',MyDelimitedData,CHARINDEX('|',MyDelimitedData)+1))-(Charindex('|',MyDelimitedData)+1)) AS MyCol2
	, Reverse(SubString(REVERSE(MyDelimitedData),1,Charindex('|',REVERSE(MyDelimitedData))-1)) AS MyCol3
FROM	@MyData

SELECT	[MyCol1] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 3),
	[MyCol2] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 2),
	[MyCol3] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 1)
FROM	@MyData

DECLARE	@I1	int,
	@I2	int,
	@I3	int
--
UPDATE U
SET
	  @I1 = CHARINDEX('|', MyDelimitedData + '|')
	, [MyCol1] = LEFT(MyDelimitedData, @I1-1)
	, @I2 =  NullIf(CHARINDEX('|', MyDelimitedData + '|', @I1+1), 0)
	, [MyCol2] = SUBSTRING(MyDelimitedData, @I1+1, @I2-@I1-1)
	, @I3 =  NullIf(CHARINDEX('|', MyDelimitedData + '|', @I2+1), 0)
	, [MyCol3] = SUBSTRING(MyDelimitedData, @I2+1, @I3-@I2-1)
FROM	@MyData AS U
--
SELECT	MyCol1, MyCol2, MyCol3, MyDelimitedData
FROM	@MyData

Results (comparing "correct" splitting with PARSENAME):


MyCol1 MyCol2 MyCol3  MyDelimitedData
------ ------ ------- ---------------
1      E001   Test1   1|E001|Test1
2      E002   Test2   2|E002|Test2
3      E003   Test3   3|E003|Test3
4      E004   Test4   4|E004|Test4
5.1    E005.5 Test5.5 5.1|E005.5|Test5.5
6      [E006] [Test6] 6|[E006]|[Test6]
7      "E007" "Test7" 7|"E007"|"Test7"
8      E008   Test8   8|E008|Test8|Part8-4
9      E009   Test9   9|E009|Test9|Part9-4|Part9-5

MyCol1 MyCol2 MyCol3
------ ------ ------
1      E001   Test1
2      E002   Test2
3      E003   Test3
4      E004   Test4
NULL   NULL   NULL
6      E006   Test6
7      E007   Test7
E008   Test8  Part8-4
NULL   NULL   NULL

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-01-03 : 08:09:40

quote:
Originally posted by Kristen

I see PARESENAME recommended for this type of thing quite often. I have to admit I've never tried it, but it seems like a "kludge" to me. Is it really faster do you know? (given that there will be 3 sets of REPLACE function I presume?)

I sort of share Kristen's feelings towards parsename - I recall reading somewhere that parsename was really meant for parsing the four-part naming convention (and consequently, it has a limitation of a maximum of four tokens that can be split).

Kristen
Test

22859 Posts

Posted - 2011-01-03 : 11:43:27

"I recall reading somewhere that parsename was really meant for parsing the four-part naming convention "

That's exactly what it was for! I have changed the example to add some data that is PARSENAME "unfriendly". For me that would be cause-enough not to use it (unless someone convinces me otherwise) just on the off-chance that such data got into my data-stream.

In particular an additional element mucking up the splitting because the splitting is from-RIGHT, rather than from-LEFT as would more normally be expected, and any inappropriate data causing all parts to return NULL.

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2011-01-04 : 02:18:33

Parsename should be used cleverly in cases like the one specified in http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=154730
OP wants to split three part and update them to three different columns. Parsename can do it what is needed. So the usage of parsename depends on the nature of the data. I always suggest this function to split maximum of four parts. I would not suggest it for the example data you posted.

Madhivanan

Failing to plan is Planning to fail

Kristen
Test

22859 Posts

Posted - 2011-01-04 : 03:53:09

"I would not suggest it for the example data you posted"

Thanks Madhi. I always worry that even "good" data can contain "bad" data (accidentally, through unintended program error, etc.)

So maybe the answer is:


SELECT	[MyCol1] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 3),
	[MyCol2] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 2),
	[MyCol3] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 1)
FROM	@MyData
WHERE	    MyDelimitedData LIKE '%|%|%'
	AND MyDelimitedData NOT LIKE '%|%|%|%'
	AND MyDelimitedData NOT LIKE '%.%'

to ensure that there only rows with 3 parts are included (and then some means is needed to report, by exception, on any data that does not have the correct pattern

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2011-01-04 : 04:32:48

Yes. You can filter like that. However, if a dot is part of data and delimeter is something else, you may need to handle it differently as fifth row has only three parts


SELECT	[MyCol1] = REPLACE(PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 3),'~','.'),
	[MyCol2] = REPLACE(PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 2),'~','.'),
	[MyCol3] = REPLACE(PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 1),'~','.')
FROM	(select replace(MyDelimitedData,'.','~') as  MyDelimitedData from  @MyData) as t
WHERE	    MyDelimitedData LIKE '%|%|%'
	AND MyDelimitedData NOT LIKE '%|%|%|%'

Madhivanan

Failing to plan is Planning to fail

Kristen
Test

22859 Posts

Posted - 2011-01-04 : 05:03:27

"if a dot is part of data and delimeter is something else, you may need to handle it differently"

Yup. But this is, really, my original point - as at this point I don;t see why I would use PARSENAME (which is intended for a completely different job of course ...)

So I'm back where I started

- I think PARSENAME is the wrong tool for this job (notwithstanding that I think that PIPE-delimited data "stinks" in any case

)

FWIW we have people who register on Client sites with email addresses as "||||||@somedomain.com" - they know that pipe-delimited is a favourite for passing data around (best me why though ...) and are reducing their spam accordingly ...

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2011-01-04 : 05:30:23

<<
So I'm back where I started
>>

Ok. If there are unknown number of delimted parts, use this
http://beyondrelational.com/blogs/madhivanan/archive/2008/09/11/splitting-delimited-data-to-columns-set-based-approach.aspx

Madhivanan

Failing to plan is Planning to fail

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources