Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
Kristen
Test
22859 Posts |
Posted - 2011-01-03 : 05:02:19
|
In http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=154730 the O/P was asking about splitting a column and PARSENAME was suggested as a possible solution.I see PARESENAME recommended for this type of thing quite often. I have to admit I've never tried it, but it seems like a "kludge" to me. Is it really faster do you know? (given that there will be 3 sets of REPLACE function I presume?)May not be an issue, but it would also bother me that PARESENAME will silently remove any wrapping "[" ... "]", also embedded "." (and possibly other such characters too)Pity MS don't just provide an inbuilt splitter - native code is going to be much faster than the complex stuff we have to throw together ...My preference is an in-situ update which uses fewer CHARINDEX calls, and does not nest them, so "scales" better if there are many columns to be split:DECLARE @MyData TABLE( MyDelimitedData varchar(100), MyCol1 varchar(20), MyCol2 varchar(20), MyCol3 varchar(20))INSERT INTO @MyData( MyDelimitedData)SELECT '1|E001|Test1' UNION ALLSELECT '2|E002|Test2' UNION ALLSELECT '3|E003|Test3' UNION ALLSELECT '4|E004|Test4'-- Data with is poorly compatible with PARSENAME:UNION ALLSELECT '5.1|E005.5|Test5.5' UNION ALLSELECT '6|[E006]|[Test6]' UNION ALLSELECT '7|"E007"|"Test7"' UNION ALLSELECT '8|E008|Test8|Part8-4' UNION ALLSELECT '9|E009|Test9|Part9-4|Part9-5'SELECT SUBSTRING(MyDelimitedData,1,Charindex('|',MyDelimitedData)-1) AS MyCol1 , SUBSTRING(MyDelimitedData,Charindex('|',MyDelimitedData)+1, (Charindex('|',MyDelimitedData,CHARINDEX('|',MyDelimitedData)+1))-(Charindex('|',MyDelimitedData)+1)) AS MyCol2 , Reverse(SubString(REVERSE(MyDelimitedData),1,Charindex('|',REVERSE(MyDelimitedData))-1)) AS MyCol3FROM @MyDataSELECT [MyCol1] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 3), [MyCol2] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 2), [MyCol3] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 1)FROM @MyDataDECLARE @I1 int, @I2 int, @I3 int--UPDATE USET @I1 = CHARINDEX('|', MyDelimitedData + '|') , [MyCol1] = LEFT(MyDelimitedData, @I1-1) , @I2 = NullIf(CHARINDEX('|', MyDelimitedData + '|', @I1+1), 0) , [MyCol2] = SUBSTRING(MyDelimitedData, @I1+1, @I2-@I1-1) , @I3 = NullIf(CHARINDEX('|', MyDelimitedData + '|', @I2+1), 0) , [MyCol3] = SUBSTRING(MyDelimitedData, @I2+1, @I3-@I2-1)FROM @MyData AS U--SELECT MyCol1, MyCol2, MyCol3, MyDelimitedDataFROM @MyDataResults (comparing "correct" splitting with PARSENAME):MyCol1 MyCol2 MyCol3 MyDelimitedData------ ------ ------- ---------------1 E001 Test1 1|E001|Test12 E002 Test2 2|E002|Test23 E003 Test3 3|E003|Test34 E004 Test4 4|E004|Test45.1 E005.5 Test5.5 5.1|E005.5|Test5.56 [E006] [Test6] 6|[E006]|[Test6]7 "E007" "Test7" 7|"E007"|"Test7"8 E008 Test8 8|E008|Test8|Part8-49 E009 Test9 9|E009|Test9|Part9-4|Part9-5MyCol1 MyCol2 MyCol3------ ------ ------1 E001 Test12 E002 Test23 E003 Test34 E004 Test4NULL NULL NULL6 E006 Test67 E007 Test7E008 Test8 Part8-4NULL NULL NULL |
|
|
sunitabeck
Master Smack Fu Yak Hacker
5155 Posts |
Posted - 2011-01-03 : 08:09:40
|
quote: Originally posted by Kristen I see PARESENAME recommended for this type of thing quite often. I have to admit I've never tried it, but it seems like a "kludge" to me. Is it really faster do you know? (given that there will be 3 sets of REPLACE function I presume?)
I sort of share Kristen's feelings towards parsename - I recall reading somewhere that parsename was really meant for parsing the four-part naming convention (and consequently, it has a limitation of a maximum of four tokens that can be split). |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2011-01-03 : 11:43:27
|
| "I recall reading somewhere that parsename was really meant for parsing the four-part naming convention "That's exactly what it was for! I have changed the example to add some data that is PARSENAME "unfriendly". For me that would be cause-enough not to use it (unless someone convinces me otherwise) just on the off-chance that such data got into my data-stream.In particular an additional element mucking up the splitting because the splitting is from-RIGHT, rather than from-LEFT as would more normally be expected, and any inappropriate data causing all parts to return NULL. |
 |
|
|
madhivanan
Premature Yak Congratulator
22864 Posts |
Posted - 2011-01-04 : 02:18:33
|
| Parsename should be used cleverly in cases like the one specified in http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=154730 OP wants to split three part and update them to three different columns. Parsename can do it what is needed. So the usage of parsename depends on the nature of the data. I always suggest this function to split maximum of four parts. I would not suggest it for the example data you posted.MadhivananFailing to plan is Planning to fail |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2011-01-04 : 03:53:09
|
"I would not suggest it for the example data you posted"Thanks Madhi. I always worry that even "good" data can contain "bad" data (accidentally, through unintended program error, etc.)So maybe the answer is:SELECT [MyCol1] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 3), [MyCol2] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 2), [MyCol3] = PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 1)FROM @MyDataWHERE MyDelimitedData LIKE '%|%|%' AND MyDelimitedData NOT LIKE '%|%|%|%' AND MyDelimitedData NOT LIKE '%.%' to ensure that there only rows with 3 parts are included (and then some means is needed to report, by exception, on any data that does not have the correct pattern |
 |
|
|
madhivanan
Premature Yak Congratulator
22864 Posts |
Posted - 2011-01-04 : 04:32:48
|
Yes. You can filter like that. However, if a dot is part of data and delimeter is something else, you may need to handle it differently as fifth row has only three partsSELECT [MyCol1] = REPLACE(PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 3),'~','.'), [MyCol2] = REPLACE(PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 2),'~','.'), [MyCol3] = REPLACE(PARSENAME(REPLACE(MyDelimitedData, '|', '.'), 1),'~','.')FROM (select replace(MyDelimitedData,'.','~') as MyDelimitedData from @MyData) as tWHERE MyDelimitedData LIKE '%|%|%' AND MyDelimitedData NOT LIKE '%|%|%|%' MadhivananFailing to plan is Planning to fail |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2011-01-04 : 05:03:27
|
"if a dot is part of data and delimeter is something else, you may need to handle it differently"Yup. But this is, really, my original point - as at this point I don;t see why I would use PARSENAME (which is intended for a completely different job of course ...)So I'm back where I started - I think PARSENAME is the wrong tool for this job (notwithstanding that I think that PIPE-delimited data "stinks" in any case )FWIW we have people who register on Client sites with email addresses as "||||||@somedomain.com" - they know that pipe-delimited is a favourite for passing data around (best me why though ...) and are reducing their spam accordingly ... |
 |
|
|
madhivanan
Premature Yak Congratulator
22864 Posts |
|
|
|
|
|
|
|