eehab
Starting Member
1 Post
Posted - 2010-11-27 : 08:12:36
Hi, how can I split a string in SQL Server 2005 and insert the result into a table with a loop?

My regards
MohanKandasamy
Starting Member
9 Posts
Posted - 2010-11-27 : 08:20:29
Hi,

Please use the Split function below (a custom split function created by me) to split the data:

INSERT INTO [YourTable]
SELECT * FROM dbo.Split('mohan,Kandasamy', ',')

Create the Split function as mentioned below:

CREATE FUNCTION [dbo].[Split]
(
    @rowData nvarchar(2000),
    @splitOn nvarchar(5)
)
RETURNS @rtnValue table
(
    Id   int identity(1,1),
    Data nvarchar(100)
)
AS
BEGIN
    DECLARE @count int
    SET @count = 1
    WHILE (CHARINDEX(@splitOn, @rowData) > 0)
    BEGIN
        INSERT INTO @rtnValue (Data)
        SELECT Data = LTRIM(RTRIM(SUBSTRING(@rowData, 1, CHARINDEX(@splitOn, @rowData) - 1)))

        SET @rowData = SUBSTRING(@rowData, CHARINDEX(@splitOn, @rowData) + 1, LEN(@rowData))
        SET @count = @count + 1
    END
    INSERT INTO @rtnValue (Data)
    SELECT Data = LTRIM(RTRIM(@rowData))
    RETURN
END
GO

Mohan Kandasamy
Jeff Moden
Aged Yak Warrior
652 Posts
Posted - 2010-11-28 : 17:20:21
A couple of faster options would be to 1) use an iTVF (inline Table Valued Function) and 2) use set-based code instead of a While Loop. The following code does both. The comments explaining things are much longer than the code itself...

DROP FUNCTION dbo.DelimitedSplit8K
GO
CREATE FUNCTION dbo.Split8K
/***************************************************************************************************
 Purpose:
 Split a given string at a given delimiter and return a list of the split elements (items).

 Returns:
 iTVF containing the following:
 ItemNumber = Element position of Item as a BIGINT (not converted to INT to eliminate a CAST)
 Item       = Element value as a VARCHAR(8000)

 CROSS APPLY Usage Example:
---------------------------------------------------------------------------------------------------
--===== Conditionally drop the test tables to make reruns easier for testing.
     -- (this is NOT a part of the solution)
     IF OBJECT_ID('tempdb..#JBMTest') IS NOT NULL
        DROP TABLE #JBMTest;

--===== Create and populate a test table on the fly (this is NOT a part of the solution).
 SELECT *
   INTO #JBMTest
   FROM (
         SELECT 1, '1,10,100,1000,10000,100000,1000000' UNION ALL
         SELECT 2, '2000000,200000,20000,2000,200,20,2' UNION ALL
         SELECT 3, 'This,is,a,test'                     UNION ALL
         SELECT 4, 'and so is this'                     UNION ALL
         SELECT 5, 'This, too (no pun intended)'
        ) d (SomeID, SomeValue);
GO

--===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)
 SELECT test.SomeID, split.ItemNumber, split.Item
   FROM #JBMTest test
  CROSS APPLY
        (
         SELECT ItemNumber, Item
           FROM dbo.DelimitedSplit8k(test.SomeValue, ',')
        ) split;
---------------------------------------------------------------------------------------------------
 Notes:
 1. Optimized for VARCHAR(7999) or less. No testing or error reporting for truncation at 7999
    characters is done.
 2. Optimized for single character delimiter. Multi-character delimiters should be resolved
    externally from this function.
 3. Optimized for use with CROSS APPLY.
 4. Does not "trim" elements just in case leading or trailing blanks are intended.
 5. If you don't know how a Tally table can be used to replace loops, please see the following...
    http://www.sqlservercentral.com/articles/T-SQL/62867/
 6. Changing this function to use VARCHAR(MAX) will cause it to run twice as slow. It's just the
    nature of VARCHAR(MAX), whether it fits in-row or not. Some recovery of speed can be realized
    by doing datatype matching for CHARINDEX, which changes its return datatype for VARCHAR(MAX).
 7. Multi-machine testing for the method of using UNPIVOT instead of 10 SELECT/UNION ALLs shows
    that the UNPIVOT method is quite machine dependent and can slow things down quite a bit.
 8. Performance testing shows using "TOP" for the limiting criteria of "N" is actually slower and
    slightly more CPU intensive than the traditional WHERE N < LEN(@pString) + 2.
 9. Performance testing shows using ORDER BY (SELECT x) where "x" is anything is actually slower
    and slightly more CPU intensive than the traditional ORDER BY (SELECT N).
***************************************************************************************************/
--===== Define I/O parameters
        (
        @pString    VARCHAR(7999),
        @pDelimiter CHAR(1)
        )
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values up to
     -- 10,000... enough to cover VARCHAR(8000)
WITH E1(N) AS ( --=== Create Ten 1's
                SELECT 1 UNION ALL SELECT 1 UNION ALL
                SELECT 1 UNION ALL SELECT 1 UNION ALL
                SELECT 1 UNION ALL SELECT 1 UNION ALL
                SELECT 1 UNION ALL SELECT 1 UNION ALL
                SELECT 1 UNION ALL SELECT 1 --10
              ),
     E2(N) AS (SELECT 1 FROM E1 a, E1 b),   --100
     E4(N) AS (SELECT 1 FROM E2 a, E2 b),   --10,000
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT N)) FROM E4)
--===== Do the split
 SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,
        SUBSTRING(@pString, N, CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) AS Item
   FROM cteTally
  WHERE N <= DATALENGTH(@pString) + 1 --DATALENGTH allows for trailing space delimiters
    AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter;
GO

--Jeff Moden
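[Editor's note] For readers new to the tally technique, here is a hand-trace of how the WHERE clause locates the delimiters. This is an added illustration, not part of the original post; the literal string 'a,bb,c' and the seven-row derived table stand in for the real tally CTE:

```sql
-- Prepending the delimiter gives ',a,bb,c'; the positions N where that
-- shifted string holds a ',' are 1, 3 and 6 -- exactly the positions in
-- the ORIGINAL string where the items 'a', 'bb' and 'c' begin.
-- CHARINDEX(',', @s + ',', N) - N then yields each item's length
-- (the appended delimiter terminates the final item).
DECLARE @s VARCHAR(7999), @d CHAR(1);
SELECT @s = 'a,bb,c', @d = ',';

SELECT N,
       SUBSTRING(@s, N, CHARINDEX(@d, @s + @d, N) - N) AS Item
  FROM (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
        SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL
        SELECT 7) t (N)                       -- stand-in for cteTally
 WHERE N <= DATALENGTH(@s) + 1
   AND SUBSTRING(@d + @s, N, 1) = @d;
-- returns (1, 'a'), (3, 'bb'), (6, 'c')
```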
glendcruz
Yak Posting Veteran
60 Posts
Posted - 2010-11-28 : 18:24:27
It is a perfect function, but you have one error: you dropped DelimitedSplit8K but created Split8K, so the usage example

FROM dbo.DelimitedSplit8k(test.SomeValue,',')

should be replaced with

FROM dbo.Split8K(test.SomeValue,',')

Thank you.
madhivanan
Premature Yak Congratulator
22864 Posts
Posted - 2010-11-29 : 05:25:10
Another method would be to use XML:

Declare @Text varchar(100), @delimiter nvarchar(5)
Set @Text = 'can you split this?'
Set @delimiter = ' '

Declare @textXml xml
Select @textXml = Cast('<d>' + Replace(@Text, @delimiter, '</d><d>') + '</d>' As xml);

SELECT T.split.value('.', 'nvarchar(max)') AS data
FROM @textXml.nodes('/d') T (split)

Madhivanan
Failing to plan is Planning to fail
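[Editor's note] A worked illustration of the intermediate step, added here because the `.nodes()` part is asked about in the next post. The sample string is from madhivanan's post; the rest is an editorial sketch:

```sql
-- REPLACE wraps every token in <d>...</d>, so for 'can you split this?'
-- with a space delimiter the intermediate XML is:
--   <d>can</d><d>you</d><d>split</d><d>this?</d>
-- .nodes('/d') shreds that into one row per <d> element; "T (split)"
-- just names the resulting rowset T with a single xml column called
-- split, and .value('.', 'nvarchar(max)') extracts each element's text.
DECLARE @x xml;
SET @x = CAST('<d>can</d><d>you</d><d>split</d><d>this?</d>' AS xml);

SELECT T.split.value('.', 'nvarchar(max)') AS data
FROM @x.nodes('/d') T (split);
-- returns 4 rows: can / you / split / this?
```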
glendcruz
Yak Posting Veteran
60 Posts
Posted - 2010-11-29 : 17:55:51
Thanks Madhivanan, I was looking for something like this, but to be honest I cannot understand the concept, specifically this part:

FROM @textXml.nodes('/d') T (split)
Jeff Moden
Aged Yak Warrior
652 Posts
Posted - 2010-11-30 : 00:49:54
@Madhivanan,

Hey there, ol' friend. There are (IIRC) three different XML splitters out there and two of them are pretty bad for performance. Yours is a bit different from any of the three I've seen. If you'd convert it to a function (I hate to change other people's code and then test it... I don't want to muck it up), I'd be happy to run it through the CSV Gin Mill.

--Jeff Moden
madhivanan
Premature Yak Congratulator
22864 Posts
Posted - 2010-11-30 : 05:40:43
What are the other three methods that involve XML? I converted the code to a function:

create function dbo.split_xml
(
    @data varchar(8000),
    @delimiter char(1)
)
returns @split table (data varchar(8000))
as
begin
    declare @textXml xml
    select @textXml = cast('<d>' + replace(@data, @delimiter, '</d><d>') + '</d>' as xml)

    insert into @split (data)
    select T.split.value('.', 'nvarchar(max)') as data
    from @textXml.nodes('/d') T (split)

    return
end
GO

--Usage
select * from dbo.split_xml('can you split this?', ' ')

Can you run a test?

Madhivanan
Failing to plan is Planning to fail
Jeff Moden
Aged Yak Warrior
652 Posts
Posted - 2010-11-30 : 23:39:43
I can do better than run a test. I can share all the test code with you and the other good folks who may also wish to run it and post their results.

First, here's the code to build random test data into a CSV. As always, the details are in the comments of the code.

--=================================================================================================
-- Create a CSV test table with 10000 rows and 10 random CSV elements per row.
-- The randomness of the elements also prevents the delimiters from showing up in the same
-- position for each row. SQL Server would figure that out and cache the information, making
-- some splitting techniques seem faster than they really are.
-- This section of the code takes just a couple of seconds to run because XML concatenation
-- is very fast (especially when compared to XML splitting or shredding).
-- Jeff Moden
--=================================================================================================
--===== Declare and preset a couple of control variables
DECLARE @NumberOfRows           INT,
        @NumberOfElementsPerRow INT;
 SELECT @NumberOfRows           = 10000,
        @NumberOfElementsPerRow = 10;

--===== Conditionally drop the test table to make reruns easier
     IF OBJECT_ID('TempDB..#CsvTest','U') IS NOT NULL
        DROP TABLE #CsvTest;

--===== This creates and populates a test table on the fly containing a
     -- sequential column and a randomly generated CSV Parameter column.
 SELECT TOP (@NumberOfRows) --Controls the number of rows in the test table
        ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),0) AS RowNum,
        (
         SELECT CAST(STUFF( --=== STUFF gets rid of the leading comma
                ( --=== This builds a CSV row with a leading comma
                 SELECT TOP (@NumberOfElementsPerRow) --Controls the number of CSV elements in each row
                        ','+CAST(ABS(CHECKSUM(NEWID()))%100000 AS VARCHAR(10))
                   FROM Master.sys.All_Columns ac3      --Classic cross join pseudo-cursor
                  CROSS JOIN Master.sys.All_Columns ac4 --can produce row sets up to 16 million.
                  WHERE ac3.Object_ID <> ac1.Object_ID  --Without this line, all rows would be the same.
                    FOR XML PATH('')
                )
                ,1,1,'') AS NVARCHAR(MAX))
        ) AS CsvParameter
   INTO #CsvTest
   FROM Master.sys.All_Columns ac1      --Classic cross join pseudo-cursor
  CROSS JOIN Master.sys.All_Columns ac2 --can produce row sets up to 16 million

--===== Let's add a PK just for grins. Since it's a temp table, we won't name it.
  ALTER TABLE #CsvTest
    ADD PRIMARY KEY CLUSTERED (RowNum) WITH FILLFACTOR = 100;

--===== Sanity check... let's see what we've got in the test table
 SELECT * FROM #CsvTest;

Here's the test code I used for the functions that Madhivanan and I posted. I threw in the While Loop code that was posted, as well...

DBCC FREEPROCCACHE
PRINT '========== Tally-on-the-fly Splitter ==========';
SET STATISTICS TIME ON;
SELECT src.RowNum, split.*
  FROM #CsvTest src
 CROSS APPLY dbo.Split8K(src.CsvParameter,',') split;
SET STATISTICS TIME OFF;

DBCC FREEPROCCACHE
PRINT '========== XML Splitter ==========';
SET STATISTICS TIME ON;
SELECT src.RowNum, split.*
  FROM #CsvTest src
 CROSS APPLY dbo.split_xml(src.CsvParameter,',') split;
SET STATISTICS TIME OFF;

DBCC FREEPROCCACHE
PRINT '========== While Loop Splitter ==========';
SET STATISTICS TIME ON;
SELECT src.RowNum, split.*
  FROM #CsvTest src
 CROSS APPLY dbo.split(src.CsvParameter,',') split;
SET STATISTICS TIME OFF;

Here's what I get for 10,000 rows of 10 elements each...

DBCC execution completed. If DBCC printed error messages, contact your system administrator.
========== Tally-on-the-fly Splitter ==========
(100000 row(s) affected)
SQL Server Execution Times: CPU time = 2469 ms, elapsed time = 6273 ms.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
========== XML Splitter ==========
(100000 row(s) affected)
SQL Server Execution Times: CPU time = 25093 ms, elapsed time = 31378 ms.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
========== While Loop Splitter ==========
(100000 row(s) affected)
SQL Server Execution Times: CPU time = 43875 ms, elapsed time = 66742 ms.

Here's what I get for 10,000 rows of 100 elements each (I didn't include the While Loop on this one)...

DBCC execution completed. If DBCC printed error messages, contact your system administrator.
========== Tally-on-the-fly Splitter ==========
(1000000 row(s) affected)
SQL Server Execution Times: CPU time = 60281 ms, elapsed time = 98922 ms.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
========== XML Splitter ==========
(1000000 row(s) affected)
SQL Server Execution Times: CPU time = 104922 ms, elapsed time = 155466 ms.

For shorter length CSVs, XML is probably not the way to go. We should all probably steer clear of the While Loop for splitting.

--Jeff Moden
Jeff Moden
Aged Yak Warrior
652 Posts
Posted - 2010-11-30 : 23:41:10
p.s. I forgot to mention that I'm running an 8-year-old 1.8GHz single CPU w/1GB of RAM, using SQL Server 2005 Developer's Edition.

--Jeff Moden
madhivanan
Premature Yak Congratulator
22864 Posts
Posted - 2010-12-01 : 01:57:31
Thanks for that, Jeff Moden.

Madhivanan
Failing to plan is Planning to fail
Kristen
Test
22859 Posts
Posted - 2010-12-01 : 04:07:34
Jeff: There are some stats for other functions posted here:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648&whichpage=2#305425
Dunno how comparable they are, but I'd be interested in the most performant variation in current SQL versions. We now have a range of split functions, depending on whether the source data is VARCHAR(8000) or VARCHAR(MAX) and whether the values we want are INT or CHAR.

I am disappointed that the best result is XML and involves a REPLACE on the whole string [delimiter] before starting. Presumably this means that the splitting of tags in XML is a carefully hand-optimised machine code routine, and the cost of the REPLACE is offset by the gain from the XML handler. If this knocks spots off the performance of set-based methods, why don't MS build a hand-optimised machine code split function into SQL?

There is some interesting stuff in Erland Sommarskog's articles on arrays and lists:
"Using Table-Valued Parameters" www.sommarskog.se/arrays-in-sql-2008.html
Split using non "Table-Valued Parameters" methods http://www.sommarskog.se/arrays-in-sql-2005.html
including setting the delimiter comparison to binary collation, on the grounds that delimiters like "," don't need foreign accent consideration, which I would not have thought of.
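[Editor's note] A small sketch of the binary-collation point, added for illustration; Latin1_General_BIN is just one example of a binary collation:

```sql
-- Forcing a binary collation turns the delimiter comparison into a raw
-- byte compare, bypassing the linguistic collation rules that a plain
-- ',' never needs. Standalone demonstration:
SELECT CHARINDEX(',' COLLATE Latin1_General_BIN,
                 'a,b,c' COLLATE Latin1_General_BIN) AS FirstDelimiter; -- 2

-- Applied to a tally splitter's WHERE clause, the same idea would read:
--   AND SUBSTRING(@pDelimiter + @pString, N, 1)
--       COLLATE Latin1_General_BIN = @pDelimiter
```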
madhivanan
Premature Yak Congratulator
22864 Posts
Posted - 2010-12-01 : 09:09:11
Jeff, can you highlight the change you have made?

Madhivanan
Failing to plan is Planning to fail
Jeff Moden
Aged Yak Warrior
652 Posts
Posted - 2010-12-01 : 09:21:05
Thanks for the feedback, Kristen. I watched that first link as it unfolded way back in 2007 (I'm the 18th post on that link, asking a question about the slothful recursive CTE method). Also, being in America, I always install with the default collation, which also means I don't have the performance problem that some collations offer, and I almost always forget to add the binary COLLATE clause to my examples. I really need to get into the habit of doing that because, as Erland and others have demonstrated with code, it can make a huge difference if you have a non-default collation selected for the server in this global environment we call T-SQL.

One of the other things that people forget, and that sometimes makes Tally table methods look really bad for larger things, is the fact that CHARINDEX changes personality depending on the datatypes of its operands. CHARINDEX usually returns an INT, but if you use it on a MAX datatype it will return a BIGINT and, depending on how you use that result, you can get table scans on the Tally (Numbers) table or other external table where you'd normally enjoy a SEEK and short range scan. The same could happen with the CTE method I posted (because ROW_NUMBER returns a BIGINT), except the datatype precedence works in the correct direction there, so no problem with the pseudo-Tally table, but it can still mess up when joined with other tables (they all can... not just Tally table based code). Oddly enough, the RBAR in a While Loop based splitter prevents such a problem, but you still have the slowness of a While Loop.

It's also worth mentioning that no one method is the best method for every situation, and that a very well written CLR will usually beat all the methods, with a real Tally table coming in very close behind the CLR for performance on shorter length splits.

--Jeff Moden
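[Editor's note] The CHARINDEX "personality change" described above can be checked directly; this sketch (my addition) uses SQL_VARIANT_PROPERTY to report each expression's base type:

```sql
-- CHARINDEX returns int for regular varchar operands, but bigint as
-- soon as a MAX datatype is involved -- which is what can defeat index
-- seeks against a Tally table.
SELECT SQL_VARIANT_PROPERTY(
           CHARINDEX('b', 'abc'), 'BaseType') AS NonMaxType,          -- int
       SQL_VARIANT_PROPERTY(
           CHARINDEX('b', CAST('abc' AS varchar(max))), 'BaseType') AS MaxType; -- bigint
```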
Jeff Moden
Aged Yak Warrior
652 Posts
Posted - 2010-12-01 : 09:46:00
quote: Originally posted by madhivanan

Jeff, can you highlight the change you have made?

First, thanks for the feedback, Madhivanan. It's always a pleasure working with you. I'm not sure what you mean, though. What change do you speak of?

--Jeff Moden
madhivanan
Premature Yak Congratulator
22864 Posts
Posted - 2010-12-07 : 02:32:50
quote: Originally posted by Jeff Moden

I'm not sure what you mean, though. What change do you speak of?

In the post where you compared the different methods, it says at the end "Edited by - Jeff Moden on 12/01/2010 08:49:18", whereas you posted it at 11/30/2010 23:39:43.

Madhivanan
Failing to plan is Planning to fail
Jeff Moden
Aged Yak Warrior
652 Posts
Posted - 2010-12-07 : 22:13:46
Ah... that. There was nothing in the code or the test results that I changed. I had made a comment, thought afterwards about WHILE Loops, and decided that some folks might have taken it the wrong way, so I changed it to "We should all probably steer clear of the While Loop for splitting."

--Jeff Moden
madhivanan
Premature Yak Congratulator
22864 Posts
Posted - 2010-12-20 : 06:59:30
quote: Originally posted by Jeff Moden

Ah... that. There was nothing in the code or the test results that I changed.

Ok. No problem.

Madhivanan
Failing to plan is Planning to fail