eehab
Starting Member
1 Post
Posted - 2010-11-27 : 08:12:36
Hi, how can I split a string in SQL Server 2005 and insert the result into a table with a loop?

My regards
MohanKandasamy
Starting Member
9 Posts
Posted - 2010-11-27 : 08:20:29
Hi,

Please use the Split function below (a custom split function created by me) to split the data:

INSERT INTO [YourTable]
SELECT * FROM dbo.Split('mohan,Kandasamy', ',')

Create the Split function as mentioned below:

CREATE FUNCTION [dbo].[Split]
(
    @rowData nvarchar(2000),
    @splitOn nvarchar(5)
)
RETURNS @rtnValue table
(
    Id   int identity(1,1),
    Data nvarchar(100)
)
AS
BEGIN
    DECLARE @count int
    SET @count = 1
    WHILE (CHARINDEX(@splitOn, @rowData) > 0)
    BEGIN
        INSERT INTO @rtnValue (Data)
        SELECT Data = LTRIM(RTRIM(SUBSTRING(@rowData, 1, CHARINDEX(@splitOn, @rowData) - 1)))

        SET @rowData = SUBSTRING(@rowData, CHARINDEX(@splitOn, @rowData) + 1, LEN(@rowData))
        SET @count = @count + 1
    END
    INSERT INTO @rtnValue (Data)
    SELECT Data = LTRIM(RTRIM(@rowData))
    RETURN
END
GO

Mohan Kandasamy
Jeff Moden
Aged Yak Warrior
652 Posts
Posted - 2010-11-28 : 17:20:21
A couple of faster options would be to 1) use an iTVF (inline Table Valued Function) and 2) use set-based code instead of a While Loop. The following code does both. The comments explaining things are much longer than the code itself...

DROP FUNCTION dbo.DelimitedSplit8K
GO
CREATE FUNCTION dbo.Split8K
/***************************************************************************************************
 Purpose:
 Split a given string at a given delimiter and return a list of the split elements (items).

 Returns:
 iTVF containing the following:
 ItemNumber = Element position of Item as a BIGINT (not converted to INT to eliminate a CAST)
 Item       = Element value as a VARCHAR(8000)

 CROSS APPLY Usage Example:
---------------------------------------------------------------------------------------------------
--===== Conditionally drop the test tables to make reruns easier for testing.
     -- (this is NOT a part of the solution)
     IF OBJECT_ID('tempdb..#JBMTest') IS NOT NULL
        DROP TABLE #JBMTest;

--===== Create and populate a test table on the fly (this is NOT a part of the solution).
 SELECT *
   INTO #JBMTest
   FROM (
         SELECT 1, '1,10,100,1000,10000,100000,1000000' UNION ALL
         SELECT 2, '2000000,200000,20000,2000,200,20,2' UNION ALL
         SELECT 3, 'This,is,a,test'                     UNION ALL
         SELECT 4, 'and so is this'                     UNION ALL
         SELECT 5, 'This, too (no pun intended)'
        ) d (SomeID, SomeValue);
GO

--===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)
 SELECT test.SomeID, split.ItemNumber, split.Item
   FROM #JBMTest test
  CROSS APPLY
        (
         SELECT ItemNumber, Item
           FROM dbo.DelimitedSplit8k(test.SomeValue, ',')
        ) split;
---------------------------------------------------------------------------------------------------
 Notes:
 1. Optimized for VARCHAR(7999) or less. No testing or error reporting for truncation at 7999
    characters is done.
 2. Optimized for single character delimiter. Multi-character delimiters should be resolved
    externally from this function.
 3. Optimized for use with CROSS APPLY.
 4. Does not "trim" elements just in case leading or trailing blanks are intended.
 5. If you don't know how a Tally table can be used to replace loops, please see the following...
    http://www.sqlservercentral.com/articles/T-SQL/62867/
 6. Changing this function to use VARCHAR(MAX) will cause it to run twice as slow. It's just the
    nature of VARCHAR(MAX), whether it fits in-row or not. Some recovery of speed can be realized
    by doing datatype matching for CHARINDEX, which changes its return datatype for VARCHAR(MAX).
 7. Multi-machine testing for the method of using UNPIVOT instead of 10 SELECT/UNION ALLs shows
    that the UNPIVOT method is quite machine dependent and can slow things down quite a bit.
 8. Performance testing shows using "TOP" for the limiting criteria of "N" is actually slower and
    slightly more CPU intensive than the traditional WHERE N < LEN(@pString) + 2.
 9. Performance testing shows using ORDER BY (SELECT x) where "x" is anything is actually slower
    and slightly more CPU intensive than the traditional ORDER BY (SELECT N).
***************************************************************************************************/
--===== Define I/O parameters
        (
        @pString    VARCHAR(7999),
        @pDelimiter CHAR(1)
        )
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values up to
     -- 10,000... enough to cover VARCHAR(8000)
WITH E1(N) AS ( --=== Create Ten 1's
                SELECT 1 UNION ALL SELECT 1 UNION ALL
                SELECT 1 UNION ALL SELECT 1 UNION ALL
                SELECT 1 UNION ALL SELECT 1 UNION ALL
                SELECT 1 UNION ALL SELECT 1 UNION ALL
                SELECT 1 UNION ALL SELECT 1 --10
              ),
     E2(N) AS (SELECT 1 FROM E1 a, E1 b),   --100
     E4(N) AS (SELECT 1 FROM E2 a, E2 b),   --10,000
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT N)) FROM E4)
--===== Do the split
 SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,
        SUBSTRING(@pString, N, CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) AS Item
   FROM cteTally
  WHERE N <= DATALENGTH(@pString) + 1 --DATALENGTH allows for trailing space delimiters
    AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter;
GO

--Jeff Moden
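[Editor's note] For readers new to the tally technique, here is a hand-trace of how the WHERE clause locates the delimiters. This is an added illustration, not part of the original post; the literal string 'a,bb,c' and the seven-row derived table stand in for the real tally CTE:

```sql
-- Prepending the delimiter gives ',a,bb,c'; the positions N where that
-- shifted string holds a ',' are 1, 3 and 6 -- exactly the positions in
-- the ORIGINAL string where the items 'a', 'bb' and 'c' begin.
-- CHARINDEX(',', @s + ',', N) - N then yields each item's length
-- (the appended delimiter terminates the final item).
DECLARE @s VARCHAR(7999), @d CHAR(1);
SELECT @s = 'a,bb,c', @d = ',';

SELECT N,
       SUBSTRING(@s, N, CHARINDEX(@d, @s + @d, N) - N) AS Item
  FROM (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
        SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL
        SELECT 7) t (N)                       -- stand-in for cteTally
 WHERE N <= DATALENGTH(@s) + 1
   AND SUBSTRING(@d + @s, N, 1) = @d;
-- returns (1, 'a'), (3, 'bb'), (6, 'c')
```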
glendcruz
Yak Posting Veteran
60 Posts
Posted - 2010-11-28 : 18:24:27
It is a perfect function, but you have one error: you dropped DelimitedSplit8K but created Split8K, so the usage example

FROM dbo.DelimitedSplit8k(test.SomeValue,',')

should be replaced with

FROM dbo.Split8K(test.SomeValue,',')

Thank you.
madhivanan
Premature Yak Congratulator
22864 Posts
Posted - 2010-11-29 : 05:25:10
Another method would be to use XML:

Declare @Text varchar(100), @delimiter nvarchar(5)
Set @Text = 'can you split this?'
Set @delimiter = ' '

Declare @textXml xml
Select @textXml = Cast('<d>' + Replace(@Text, @delimiter, '</d><d>') + '</d>' As xml);

SELECT T.split.value('.', 'nvarchar(max)') AS data
FROM @textXml.nodes('/d') T (split)

Madhivanan
Failing to plan is Planning to fail
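[Editor's note] A worked illustration of the intermediate step, added here because the `.nodes()` part is asked about in the next post. The sample string is from madhivanan's post; the rest is an editorial sketch:

```sql
-- REPLACE wraps every token in <d>...</d>, so for 'can you split this?'
-- with a space delimiter the intermediate XML is:
--   <d>can</d><d>you</d><d>split</d><d>this?</d>
-- .nodes('/d') shreds that into one row per <d> element; "T (split)"
-- just names the resulting rowset T with a single xml column called
-- split, and .value('.', 'nvarchar(max)') extracts each element's text.
DECLARE @x xml;
SET @x = CAST('<d>can</d><d>you</d><d>split</d><d>this?</d>' AS xml);

SELECT T.split.value('.', 'nvarchar(max)') AS data
FROM @x.nodes('/d') T (split);
-- returns 4 rows: can / you / split / this?
```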
glendcruz
Yak Posting Veteran
60 Posts
Posted - 2010-11-29 : 17:55:51
Thanks Madhivanan, I was looking for something like this, but to be honest I cannot understand the concept, specifically this part:

FROM @textXml.nodes('/d') T (split)
Jeff Moden
Aged Yak Warrior
652 Posts
Posted - 2010-11-30 : 00:49:54
@Madhivanan,

Hey there, ol' friend. There are (IIRC) three different XML splitters out there and two of them are pretty bad for performance. Yours is a bit different from any of the three I've seen. If you'd convert it to a function (I hate to change other people's code and then test it... I don't want to muck it up), I'd be happy to run it through the CSV Gin Mill.

--Jeff Moden
madhivanan
Premature Yak Congratulator
22864 Posts
Posted - 2010-11-30 : 05:40:43
What are the other three methods that involve XML? I converted the code to a function:

create function dbo.split_xml
(
    @data varchar(8000),
    @delimiter char(1)
)
returns @split table (data varchar(8000))
as
begin
    declare @textXml xml
    select @textXml = cast('<d>' + replace(@data, @delimiter, '</d><d>') + '</d>' as xml)

    insert into @split (data)
    select T.split.value('.', 'nvarchar(max)') as data
    from @textXml.nodes('/d') T (split)

    return
end
GO

--Usage
select * from dbo.split_xml('can you split this?', ' ')

Can you run a test?

Madhivanan
Failing to plan is Planning to fail
Jeff Moden
Aged Yak Warrior
652 Posts
Posted - 2010-11-30 : 23:39:43
I can do better than run a test. I can share all the test code with you and the other good folks who may also wish to run it and post their results.

First, here's the code to build random test data into a CSV. As always, the details are in the comments of the code.

--=================================================================================================
-- Create a CSV test table with 10000 rows and 10 random CSV elements per row.
-- The randomness of the elements also prevents the delimiters from showing up in the same
-- position for each row. SQL Server would figure that out and cache the information, making
-- some splitting techniques seem faster than they really are.
-- This section of the code takes just a couple of seconds to run because XML concatenation
-- is very fast (especially when compared to XML splitting or shredding).
-- Jeff Moden
--=================================================================================================
--===== Declare and preset a couple of control variables
DECLARE @NumberOfRows           INT,
        @NumberOfElementsPerRow INT;
 SELECT @NumberOfRows           = 10000,
        @NumberOfElementsPerRow = 10;

--===== Conditionally drop the test table to make reruns easier
     IF OBJECT_ID('TempDB..#CsvTest','U') IS NOT NULL
        DROP TABLE #CsvTest;

--===== This creates and populates a test table on the fly containing a
     -- sequential column and a randomly generated CSV Parameter column.
 SELECT TOP (@NumberOfRows) --Controls the number of rows in the test table
        ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),0) AS RowNum,
        (
         SELECT CAST(STUFF( --=== STUFF gets rid of the leading comma
                ( --=== This builds a CSV row with a leading comma
                 SELECT TOP (@NumberOfElementsPerRow) --Controls the number of CSV elements in each row
                        ','+CAST(ABS(CHECKSUM(NEWID()))%100000 AS VARCHAR(10))
                   FROM Master.sys.All_Columns ac3      --Classic cross join pseudo-cursor
                  CROSS JOIN Master.sys.All_Columns ac4 --can produce row sets up to 16 million.
                  WHERE ac3.Object_ID <> ac1.Object_ID  --Without this line, all rows would be the same.
                    FOR XML PATH('')
                )
                ,1,1,'') AS NVARCHAR(MAX))
        ) AS CsvParameter
   INTO #CsvTest
   FROM Master.sys.All_Columns ac1      --Classic cross join pseudo-cursor
  CROSS JOIN Master.sys.All_Columns ac2 --can produce row sets up to 16 million

--===== Let's add a PK just for grins. Since it's a temp table, we won't name it.
  ALTER TABLE #CsvTest
    ADD PRIMARY KEY CLUSTERED (RowNum) WITH FILLFACTOR = 100;

--===== Sanity check... let's see what we've got in the test table
 SELECT * FROM #CsvTest;

Here's the test code I used for the functions that Madhivanan and I posted. I threw in the While Loop code that was posted, as well...

DBCC FREEPROCCACHE
PRINT '========== Tally-on-the-fly Splitter ==========';
SET STATISTICS TIME ON;
SELECT src.RowNum, split.*
  FROM #CsvTest src
 CROSS APPLY dbo.Split8K(src.CsvParameter,',') split;
SET STATISTICS TIME OFF;

DBCC FREEPROCCACHE
PRINT '========== XML Splitter ==========';
SET STATISTICS TIME ON;
SELECT src.RowNum, split.*
  FROM #CsvTest src
 CROSS APPLY dbo.split_xml(src.CsvParameter,',') split;
SET STATISTICS TIME OFF;

DBCC FREEPROCCACHE
PRINT '========== While Loop Splitter ==========';
SET STATISTICS TIME ON;
SELECT src.RowNum, split.*
  FROM #CsvTest src
 CROSS APPLY dbo.split(src.CsvParameter,',') split;
SET STATISTICS TIME OFF;

Here's what I get for 10,000 rows of 10 elements each...

DBCC execution completed. If DBCC printed error messages, contact your system administrator.
========== Tally-on-the-fly Splitter ==========
(100000 row(s) affected)
SQL Server Execution Times: CPU time = 2469 ms, elapsed time = 6273 ms.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
========== XML Splitter ==========
(100000 row(s) affected)
SQL Server Execution Times: CPU time = 25093 ms, elapsed time = 31378 ms.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
========== While Loop Splitter ==========
(100000 row(s) affected)
SQL Server Execution Times: CPU time = 43875 ms, elapsed time = 66742 ms.

Here's what I get for 10,000 rows of 100 elements each (I didn't include the While Loop on this one)...

DBCC execution completed. If DBCC printed error messages, contact your system administrator.
========== Tally-on-the-fly Splitter ==========
(1000000 row(s) affected)
SQL Server Execution Times: CPU time = 60281 ms, elapsed time = 98922 ms.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
========== XML Splitter ==========
(1000000 row(s) affected)
SQL Server Execution Times: CPU time = 104922 ms, elapsed time = 155466 ms.

For shorter length CSVs, XML is probably not the way to go. We should all probably steer clear of the While Loop for splitting.

--Jeff Moden
Jeff Moden
Aged Yak Warrior
652 Posts
Posted - 2010-11-30 : 23:41:10
p.s. I forgot to mention that I'm running an 8-year-old 1.8GHz single CPU w/1GB of RAM, using SQL Server 2005 Developer's Edition.

--Jeff Moden
madhivanan
Premature Yak Congratulator
22864 Posts
Posted - 2010-12-01 : 01:57:31
Thanks for that, Jeff Moden.

Madhivanan
Failing to plan is Planning to fail
Kristen
Test
22859 Posts
Posted - 2010-12-01 : 04:07:34
Jeff: There are some stats for other functions posted here:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648&whichpage=2#305425
Dunno how comparable they are, but I'd be interested in the most performant variation in current SQL versions. We now have a range of split functions, depending on whether the source data is VARCHAR(8000) or VARCHAR(MAX) and whether the values we want are INT or CHAR.

I am disappointed that the best result is XML and involves a REPLACE on the whole string [delimiter] before starting. Presumably this means that the splitting of tags in XML is a carefully hand-optimised machine code routine, and the cost of the REPLACE is offset by the gain from the XML handler. If this knocks spots off the performance of set-based methods, why don't MS build a hand-optimised machine code split function into SQL?

There is some interesting stuff in Erland Sommarskog's articles on arrays and lists:
"Using Table-Valued Parameters" www.sommarskog.se/arrays-in-sql-2008.html
Split using non "Table-Valued Parameters" methods http://www.sommarskog.se/arrays-in-sql-2005.html
including setting the delimiter comparison to binary collation, on the grounds that delimiters like "," don't need foreign accent consideration, which I would not have thought of.
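[Editor's note] A small sketch of the binary-collation point, added for illustration; Latin1_General_BIN is just one example of a binary collation:

```sql
-- Forcing a binary collation turns the delimiter comparison into a raw
-- byte compare, bypassing the linguistic collation rules that a plain
-- ',' never needs. Standalone demonstration:
SELECT CHARINDEX(',' COLLATE Latin1_General_BIN,
                 'a,b,c' COLLATE Latin1_General_BIN) AS FirstDelimiter; -- 2

-- Applied to a tally splitter's WHERE clause, the same idea would read:
--   AND SUBSTRING(@pDelimiter + @pString, N, 1)
--       COLLATE Latin1_General_BIN = @pDelimiter
```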
madhivanan
Premature Yak Congratulator
22864 Posts
Posted - 2010-12-01 : 09:09:11
Jeff, can you highlight the change you have made?

Madhivanan
Failing to plan is Planning to fail
Jeff Moden
Aged Yak Warrior
652 Posts
Posted - 2010-12-01 : 09:21:05
Thanks for the feedback, Kristen. I watched that first link as it unfolded way back in 2007 (I'm the 18th post on that link, asking a question about the slothful recursive CTE method). Also, being in America, I always install with the default collation, which also means I don't have the performance problem that some collations offer, and I almost always forget to add the binary COLLATE clause to my examples. I really need to get into the habit of doing that because, as Erland and others have demonstrated with code, it can make a huge difference if you have a non-default collation selected for the server in this global environment we call T-SQL.

One of the other things that people forget, and that sometimes makes Tally table methods look really bad for larger things, is the fact that CHARINDEX changes personality depending on the datatypes of its operands. CHARINDEX usually returns an INT, but if you use it on a MAX datatype it will return a BIGINT and, depending on how you use that result, you can get table scans on the Tally (Numbers) table or other external table where you'd normally enjoy a SEEK and short range scan. The same could happen with the CTE method I posted (because ROW_NUMBER returns a BIGINT), except the datatype precedence works in the correct direction there, so no problem with the pseudo-Tally table, but it can still mess up when joined with other tables (they all can... not just Tally table based code). Oddly enough, the RBAR in a While Loop based splitter prevents such a problem, but you still have the slowness of a While Loop.

It's also worth mentioning that no one method is the best method for every situation, and that a very well written CLR will usually beat all the methods, with a real Tally table coming in very close behind the CLR for performance on shorter length splits.

--Jeff Moden
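[Editor's note] The CHARINDEX "personality change" described above can be checked directly; this sketch (my addition) uses SQL_VARIANT_PROPERTY to report each expression's base type:

```sql
-- CHARINDEX returns int for regular varchar operands, but bigint as
-- soon as a MAX datatype is involved -- which is what can defeat index
-- seeks against a Tally table.
SELECT SQL_VARIANT_PROPERTY(
           CHARINDEX('b', 'abc'), 'BaseType') AS NonMaxType,          -- int
       SQL_VARIANT_PROPERTY(
           CHARINDEX('b', CAST('abc' AS varchar(max))), 'BaseType') AS MaxType; -- bigint
```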
Jeff Moden
Aged Yak Warrior
652 Posts
Posted - 2010-12-01 : 09:46:00
quote: Originally posted by madhivanan

Jeff, can you highlight the change you have made?

First, thanks for the feedback, Madhivanan. It's always a pleasure working with you. I'm not sure what you mean, though. What change do you speak of?

--Jeff Moden
madhivanan
Premature Yak Congratulator
22864 Posts
Posted - 2010-12-07 : 02:32:50
quote: Originally posted by Jeff Moden

I'm not sure what you mean, though. What change do you speak of?

In the post where you compared the different methods, it says at the end "Edited by - Jeff Moden on 12/01/2010 08:49:18", whereas you posted it at 11/30/2010 23:39:43.

Madhivanan
Failing to plan is Planning to fail
Jeff Moden
Aged Yak Warrior
652 Posts
Posted - 2010-12-07 : 22:13:46
Ah... that. There was nothing in the code or the test results that I changed. I had made a comment, thought afterwards about WHILE Loops, and decided that some folks might have taken it the wrong way, so I changed it to "We should all probably steer clear of the While Loop for splitting."

--Jeff Moden
madhivanan
Premature Yak Congratulator
22864 Posts
Posted - 2010-12-20 : 06:59:30
quote: Originally posted by Jeff Moden

Ah... that. There was nothing in the code or the test results that I changed.

Ok. No problem.

Madhivanan
Failing to plan is Planning to fail