SQL Server Forums
 Article: Using a CSV with an IN sub-select

AskSQLTeam
Ask SQLTeam Question

USA
0 Posts

Posted - 10/13/2002 :  21:53:30
One of the most asked questions involves a bit of code like this: Select * From Products Where ProductID in (@SomeCSVString). I've always said the only way to do this was through dynamic SQL. Actually there's a better way in SQL Server 2000. It doesn't involve dynamic SQL and it's really easy to use.
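A quick way to see why `IN (@SomeCSVString)` can't work as hoped: the variable is a single scalar value, so every ProductID gets compared to the entire string. A rough Python model of the two behaviors (mine, for illustration only; the article itself is T-SQL):

```python
products = [1, 5, 11, 112]
some_csv = "1,5,11"

# IN (@SomeCSVString) compares each ID against the whole string as one value:
assert [p for p in products if str(p) == some_csv] == []

# What people actually want is membership in the *parsed* list:
wanted = {int(v) for v in some_csv.split(",")}
assert [p for p in products if p in wanted] == [1, 5, 11]
```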

Article Link.

SamC
White Water Yakist

USA
3459 Posts

Posted - 10/13/2002 :  22:46:10
Very cool way to 'hard code' a table in a SP without allocating table space.

This is exactly what I was looking for in an earlier post for comparing a last name to a list of known names.

Sam


sgtwilko
Starting Member

United Kingdom
23 Posts

Posted - 10/14/2002 :  03:40:49
Actually there is another way....


select * from Products where charindex(','+convert(nvarchar,productID)+',',','+@SomeCSVString+',')>0


You need to have the ','+convert(nvarchar,productID)+',' section to make sure you don't match 112 when you want 12, etc.
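The delimiter-wrapping trick is easy to check outside SQL. A hypothetical Python rendering of the same charindex test (not from the post):

```python
def csv_contains(value, csv, sep=","):
    # Mirror of charindex(sep + value + sep, sep + csv + sep) > 0:
    # wrapping both sides in separators forces whole-element matches only.
    return (sep + value + sep) in (sep + csv + sep)

# Without the wrapping, "12" would match inside "112"; with it, it can't:
assert csv_contains("12", "1,12,112")
assert not csv_contains("12", "1,112")
assert csv_contains("112", "1,112")
```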

Ian.

--
Eagles may soar,
but Weasels aren't sucked into jet engines.

jimmers
Starting Member

12 Posts

Posted - 10/14/2002 :  06:14:36
To sgtwilko:

Your solution won't be able to use an index on productID...



Page47
Flowing Fount of Yak Knowledge

USA
2878 Posts

Posted - 10/14/2002 :  07:41:10
And if you want a fast way to do this ...

declare @csv varchar(1000), @sep char(1), @t datetime
select @csv = '1,5,11', @sep = ','

dbcc dropcleanbuffers
select @t=getdate()

select P.ProductID, ProductName
from northwind.dbo.Products P
JOIN dbo.CsvToInt(@CSV) CSV
ON CSV.IntValue = P.ProductID
select datediff(ms,@t,getdate())

dbcc dropcleanbuffers
select @t = getdate()
select P.ProductID, ProductName
from northwind.dbo.Products P
JOIN (select nullif(substring(@sep+@csv+@sep,n,charindex(@sep,@sep+@csv+@sep,n)-n),'') as element
from dbo.numbers
where
n<=datalength(@sep+@csv+@sep) and
n-datalength(@sep)>0 and
substring(@sep+@csv+@sep,n-datalength(@sep),datalength(@sep))=@sep and
charindex(@sep,@sep+@csv+@sep,n)-n>0) csv
ON CSV.element = p.ProductID
select datediff(ms,@t,getdate())

 
Note: dbo.numbers is a tally table; dbcc dropcleanbuffers is there to clear the data cache to ensure a fair comparison.

The first query has a subtree cost of 0.0442 and includes a table scan of the function. The second query has a cost of 0.0292 and seeks the clustered indexes on Products and Numbers. On my P4 SQL2K machine, the first query runs in around 110ms while the second query is usually around 50ms.
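For readers puzzling over the derived-table version: the numbers (tally) table stands in for a loop over character positions. A small Python model of the same predicates (mine, purely illustrative; the real work is set-based in T-SQL):

```python
def tally_split(csv, sep=","):
    """Model of the tally-table split: for every 1-based position n in
    sep+csv+sep, emit an element when the character(s) just before n are
    the separator; the element runs from n up to the next separator."""
    s = sep + csv + sep
    elements = []
    for n in range(1, len(s) + 1):            # n <= datalength(sep+csv+sep)
        if n - len(sep) <= 0:                 # n - datalength(sep) > 0
            continue
        if s[n - len(sep) - 1:n - 1] != sep:  # substring before n must be sep
            continue
        nxt = s.index(sep, n - 1)             # charindex(sep, s, n)
        if nxt - (n - 1) > 0:                 # skip empty elements
            elements.append(s[n - 1:nxt])
    return elements

assert tally_split("1,5,11") == ["1", "5", "11"]
```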

The difference between these two methods has already been explored here. Each method has advantages and, depending on the circumstances, can be the better performer. As always in performance tuning, you should test both methods and make the decision based on your environment.

Jay White
{0}

jcelko
Esteemed SQL Purist

USA
547 Posts

Posted - 10/14/2002 :  13:05:05
I hope we all know the right way to do this is to build a one-column table, load it, and use IN (SELECT..). But if you want a kludge, passing a list of parameters to a stored procedure can be done by putting them into a string with a separator. I like to use the traditional comma. Let's assume that you have a whole table full of such parameter lists:

CREATE TABLE InputStrings
(keycol CHAR(10) NOT NULL PRIMARY KEY,
input_string VARCHAR(255) NOT NULL);

INSERT INTO InputStrings VALUES ('first', '12,34,567,896');
INSERT INTO InputStrings VALUES ('second', '312,534,997,896');
...

This will be the table that gets the outputs, in the form of the original key column and one parameter per row.

CREATE TABLE ParmList
(keycol CHAR(10) NOT NULL,
parm INTEGER NOT NULL,
PRIMARY KEY (keycol, parm));

It makes life easier if the lists in the input strings start and end with a comma. You will also need a table called Sequence, which is a set of integers from 1 to (n).

SELECT I1.keycol,
       CAST (SUBSTRING (',' + I1.input_string + ',', MAX(S1.seq + 1),
             (S2.seq - MAX(S1.seq + 1)))
        AS INTEGER) AS parm,
       COUNT(S2.seq) AS place
FROM InputStrings AS I1, Sequence AS S1, Sequence AS S2
WHERE SUBSTRING (',' + I1.input_string + ',', S1.seq, 1) = ','
  AND SUBSTRING (',' + I1.input_string + ',', S2.seq, 1) = ','
  AND S1.seq < S2.seq
  AND S2.seq <= DATALENGTH(I1.input_string) + 2
GROUP BY I1.keycol, I1.input_string, S2.seq
ORDER BY I1.keycol, I1.input_string, S2.seq;

The S1 and S2 copies of Sequence are used to locate bracketing pairs of commas, and the entire set of substrings located between them is extracted and cast as integers in one non-procedural step. The trick is to be sure that the left-hand comma of the bracketing pair is the closest one to the second comma. The place column tells you the relative position of the value in the input string.
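The bracketing-pairs idea is easier to see procedurally. A hypothetical Python sketch of what the Sequence self-join computes (each adjacent pair of commas in the wrapped string brackets one value; `place` is its position in the list):

```python
def bracket_parse(input_string):
    # Wrap the list in commas, as the query does with ',' + input_string + ','
    s = "," + input_string + ","
    commas = [i for i, ch in enumerate(s) if ch == ","]
    # Each adjacent (left, right) comma pair brackets exactly one element;
    # the left comma is the closest comma before the right one.
    return [(int(s[left + 1:right]), place)
            for place, (left, right) in enumerate(zip(commas, commas[1:]), 1)]

assert bracket_parse("12,34,567,896") == [(12, 1), (34, 2), (567, 3), (896, 4)]
```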



--CELKO--
Joe Celko, SQL Guru

Page47
Flowing Fount of Yak Knowledge

USA
2878 Posts

Posted - 10/14/2002 :  14:09:23
So it comes full circle ...

Rob reads Joe's book.
Rob burgles Joe's method to write SQL Team article.
Jay reads Rob's article.
Jay burgles Rob's article to post response to this thread.
Joe reads this thread and posts his original method ...



Jay White
{0}

robvolk
Most Valuable Yak

USA
15636 Posts

Posted - 10/14/2002 :  14:22:53
Rob reads Joe's book.
Rob burgles Joe's method to write SQL Team article. Rob does not see this technique in Joe's book(s) and actually figures it out on his own. AFAIK Joe borrowed it from me.
Jay reads Rob's article.
Jay burgles Rob's article to post response to this thread.
Joe reads this thread and posts his original method ...

Seriously, I'm pretty sure I wrote this article even before I bought SQL For Smarties, and it's not in that book. Who cares anyway? If Joe also uses it then it just means it's an excellent technique.

And Joe has posted this before:

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=14932


Page47
Flowing Fount of Yak Knowledge

USA
2878 Posts

Posted - 10/14/2002 :  14:31:02
Come on now, Rob ... you mention SQL for Smarties in the article in reference to Tally tables ...

Denial ain't just a river.

I guess it's really just like saying who invented green ... who cares ...

Although I did respond before Joe, and he obviously didn't read my post, otherwise he would not have posted the same thing again ...


Note: Joe, I'm just goofing off ... honestly, thanks for stopping by ... I love reading your posts ...

Jay White
{0}

Edited by - Page47 on 10/14/2002 14:32:56

robvolk
Most Valuable Yak

USA
15636 Posts

Posted - 10/14/2002 :  14:54:34


Yep, forgot that. I meant to say the parsing technique wasn't included in SQL For Smarties.

And for the record, the Irish invented green.


Arnold Fribble
Yak-finder General

United Kingdom
1961 Posts

Posted - 10/14/2002 :  16:43:35
This may help:

http://www.straightdope.com/classics/a2_168b.html



setbasedisthetruepath
Used SQL Salesman

USA
992 Posts

Posted - 10/14/2002 :  17:21:17
Where oh where do you find this???

Another question comes to mind: Who oh who would fund this research???

Jonathan
{0}

AjarnMark
SQL Slashing Gunting Master

USA
3246 Posts

Posted - 10/14/2002 :  18:20:32
quote:
Who oh who would fund this research???


Ummm... The U.S. government? Just a guess considering how many wacky things they fund.


JustinBigelow
SQL Gigolo

USA
1157 Posts

Posted - 10/14/2002 :  18:52:57
quote:

And for the record, the Irish invented green.



Me being at least part Irish, I will go ahead and claim responsibility on behalf of myself and my people. Please forward all royalty checks posthaste.

Justsin

Have you hugged your SQL Server today?

GreatInca
Posting Yak Master

USA
102 Posts

Posted - 10/16/2002 :  16:57:39
quote:
select * from Products where charindex(','+convert(nvarchar,productID)+',',','+@SomeCSVString+',')>0

You need to have the ','+convert(nvarchar,productID)+',' section to make sure you don't match 112 when you want 12, etc.

The above can't use an index.


I don't like the CSV idea. Until I'm sure otherwise, table variables are temp tables without the transaction-log overhead - they still get written to tempdb, especially if they get large. It's I/O overhead even if it's only 8K. They are way preferred to actual temp tables, though, as long as you can handle the scope limitation. I use dynamic SQL until the next version comes out, where arrays are supported for real through C# :) I think this idea is really cool if the same CSV string is being used multiple times in a complicated query, though.



ashok
Yak Posting Veteran

Kenya
57 Posts

Posted - 10/17/2002 :  10:12:11
SQL Mag has a similar UDF called fn_Split; it allows you
to pass a parameter to specify the separator....

http://www.sqlmag.com/Articles/Index.cfm?ArticleID=21071



-ashok
http://www.unganisha.org

Page47
Flowing Fount of Yak Knowledge

USA
2878 Posts

Posted - 10/17/2002 :  11:06:20
quote:
Above can't use an index.


That is exactly right. So with this example from pubs, your solution makes a lot of sense from a performance perspective. However, as the dataset grows, you'll find the other solutions here outperforming your charindex() or dynamic SQL DML ...


declare @csv varchar(1000), @sep char(1), @t datetime
select @sep = ','

set rowcount 10
select
@csv = coalesce(@csv+@sep,'')+s
from
(select top 100 percent
s
from
dbo.strings -- 10,000 random varchar(10) strings
order by
newid()) as a
set rowcount 0

dbcc dropcleanbuffers
select @t = getdate()
select s.s
from dbo.strings s
JOIN (select nullif(substring(@sep+@csv+@sep,n,charindex(@sep,@sep+@csv+@sep,n)-n),'') as element
from dbo.numbers
where
n<=datalength(@sep+@csv+@sep) and
n-datalength(@sep)>0 and
substring(@sep+@csv+@sep,n-datalength(@sep),datalength(@sep))=@sep and
charindex(@sep,@sep+@csv+@sep,n)-n>0) csv
ON CSV.element = s.s
select datediff(ms,@t,getdate())

dbcc dropcleanbuffers
select @t = getdate()
select s.s
from dbo.strings s
where charindex(@sep+s+@sep,@sep+@csv+@sep)>0
select datediff(ms,@t,getdate())

dbcc dropcleanbuffers
select @t = getdate()
declare @sql varchar(4000)
select @sql = 'select s from dbo.strings where charindex(''' + @sep + '''+s+''' +@sep + ''','''+@sep+@csv+@sep+''')>0'
exec(@sql)
select datediff(ms,@t,getdate())

 


Jay White
{0}

Onamuji
Aged Yak Warrior

USA
504 Posts

Posted - 10/19/2002 :  09:03:35
I know Joe did something with a sequence table to extract the strings ... but I posted something like this http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=18378 a while ago ... a composite of something Rob did ... which would be faster: the use of a sequence/tally table, or the while loop?


ThomasZ
Starting Member

2 Posts

Posted - 11/03/2002 :  12:42:18
Hi,

I am not sure if this method was already mentioned; it's less verbose than the UDF, and performance should be a bit better:

SET @CSVParameters = ',' +@CSVParameters+ ','
SELECT * FROM YourTable WHERE @CSVParameters LIKE '%,' +TableID + ',%'

Where @CSVParameters would contain the CSV values, e.g. 1,2,3,4,5,
and TableID would be the column to compare. Depending on the type of your column, you might need to convert it to a (var)char type.

Regards,
Thomas +++


troubleshooter
Starting Member

1 Posts

Posted - 10/10/2003 :  00:38:20
I have to admit, I enjoyed this posting, and found it to be reasonably useful while dealing with a significant, and reasonably complex challenge I was faced with. The specific dilemma that I encountered involved a CSV (I'm not going to get into the debate about the merits of CSV files, in this case, it was a necessary evil) that exceeded 8000 characters, and thus was rendered useless by the limitations of VarChar.

At first, I simply split the CSV in less-than-8000-byte increments through code and looped through the chunks within the application, prior to passing them through to the stored procedure. Although this meant more overhead and slowed the process (albeit minor), it also involved a lot of re-writing of code every time I encountered this scenario (I hate re-writing code). So I then took a look at a few other samples, and found a solution that is a hybrid of the original CSVtoInt found on this site, along with some thoughts / ideas from Erland Sommarskog regarding the passing of unlimited (text / ntext) variables.
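For anyone facing the same 8000-character ceiling, the client-side splitting described above can be done without cutting any value in half by always breaking at a comma. A hypothetical sketch (the function name and structure are mine, not from the original hybrid; it assumes every `limit`-sized window contains at least one comma):

```python
def chunk_csv(csv, limit=8000):
    """Break a long CSV string into pieces no longer than `limit`
    characters, splitting only at commas so no value is truncated."""
    chunks = []
    while len(csv) > limit:
        cut = csv.rfind(",", 0, limit + 1)  # last comma within the limit
        chunks.append(csv[:cut])
        csv = csv[cut + 1:]                 # drop the comma we split on
    chunks.append(csv)
    return chunks

parts = chunk_csv("1,22,333,4444", limit=6)
assert parts == ["1,22", "333", "4444"]
assert all(len(p) <= 6 for p in parts)
```

Each chunk can then be passed to the varchar-based splitter in turn.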

If my input is welcome here, I'd be happy to post the hybrid, and possibly help a few others whom I am sure have encountered my dilemma. If not, then at least I can say 'thanks' to those of you who have shared your knowledge with the rest of us.
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC