Converting Multiple Rows into a CSV String (Set Based Method)

By Jay White on 9 September 2002 | Tags: SELECT

Based the problem outlined in RocketScientist's article (entitled Converting Multiple Rows into a CSV String), I would like to present an alternative method for accomplishing a similar task. This article will show a faster method to convert multiple rows into multiple CSV strings.

Say you had a table with a composite key, and you wanted to return each distinct first member and a CSV list of all the applicable second members. Using the approach outlined by RocketScientist, you would have to iterate through each distinct first member and run the sp_MakeCharList proc N times. I think there is a better way.

First, the DDL and sample data:

create table Page47 (
    i  int not null,
    vc varchar(5) not null,
    constraint pk_Page47 primary key (i,vc) )
go


set nocount on
declare @i int
select  @i = 0
while @i <5000
begin
    insert into Page47 (i,vc)
    select round((rand() * 100), 0),
           char(round((rand() * 25), 0) + 97) +
           char(round((rand() * 25), 0) + 97) +
           char(round((rand() * 25), 0) + 97) +
           char(round((rand() * 25), 0) + 97) +
           char(round((rand() * 25), 0) + 97)
    select @i = @i + 1
end
go

The sample data should look something like this:

select top 10
    i,
    vc
from
    Page47
order by
    newid()

i           vc    
----------- ----- 
743         nndsn
267         krtub
582         lgrrg
185         qmcku
907         rrpqe
635         cpbcp
237         uovqh
85          bpixy
554         mfsqh
49          geabv

(10 row(s) affected)

First, lets look at adapting RocketScientists solution to this problem. Here is how you might iterate through the table to build all of the CSVs

--create a temp table
select
    identity(int,1,1) as rowid,
    i
into
    #workingtable
from
    Page47
group by
    i

--create a table to hold the lists
create table #Rocket (
    i int not null primary key,
    list varchar(8000) not null)

declare
    @i int,
    @maxrowid int,
    @sql varchar(8000),
    @list varchar(8000)

select
    @maxrowid = max(rowid),
    @i = 0
from
    #workingtable

--loop through the all the i's
while @i <= @maxrowid
begin

    select
        @sql = 'select vc from Page47 where i = ' + convert(varchar,i)
    from
        #workingtable
    where
        rowid = @i

    exec sp_MakeCharList
        @codelistselect=@sql,
        @delimitedlist=@list output

    insert into #rocket (i,list)
    select i, @list
    from
        #workingtable
    where
        rowid = @i and
        @list is not null
    select @i = @i + 1
end

--return a sample from the final rowset
select top 10
    i,
    case
        when len(list) > 50 then convert(varchar(50), left(list,47) + '...')
        else convert(varchar(50),list)
    end as list
from
    #rocket
order by
    newid()
go

i           list                                               
----------- -------------------------------------------------- 
790         fpflf, hjutq, inahr, kgkox, kroox, nbkiy, rsscw...
752         imokv, jcrgz, khsui, lbmwm, pkjee, ujxoi, ulgsm...
74          aphdr, cdmoc, fogdm, gslsg, hushk, icchc, lybud...
320         bvevh, ohyqv, oivyc, yevjb, ypnrk, yxfji
956         aisgk, emeup, fhtad, hffnm, iidpn, ivqlk, jhklu...
530         bqcjr, dnfzr, dqwas, egqpm, fplip, ibrxp, jcbhk...
137         aorpw, bngki, cibny, hwsrp, jdown, qlepc, zekbo, z
72          ghrxi, gmkzk, mufmc, uixpr, vjvbu, ymruj, zmucr, z
911         fibrs, fuhub, hgcwh, kuvcf, lcfux, nrqht, pfaae...
62          edxyj, grrmt, hmkfv, hprlt, ieqvg, mojsk, pdrmv...

Now, here is a set based method for doing the same thing.

--create a table to work with
create table #workingtable (
    i int not null,
    vc varchar(5) not null,
    list varchar(8000),
    constraint pk_wt primary key (i,vc) )

insert into #workingtable (i,vc)
select i,vc
from Page47
order by i,vc

declare
    @list varchar(8000),
    @lasti int

select
    @list = '',
    @lasti = -1

--here is the meat of the work
update
    #workingtable
set
    @list = list = case
                       when @lasti <> i then vc
                       else @list + ', ' + vc
                   end,
    @lasti = i

--return a sample from the final rowset
select top 10
    i,
    case
        when len(max(list)) > 50 then convert(varchar(50), left(max(list),47) + '...')
        else convert(varchar(50),max(list))
    end as list
from
    #workingtable
group by
    i
order by
    newid()
go

i           list                                               
----------- -------------------------------------------------- 
127         itvgq, ljosw, nxmdj, oshrp, plxff, pubig, sthck...
849         gcifo, hbxkf, njkdl, sfesm, sjhky, uxhfq, vjeno...
684         fejly, fqyqf, gpfce, hutht, kwywo, mapco, momqn...
461         fsofv, fzked, murat, vzmek, yrqjo
612         nmmey, tfjhv, ulwuj, xxaaq
374         bbthd, jvjwz, klcsq, mrakf, peztf, pixww, rtwdd
730         dlynf, egqei, hhckx, nsvdn, obnhh, rfbwh, ytgfi
458         eijdr, gtxhu, lhtqh, phprf, qjhcr, vqnos
655         bijer, fwlgk, nrcbm, sohho, trjtw, usjdj, uvpie...
837         ayxcv, epurf, flvtj, ftxcj, imjap, pmygd, sqhcc...

My tests show the iterative method taking 2.813 seconds and the set-based method taking 0.670 seconds, with this set of sample data. Increasing the number of rows shows the cursor solution creeping up on the set based solution; however, increasing the number of distinct i’s allowed show the performance of the cursor sharply declining. With 1000 distinct i’s, the cursor solution dropped to 89.420 seconds while the set based solution screamed at a 1.513 seconds. As you can see, the implementation choice will be dependent on your data set, so you should test both methods to find which is right for you.

One last thing, why does that update statement work in the set based solution? Honestly, your guess is as good as mine. You can read more about that here.