|
Converting Multiple Rows into a CSV String (Set Based Method)By Jay White on 9 September 2002 | Tags: SELECT Based the problem outlined in RocketScientist's article (entitled Converting Multiple Rows into a CSV String), I would like to present an alternative method for accomplishing a similar task. This article will show a faster method to convert multiple rows into multiple CSV strings.
Say you had a table with a composite key, and you wanted to return each distinct first member and a CSV list of all the applicable second members. Using the approach outlined by RocketScientist, you would have to iterate through each distinct first member and run the sp_MakeCharList proc N times. I think there is a better way.
First, the DDL and sample data: create table Page47 (
i int not null,
vc varchar(5) not null,
constraint pk_Page47 primary key (i,vc) )
go
set nocount on
declare @i int
select @i = 0
while @i <5000
begin
insert into Page47 (i,vc)
select round((rand() * 100), 0),
char(round((rand() * 25), 0) + 97) +
char(round((rand() * 25), 0) + 97) +
char(round((rand() * 25), 0) + 97) +
char(round((rand() * 25), 0) + 97) +
char(round((rand() * 25), 0) + 97)
select @i = @i + 1
end
go
The sample data should look something like this: select top 10
i,
vc
from
Page47
order by
newid()
i vc
----------- -----
743 nndsn
267 krtub
582 lgrrg
185 qmcku
907 rrpqe
635 cpbcp
237 uovqh
85 bpixy
554 mfsqh
49 geabv
(10 row(s) affected)
First, lets look at adapting RocketScientists solution to this problem. Here is how you might iterate through the table to build all of the CSVs --create a temp table
select
identity(int,1,1) as rowid,
i
into
#workingtable
from
Page47
group by
i
--create a table to hold the lists
create table #Rocket (
i int not null primary key,
list varchar(8000) not null)
declare
@i int,
@maxrowid int,
@sql varchar(8000),
@list varchar(8000)
select
@maxrowid = max(rowid),
@i = 0
from
#workingtable
--loop through the all the i's
while @i <= @maxrowid
begin
select
@sql = 'select vc from Page47 where i = ' + convert(varchar,i)
from
#workingtable
where
rowid = @i
exec sp_MakeCharList
@codelistselect=@sql,
@delimitedlist=@list output
insert into #rocket (i,list)
select i, @list
from
#workingtable
where
rowid = @i and
@list is not null
select @i = @i + 1
end
--return a sample from the final rowset
select top 10
i,
case
when len(list) > 50 then convert(varchar(50), left(list,47) + '...')
else convert(varchar(50),list)
end as list
from
#rocket
order by
newid()
go
i list
----------- --------------------------------------------------
790 fpflf, hjutq, inahr, kgkox, kroox, nbkiy, rsscw...
752 imokv, jcrgz, khsui, lbmwm, pkjee, ujxoi, ulgsm...
74 aphdr, cdmoc, fogdm, gslsg, hushk, icchc, lybud...
320 bvevh, ohyqv, oivyc, yevjb, ypnrk, yxfji
956 aisgk, emeup, fhtad, hffnm, iidpn, ivqlk, jhklu...
530 bqcjr, dnfzr, dqwas, egqpm, fplip, ibrxp, jcbhk...
137 aorpw, bngki, cibny, hwsrp, jdown, qlepc, zekbo, z
72 ghrxi, gmkzk, mufmc, uixpr, vjvbu, ymruj, zmucr, z
911 fibrs, fuhub, hgcwh, kuvcf, lcfux, nrqht, pfaae...
62 edxyj, grrmt, hmkfv, hprlt, ieqvg, mojsk, pdrmv...
Now, here is a set based method for doing the same thing. --create a table to work with
create table #workingtable (
i int not null,
vc varchar(5) not null,
list varchar(8000),
constraint pk_wt primary key (i,vc) )
insert into #workingtable (i,vc)
select i,vc
from Page47
order by i,vc
declare
@list varchar(8000),
@lasti int
select
@list = '',
@lasti = -1
--here is the meat of the work
update
#workingtable
set
@list = list = case
when @lasti <> i then vc
else @list + ', ' + vc
end,
@lasti = i
--return a sample from the final rowset
select top 10
i,
case
when len(max(list)) > 50 then convert(varchar(50), left(max(list),47) + '...')
else convert(varchar(50),max(list))
end as list
from
#workingtable
group by
i
order by
newid()
go
i list
----------- --------------------------------------------------
127 itvgq, ljosw, nxmdj, oshrp, plxff, pubig, sthck...
849 gcifo, hbxkf, njkdl, sfesm, sjhky, uxhfq, vjeno...
684 fejly, fqyqf, gpfce, hutht, kwywo, mapco, momqn...
461 fsofv, fzked, murat, vzmek, yrqjo
612 nmmey, tfjhv, ulwuj, xxaaq
374 bbthd, jvjwz, klcsq, mrakf, peztf, pixww, rtwdd
730 dlynf, egqei, hhckx, nsvdn, obnhh, rfbwh, ytgfi
458 eijdr, gtxhu, lhtqh, phprf, qjhcr, vqnos
655 bijer, fwlgk, nrcbm, sohho, trjtw, usjdj, uvpie...
837 ayxcv, epurf, flvtj, ftxcj, imjap, pmygd, sqhcc...
My tests show the iterative method taking 2.813 seconds and the set-based method taking 0.670 seconds, with this set of sample data. Increasing the number of rows shows the cursor solution creeping up on the set based solution; however, increasing the number of distinct i’s allowed show the performance of the cursor sharply declining. With 1000 distinct i’s, the cursor solution dropped to 89.420 seconds while the set based solution screamed at a 1.513 seconds. As you can see, the implementation choice will be dependent on your data set, so you should test both methods to find which is right for you. One last thing, why does that update statement work in the set based solution? Honestly, your guess is as good as mine. You can read more about that here.
|
- Advertisement - |