Row Concatination - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

Old Forums

CLOSED - General SQL Server

Row Concatination

Author

Topic

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2002-08-05 : 09:30:56

Mark writes "I have a 77,000 record table from a legacy (COOL:Gen) that has following structure

RecID RecSeq RecTxt
10 1 A
10 2 A
11 1 A
11 2 B
11 3 C

I need to concatenate it to this

10 AA
11 ABC

My current script based on a Read Only Cursor executing a stored procedure takes 20 min. (A entirely cursor-based script takes 55 min.)

I would like to do something similar to BULK INSERT like this...

EXEC ProcName (@v1, @v2) SELECT v1, v2 FROM Table WHERE Criteria ORDER BY RecSeq

(where the sp checks is rec exists and either inserts or updates RecTxt + @v2)

...but obviously it will not work. Any other idea how I may be able to improve the cursor approach?"

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-08-05 : 10:00:19

I've posted this method so many times, I think it should be henceforth be referred to as 'The Page47 Method'....

... admittedly, I think I got if from NR or somebody, so I guess I can't really claim it as my own....


select
    RecID,
    RecTxt,
    space(8000) as RecList
into
    #workingtable
from
    mark
order by
    RecID,
    RecSeq

declare
    @lastRecID int,
    @list varchar(8000)

select
    @lastRecID = -1,
    @list = ''

update
    #workingtable
set
    @list = list = case
                       when @lastRecID <> RecID then RecTxt
                       else @list + RecTxt
                   end,
    @lastRecID = RecID

select
    RecID,
    max(RecList)
from
    #workingtable

Jay White
{0}

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-08-05 : 11:21:19

So why does this work?
I mean, if there's no TOP and no IDENTITY function, what is an ORDER BY on a SELECT...INTO supposed to be doing?
And how is the order that the UPDATE processes the rows supposed to be the same?
Is there some special logic in SQL Server that spots this construct?

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-08-05 : 11:28:53

quote:
So why does this work?

I think it works because SQL Server isn't a RDBMS, but rather a SQLDBMS, as Fabian Pascal would say. Here's my best guess.... By ordering the SELECT INTO, the #workingtable records are written to disk (buffer) in an ordered way and the update statement processes them in that order. Now if SQL Server where relational, a) there would be no concept of physical data order on the disk (buffer) and b) the update would not process rows in a particular order. However, that doesn't seem to be how its implemented. I've never been able to not make this work in SQL Server.

With my luck, the next Service Pack fix this ....

Jay White
{0}

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-08-05 : 11:51:16

Jay's right. When you think about it, internally, SQL Server would have to DELIBERATELY put them out of order during the INSERT operation if they did not match the ORDER BY clause. That doesn't make a whole lot of sense IMHO.

quote:
Now if SQL Server where relational, a) there would be no concept of physical data order on the disk (buffer)

Ummmm, that's a pretty important "concept" to have nailed down if you ever want to have a WORKING database storing and retrieving real data!

Maybe I should clarify that: The physical order is a byproduct of its physical storage. While its physical order is not important, data that does not "physically" exist is mighty useless.

quote:
and b) the update would not process rows in a particular order.

I can't see how a "truly" relational database would be better if it deliberately stored data out of order, or accessed sequential data pages out of order.

quote:
However, that doesn't seem to be how its implemented.

And might explain why D hasn't been implemented AT ALL.

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-08-05 : 12:39:25

So does the fact that we're using an atomic INSERT INTO mean that the all the necessary extents for #worktable can be assigned before it is populated with rows? Otherwise, any fragmentation would result in the IAM pages having the extents listed in something other than the intended sort order. And since the IAM pages are the only things referencing all the extents in a heap, the update wouldn't work.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-08-05 : 13:30:32

quote:
So does the fact that we're using an atomic INSERT INTO mean that the all the necessary extents for #worktable can be assigned before it is populated with rows?

Books Online suggests that this indeed is the case:

quote:
Extents are the basic unit in which space is allocated to tables and indexes. An extent is 8 contiguous pages, or 64 KB. This means SQL Server 2000 databases have 16 extents per megabyte.

To make its space allocation efficient, SQL Server 2000 does not allocate entire extents to tables with small amounts of data. SQL Server 2000 has two types of extents:

-Uniform extents are owned by a single object; all eight pages in the extent can only be used by the owning object.

-Mixed extents are shared by up to eight objects.

A new table or index is usually allocated pages from mixed extents. When the table or index grows to the point that it has eight pages, it is switched to uniform extents. If you create an index on an existing table that has enough rows to generate eight pages in the index, all allocations to the index are in uniform extents.

I also remember reading somewhere that a table can only have one mixed extent; it's implied in the above but there was something else that stated it clearly. The worst that would happen is that the first extent of the temp table would be a mixed extent, all other after would be uniform and most likely contiguous.

If you look at tempdb, it makes very little sense to allocate non-contiguous extents unless you're at the limit and all other contiguous extents are in use, and that situation would ONLY occur after a great deal of gymnastics within tempdb. You'd have to see a lot of temp tables created and only certain ones dropped. Sure, this happens a lot anyway, but its so random in nature that it's hard to believe SQL Server would try to squeeze a new table into this available patchwork instead of creating a bunch of new extents.

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-08-05 : 14:17:18

What I meant was, if it allocates the extents it needs before populating, it doesn't matter if the table has fragmentation because the row inserts will presumably just fill up the extents in IAM order.

I suppose if you're really paranoid, you could instrument the update with some extra counters to check all transitions observered conform to the desired order and complain if they're not. It would only be a small extra scalar cost -- though it could be a bit tiresome if you've got a load of columns in the ORDER BY.

Edited by - Arnold Fribble on 08/05/2002 14:21:42

Wanderer
Master Smack Fu Yak Hacker

1168 Posts

Posted - 2004-05-14 : 10:15:08

Hmmm,

I don't know why it is I read these, and everyone else seems to understand, while I go "clear as mud". Am I right in thinking that somehow this works on the unproven, yet always experienced, basis that select from a table that was inserted in order, results in a select being in order ?

If no, then see my "clear as mud" confusion :-)

If yes, the how does the "merry-go-around-scan" potentially impact this ? see http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=35068

From Muddy :-)

*#&#* *#&#* *#&#* *#&#*

Chaos, Disorder and Panic ... my work is done here!

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-05-14 : 11:32:43

The select has an ORDER BY ... it's two different discussions ...

Jay White
{0}

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-05-14 : 12:55:54

As far as I remember my reasoning was thus:

When you do an UPDATE, you don't get to say what order anything is done in. Usually that doesn't matter, but since this UPDATE is both using and changing a variable, it definitely makes a difference what order those rows are processed in.

I would expect the rows to be processed in the easiest order for the database. If SQL Server says, "Hey! I know what you're trying to do here," and puts an Ordered Forward on the UPDATE's index scan then you're ok. (assuming you wanted the things updated in clustered index order!)

If there's no clustered index, or it doesn't use Ordered Forward, then you're relying on the extents that contain pages for the table being in the desired order. I was expressing some dubiousness that this must be the case. Or something.

Wanderer
Master Smack Fu Yak Hacker

1168 Posts

Posted - 2004-05-17 : 03:35:08

Hi Jay,

As Arnold noted, I was thinking along the lines of the Update, and not the select. thinking it through though, since the update will take a exclusive lock, I figure that it cannot "piggy-back" any other users table scan, since it requires no-one else to have locks. Hence I figure that Updates cannot do "merry-go-round scans".

I still would be interested to know for a fact why this works/ if it is guaranteed that it will always work ...

Cheers

*#&#* *#&#* *#&#* *#&#*

Chaos, Disorder and Panic ... my work is done here!

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2004-05-17 : 19:24:21

quote:

Now if SQL Server where relational, a) there would be no concept of physical data order on the disk (buffer)

I don't think that's right. RM "says" that order is irrelevant because of the identity principle. It does not care about the physical storage.. CSV, Excel, R-Tree, B-Tree, ordered, unordered.. what ever you like...

quote:

and b) the update would not process rows in a particular order.

True, but we must remember that a UPDATE is simply short hand for a DELETE (MINUS) then an INSERT. (UNION).. The most common distinct alogrithm uses a sort operation and I think it is because SQL (not just SQL Server) allows duplicates that this sort operation is so heavily relied on... Look at projection to see how SQL butchers it and that you must supply the DISTINCT operator to achieve something that looks like a relation. It does raise an interesting question though... If I receive a relation that is unordered, then apply the non-relational operator ORDER BY, is it still a relation after that operation?

quote:

And might explain why D hasn't been implemented AT ALL

It has Rob... The Alphora product uses it...Very well I might add....

DavidM

"Always pre-heat the oven"

todhilton
Starting Member

2 Posts

Posted - 2005-05-12 : 11:08:10

I'm cross-posting this reply with this topic because my original searches resulted in both posts.

I realize this post is more than a year old, but I just recently had to do this and ended up using the temporary table method posted by Page47. It worked, but was clunky. A few days ago a co-worker showed me a really quick and easy way of doing this in SQL 2000:

declare @list varchar(8000)
select @list = Isnull(@list,'') +';'+ theColumnToBeConcatenated
from yourtable
where yoursearchparameter='something'
select @list

HTH!

~tod

X002548
Not Just a Number

15586 Posts

Posted - 2005-05-12 : 11:21:21

Try 3 years this August.....and the actual original question was concat for each key

http://weblogs.sqlteam.com/brettk/archive/2005/02/10/4153.aspx

Brett

8-)

tom_bapps
Starting Member

1 Post

Posted - 2005-08-03 : 06:39:27

quote:
Originally posted by Page47

I've posted this method so many times, I think it should be henceforth be referred to as 'The Page47 Method'.... ... admittedly, I think I got if from NR or somebody, so I guess I can't really claim it as my own....

select
RecID,
RecTxt,
space(8000) as RecList
into
#workingtable
from
mark
order by
RecID,
RecSeq

declare
@lastRecID int,
@list varchar(8000)

select
@lastRecID = -1,
@list = ''

update
#workingtable
set
@list = list = case
when @lastRecID <> RecID then RecTxt
else @list + RecTxt
end,
@lastRecID = RecID

select
RecID,
max(RecList)
from
#workingtable

I have created a table called mark as follows I was just trying to use the following above query you had posted,but it throws the error "Invalid column name 'list'."

I have created a table mark as follows

create table mark (RecID int,RecSeq int,RecTxt varchar(3))

What would be the error,Please advice as i am in urgent need of this functionality

Thanking you in advance,
Thomas Alexander

Jay White

Jay White
{0}

If you can you can
if you can't then you are right

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources