Generate Structured Tree Breadth First from graph

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

Generate Structured Tree Breadth First from graph

Author

Topic

aeblank
Starting Member

12 Posts

Posted - 2008-11-29 : 02:04:53

Hi,

THE PLAIN VANILLA PROBLEM

I want to know the absolutely fastest way to implement the following:

1) SELECT some_data INTO my_temporary_list

2) FOREACH item in my_temporary_list, DO SOME STUFF WITH item

That's all I need to know how to do. I know that I can implement this in the CLR, but I heard that the CLR is slower when working with enormous datasets, (I'm looking at returning millions of rows of data with this command when it's in production, it needs to be fast!). I'll probably implement it in both CLR and in T-SQL then benchmark them, but I don't know how to implement this in T-SQL.

Any suggestions or solutions would be much appreciated. Thanks!

MORE BACKGROUND

I'm coding a breadth-first tree generation algorithm. A few things: the data in the database is NOT stored as a tree, but as a graph, in fact, as "edges" or connections between points.[EDIT: see next post for more details on how data is stored] I'm generating a tree from a graph, and returning the results breadth first.

I have an algorithm that should do just fine, but I don't know how to implement a FOREACH loop over a temporary LIST of results in T-SQL. Can someone let me know how I can arbitrarily add data to a list from a SQL SELECT command, and then iterate through those results and perform more operations?

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2008-11-29 : 05:50:18

Generally, the fastest way to do stuff in SQL is not to do it iteratively. SQL is optimised for set-based operations, ie ones that operate on sets of data not rows of data.
If you try processing a million rows with a cursor, it'll take ages. Depends on what exactly you're doing in the cursor, but probable minutes to hours.

Could you perhaps post the table structure, some sample data and what you want to see once the processing is done. There's a good chance that one of the T-SQL experts here can come up with a good set-based solution that's waaaay faster than any iterative one would ever be.

Why can't it be a recursive algorithm?

--
Gail Shaw
SQL Server MVP

aeblank
Starting Member

12 Posts

Posted - 2008-11-29 : 10:11:46

Absolutely, thanks for the tip. I was afraid of that (I heard that looping / iteration is notoriously slow in SQL Server). You can ignore my pseudo-code from before. I wrote it quick, and realize now that it's wrong!

Anyway, straight to the point:

TABLE SCHEMA
CREATE TABLE [dbo].[edges](
[edge_id] [bigint] IDENTITY(1,1) NOT NULL,
[graph_id] [int] NOT NULL,
[vertex_id] [int] NOT NULL,
[adjacent_id] [int] NOT NULL,
CONSTRAINT [PK_edges] PRIMARY KEY CLUSTERED
)

SAMPLE DATA
edge_id graph_id vertex_id adjacent_id

1 1 0 1
2 1 0 2
3 1 0 3
4 1 1 4
5 1 1 5
6 1 1 6
7 1 2 7
8 1 2 8
9 1 2 9
10 1 2 10
10 1 2 11
10 1 3 12
10 1 3 13

INTERPRETATION OF TABLE

The table stores "edges" in a graph. An edge represents a one-way connection between two points on the graph. The algorithm is supposed to take a node, and generate a tree representing adjacents/neighbors/connections. In the output above, the shows that "node 0" belongs to "graph 1" connects to nodes 1, 2, and 3; similarly node 1 connects to nodes 4, 5 and 6, etc.

In simple terms, the algorithm has to return a list of the "neighbors" of a point on the graph, and then returns its neighbor's neighbor's, and then it's neighbor's neighbor's neighbors, etc. The graph is absolutely enormous, and the results need to come back in breadth first order rather than depth first order (i.e. return ALL the 1st degree neighbors, then spread out to second degree neighbors, then 3rd degree, etc.). Also, it will need to stop at a maximum amount of records returned.

The full real background is that a program operates mathematically on the graph, but only operates on sections of the graph at a time. Thus, the graph needn't be entirely loaded in memory (and there's no way it would fit). The Database stores the graph, and then the program says "I want to work ON AND AROUND / NEAR point #15938243", then the database returns a structured tree which represents how the points are connected. The tree starts with point #15938243, then spreads out from there--the program knows how to parse and read in the "tree" and correctly load the graph into memory.

EXAMPLE CASE OF WHAT I'M HOPING TO DO
Here I'm using letters to represent points instead of "vertex_id"s.
I also use what looks like XML tags, but really could just be represented by any special code (i.e. "-1" or "-2" or something).

If A connects uniquely to B, C and D, and B, C, and D each uniquely connect to a bunch of other points, then the algorithm would:

TAKE AS INPUT:
point A's ID #
GIVE AS OUTPUT:
A list containing
<depth>
A's ID #
<adj_list>
B's ID #
C's ID #
D's ID #
</adj_list>
</depth>
<depth>
B's ID #
<adj_list>
B's neighbor 1's ID #
B's neighbor 2's ID #
</adj_list>
C's ID #
<adj_list>
C's neighbor 1's ID #
C's neighbor 2's ID #
C's neighbor 3's ID #
C's neighbor 4's ID #
C's neighbor 5's ID #
</adj_list>
D's ID #
<adj_list>
D's neighbor 1's ID #
D's neighbor 2's ID #
D's neighbor 3's ID #
</adj_list>
</depth>
etc.
etc.
etc.
until "max records" is reached

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2008-11-29 : 11:42:27

Ok, so, for that sample data, what do you want the results to look like? (so that anyone who tries to write up a solution can easily test it.)

--
Gail Shaw
SQL Server MVP

aeblank
Starting Member

12 Posts

Posted - 2008-11-29 : 12:07:45

Sorry if I'm being really unclear in my question, I think I keep botching my explanations. Hope this clears it up!

For a table with the following values:

edge_id graph_id vertex_id adjacent_id
1 1 0 1
2 1 0 2
3 1 0 3
4 1 1 4
5 1 1 5
6 1 1 6
7 1 2 7
8 1 2 8
9 1 2 9
10 1 2 10
10 1 2 11
10 1 3 12
10 1 3 13

And for the following input:

graph_id = 1
starting_vertex_id = 0

I need the following output:

<depth>
0
<adj_list>
1
2
3
</adj_list>
</depth>
<depth>
1
<adj_list>
4
5
6
</adj_list>
2
<adj_list>
7
8
9
10
11
</adj_list>
3
<adj_list>
12
13
</adj_list>
</depth>

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-11-29 : 13:12:07

Have a look at recursive CTEs in books online

http://msdn.microsoft.com/en-us/library/ms186243.aspx

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2008-11-29 : 13:55:33

Ah. OK. I was misunderstanding too.
So the <depth> and <adj_list> are some markers, not a filler for some set of values?

Just to confirm, SQL 2005/2008?

--
Gail Shaw
SQL Server MVP

aeblank
Starting Member

12 Posts

Posted - 2008-11-29 : 22:58:09

That's exactly right. In my "CLR" version, I use -1 to denote the start of a depth, -2 to denote the end of a depth, -3 to denote the start of an adjacent list, and -4 to denote the end of an adjacent list. It basically creates an XML-like hierarchy that my program can parse quickly.

And I'm running SQL Server 2008.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources