Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
aeblank
Starting Member
12 Posts |
Posted - 2008-11-29 : 02:04:53
|
| Hi,THE PLAIN VANILLA PROBLEMI want to know the absolutely fastest way to implement the following:1) SELECT some_data INTO my_temporary_list2) FOREACH item in my_temporary_list, DO SOME STUFF WITH itemThat's all I need to know how to do. I know that I can implement this in the CLR, but I heard that the CLR is slower when working with enormous datasets, (I'm looking at returning millions of rows of data with this command when it's in production, it needs to be fast!). I'll probably implement it in both CLR and in T-SQL then benchmark them, but I don't know how to implement this in T-SQL.Any suggestions or solutions would be much appreciated. Thanks! MORE BACKGROUNDI'm coding a breadth-first tree generation algorithm. A few things: the data in the database is NOT stored as a tree, but as a graph, in fact, as "edges" or connections between points.[EDIT: see next post for more details on how data is stored] I'm generating a tree from a graph, and returning the results breadth first.I have an algorithm that should do just fine, but I don't know how to implement a FOREACH loop over a temporary LIST of results in T-SQL. Can someone let me know how I can arbitrarily add data to a list from a SQL SELECT command, and then iterate through those results and perform more operations? |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2008-11-29 : 05:50:18
|
| Generally, the fastest way to do stuff in SQL is not to do it iteratively. SQL is optimised for set-based operations, ie ones that operate on sets of data not rows of data.If you try processing a million rows with a cursor, it'll take ages. Depends on what exactly you're doing in the cursor, but probable minutes to hours.Could you perhaps post the table structure, some sample data and what you want to see once the processing is done. There's a good chance that one of the T-SQL experts here can come up with a good set-based solution that's waaaay faster than any iterative one would ever be. Why can't it be a recursive algorithm?--Gail ShawSQL Server MVP |
 |
|
|
aeblank
Starting Member
12 Posts |
Posted - 2008-11-29 : 10:11:46
|
| Absolutely, thanks for the tip. I was afraid of that (I heard that looping / iteration is notoriously slow in SQL Server). You can ignore my pseudo-code from before. I wrote it quick, and realize now that it's wrong!Anyway, straight to the point:TABLE SCHEMACREATE TABLE [dbo].[edges]( [edge_id] [bigint] IDENTITY(1,1) NOT NULL, [graph_id] [int] NOT NULL, [vertex_id] [int] NOT NULL, [adjacent_id] [int] NOT NULL, CONSTRAINT [PK_edges] PRIMARY KEY CLUSTERED )SAMPLE DATAedge_id graph_id vertex_id adjacent_id1 1 0 12 1 0 23 1 0 34 1 1 45 1 1 56 1 1 67 1 2 78 1 2 89 1 2 910 1 2 1010 1 2 1110 1 3 1210 1 3 13INTERPRETATION OF TABLEThe table stores "edges" in a graph. An edge represents a one-way connection between two points on the graph. The algorithm is supposed to take a node, and generate a tree representing adjacents/neighbors/connections. In the output above, the shows that "node 0" belongs to "graph 1" connects to nodes 1, 2, and 3; similarly node 1 connects to nodes 4, 5 and 6, etc.In simple terms, the algorithm has to return a list of the "neighbors" of a point on the graph, and then returns its neighbor's neighbor's, and then it's neighbor's neighbor's neighbors, etc. The graph is absolutely enormous, and the results need to come back in breadth first order rather than depth first order (i.e. return ALL the 1st degree neighbors, then spread out to second degree neighbors, then 3rd degree, etc.). Also, it will need to stop at a maximum amount of records returned.The full real background is that a program operates mathematically on the graph, but only operates on sections of the graph at a time. Thus, the graph needn't be entirely loaded in memory (and there's no way it would fit). The Database stores the graph, and then the program says "I want to work ON AND AROUND / NEAR point #15938243", then the database returns a structured tree which represents how the points are connected. The tree starts with point #15938243, then spreads out from there--the program knows how to parse and read in the "tree" and correctly load the graph into memory.EXAMPLE CASE OF WHAT I'M HOPING TO DO Here I'm using letters to represent points instead of "vertex_id"s.I also use what looks like XML tags, but really could just be represented by any special code (i.e. "-1" or "-2" or something).If A connects uniquely to B, C and D, and B, C, and D each uniquely connect to a bunch of other points, then the algorithm would:TAKE AS INPUT: point A's ID #GIVE AS OUTPUT:A list containing<depth>A's ID #<adj_list>B's ID #C's ID #D's ID #</adj_list></depth><depth>B's ID #<adj_list>B's neighbor 1's ID #B's neighbor 2's ID #</adj_list>C's ID #<adj_list>C's neighbor 1's ID #C's neighbor 2's ID #C's neighbor 3's ID #C's neighbor 4's ID #C's neighbor 5's ID #</adj_list>D's ID #<adj_list>D's neighbor 1's ID #D's neighbor 2's ID #D's neighbor 3's ID #</adj_list></depth>etc.etc.etc. until "max records" is reached |
 |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2008-11-29 : 11:42:27
|
| Ok, so, for that sample data, what do you want the results to look like? (so that anyone who tries to write up a solution can easily test it.)--Gail ShawSQL Server MVP |
 |
|
|
aeblank
Starting Member
12 Posts |
Posted - 2008-11-29 : 12:07:45
|
| Sorry if I'm being really unclear in my question, I think I keep botching my explanations. Hope this clears it up!For a table with the following values:edge_id graph_id vertex_id adjacent_id1 1 0 12 1 0 23 1 0 34 1 1 45 1 1 56 1 1 67 1 2 78 1 2 89 1 2 910 1 2 1010 1 2 1110 1 3 1210 1 3 13And for the following input:graph_id = 1starting_vertex_id = 0I need the following output:<depth>0<adj_list>123</adj_list></depth><depth>1<adj_list>456</adj_list>2<adj_list>7891011</adj_list>3<adj_list>1213</adj_list></depth> |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2008-11-29 : 13:55:33
|
| Ah. OK. I was misunderstanding too.So the <depth> and <adj_list> are some markers, not a filler for some set of values?Just to confirm, SQL 2005/2008?--Gail ShawSQL Server MVP |
 |
|
|
aeblank
Starting Member
12 Posts |
Posted - 2008-11-29 : 22:58:09
|
| That's exactly right. In my "CLR" version, I use -1 to denote the start of a depth, -2 to denote the end of a depth, -3 to denote the start of an adjacent list, and -4 to denote the end of an adjacent list. It basically creates an XML-like hierarchy that my program can parse quickly.And I'm running SQL Server 2008. |
 |
|
|
|
|
|
|
|