Complex OrgChart (nested sets/directed graph/etc)

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

Complex OrgChart (nested sets/directed graph/etc)

Author

Topic

portman
Starting Member

11 Posts

Posted - 2004-08-26 : 13:21:39

Hey everyone,

A question about directed graphs, nested sets, and trees. (Yes, that old chestnut.)

I've been using Joe Celko's 'nested sets' approach to modeling hierarchies for the last three years. [url]http://www.intelligententerprise.com/001020/celko.jhtml?_requestid=160434[/url] The model has worked will for product databases (categories), HR hierarchies, and family trees.

Every so often, I would enounter a node that has two parents. I usually model this by including the node more than once in the hierarchy. For instance:

Table: VERTEX
-----------
VID VertexName
1 Mary
2 Tim
3 Bob
4 Ellen
5 Carl

Table: NESTEDSET
----------------
VID LeftID RightID
1 1 12
2 2 5
4 3 4
3 6 11
4 7 8
5 9 10

Note that Ellen reports to both Tim and Bob. She thus appears twice in the hierarchy. The result is that if you print an organization chart or an XML document, her node will appear two times.

Up until now, these have been unique circumstances, and the approach has worked. However, I am now working on a model (Ministries of Health in East Africa) where it is far more common for people to have multiple bosses: in fact, it is the norm, not an exception.

The 'simple' organization chart therefore becomes more complex, am ACYCLIC DIRECTED GRAPH instead of a BINARY TREE. Nested sets therefore _seems to_ break down, and I am wondering if people have enountered this before and whether they have any alternative modeling approaches.

There is, of course, a pure directed graph model, where we record the directed edged in addition to the

Table: EDGE
----------------
EID Weight Source Target
1 1.0 1 2
2 1.0 1 3
3 1.0 2 4
4 1.0 3 4
5 1.0 3 5

Some properties about this model that I like:
- Easy to count the number of _incoming_ and _outgoing_ edges:
CREATE VIEW vwEdgeCount AS
SELECT
V.VID,
V.VertexName,
ISNULL(I.Cnt,0) AS Inbound,
ISNULL(O.Cnt,0) AS Outbound
FROM Vertex V
LEFT JOIN (
SELECT E.Target, COUNT(E.EID) AS Cnt
FROM Edge E
GROUP BY E.Target
) AS I ON I.Target = V.VID
LEFT JOIN (
SELECT E.Source, COUNT(E.EID) AS Cnt
FROM Edge E
GROUP BY E.Source
) AS O ON O.Source = V.VID
- Easy to find the sups, subs, roots, and leaves.
(Could do it from vwEdgeCount but more eficient as below.)

CREATE VIEW vwSupervisors AS
SELECT DISTINCT V.VID, V.VertexName
FROM Vertex V
INNER JOIN Edge E1 ON V.VID = E1.Source

CREATE VIEW vwSubordinates AS
SELECT DISTINCT V.VID, V.VertexName
FROM Vertex V
INNER JOIN Edge E1 ON V.VID = E1.Target

CREATE VIEW vwLeafs AS
SELECT DISTINCT E.Target, V.VertexName
FROM Edge E
INNER JOIN Vertex V ON V.VID = E.Target
WHERE Target NOT IN ( SELECT Source FROM Edge )

CREATE VIEW vwRoots AS
SELECT DISTINCT E.Source, V.VertexName
FROM Edge E
INNER JOIN Vertex V ON V.VID = E.Source
WHERE Source NOT IN ( SELECT Target FROM Edge )

The properties of this model that I do not like (perhaps because I am just too stupid to see solution):
- Cannot find depth without using recursion or iteration.
- Cannot find siblings without using recursion or iteration.
These are very eficient with the Nested Sets model. Here, it seems that any selections involving the _connectedness_ of the graph fall outside of simple relational algebra.

Thanks!

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-08-26 : 13:25:29

Will this help?? It handles multi-parents. http://www.seventhnight.com/treeStructs.asp

Corey

portman
Starting Member

11 Posts

Posted - 2004-08-26 : 13:37:10

Corey-
Your solution is great, and one that I've actually referenced quite a few times in the past. The problem is that in order to build the @paths table, we need to iterate (WHILE EXISTS ... BEGIN ... END). Anytime that the @tree table is updated, we would need to rebuild the @paths table. This may well be the most efficient way to handle multiple parents. However, I was hoping for some bit of mathematically-inspired trickery that would let me in essense build your @paths table without using any loops or recursive calls. A pure set-based approach, basically. I'm 98% confident that such a solution doesn't exist on (the Google-indexed portion of) the Internet.

But again, thanks for the link and the code.

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-08-26 : 13:49:43

You mean like a single set-based select?

I've never really thought about it (as I have about a 25000 node tree that builds in 3 seconds) but it seems like you should be able to build the tree once. And then update with changes you make. I have been thinking about splitting my article into a series, but I just haven't had the time.

Let me think about the updates for a bit...

Corey

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-08-26 : 13:59:57

I realize this isn't the exact same question, but give this a read: http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=38788

It seems that copying a node should be one of the most complex operations as far as updating a structure.

The operations I can think of:

Create Node - Create node
Delete Node - Delete all refrence of this node. Could produce broken structures
Map Node - Create reference to node
Remove Node - Delete refrence of this node. Could produce broken structures
Move Node - Delete 1 reference & Create 1 reference. Should maintain child structure
Copy Node - Create a new node with the same references as the copied node
Copy Branch - Create a set of new node with the same structure as the copied Branch

I think each could be done without loops...

Corey

portman
Starting Member

11 Posts

Posted - 2004-08-26 : 14:33:02

I'm more mathamatician than computer scientist, so I think in a slightly different vocabulary. We have an _acyclic simple digraph_. Wikipedia has good definitions on graph theory. [url]http://en.wikipedia.org/wiki/Graph_%28mathematics%29[/url]

Basically, though, you have three sets:
- Vertices (aka nodes, entities, or dots)
- Edges (aka arcs, relationships, or lines)
- Maps _from_ vertex V1 _to_ vertex V2 over edge E.

Every edge has a source vertex and a target vertex (this is the 'directed' part comes from). Some vertices may not have any edges 'attached' to them. However, it is not possible to have an edge that is not 'attached' to a vertex.

Using this lexicon, there are a limited number of operations to perform on a graph:
- Create vertex
- Create edge
- Delete edge
- Delete vertex

When you create an edge, you define both the source- (aka from- or in-) vertex and target- (aka to- or out-) vertex.

Therefore, a description of your operations in graph-theoretic terms:
- Create node = create vertex
- Delete node = delete vertex; you cannot delete a vertex without first deleting any edges which have that vertex as a source or a target, which could disassemble one graph into two
- Map node = create edge; you must choose the source vertex and target vertex
- Remove node = delete node
- Move node = delete edge and create new edge
- Copy node = create new edge
- Copy branch = create new edge

If you’re storing the vertices in one table and the edges in another table (like my example above), then all of those are simple one-row operations (including 'copy branch'). The structure of the hierarchy is completely defined by those two tables. But again, in order to do ¬_connected¬_ operations (all siblings, all children, etc), we will need to truncate and repopulate a @paths table or perform a recursive function.

I think. It still feels like there should be some modolo arithmatic type algorithm that finds siblings. Not sure though. Goooo, now I’ve tried to wrap my head around a concept that it's too small to wrap around. :-( Gonna move on to something else for the afternoon.

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-08-26 : 14:37:58

No no... you are assuming it is not a relatively simple task to update a given tree with new changes. I have done most of the operations, not all because I am at work

...

Heres what I got so far:


/********************************
            00
          / |  
       01  02  03
      /  |  |  / |
    04  05  06   07
   /  \     |    | 
  08  09    10  11  12
********************************/
Declare @nodes table (NodeId int)
Insert Into @nodes Values(1)
Insert Into @nodes Values(2)
Insert Into @nodes Values(3)
Insert Into @nodes Values(4)
Insert Into @nodes Values(5)
Insert Into @nodes Values(6)
Insert Into @nodes Values(7)
Insert Into @nodes Values(8)
Insert Into @nodes Values(9)
Insert Into @nodes Values(10)
Insert Into @nodes Values(11)
Insert Into @nodes Values(12)


Declare @tree table (pNodeId int, cNodeId int, processed bit default(0))
Insert Into @tree (pNodeId, cNodeId) Values (0,1)
Insert Into @tree (pNodeId, cNodeId) Values (0,2)
Insert Into @tree (pNodeId, cNodeId) Values (0,3)
Insert Into @tree (pNodeId, cNodeId) Values (1,4)
Insert Into @tree (pNodeId, cNodeId) Values (1,5)
Insert Into @tree (pNodeId, cNodeId) Values (2,6)
Insert Into @tree (pNodeId, cNodeId) Values (3,6)
Insert Into @tree (pNodeId, cNodeId) Values (3,7)
Insert Into @tree (pNodeId, cNodeId) Values (4,8)
Insert Into @tree (pNodeId, cNodeId) Values (4,9)
Insert Into @tree (pNodeId, cNodeId) Values (6,10)
Insert Into @tree (pNodeId, cNodeId) Values (7,11)
Insert Into @tree (pNodeId, cNodeId) Values (7,12)

--Select * From @Tree

Declare @pad nvarchar(100),
	@lastCnt int
Set @Pad = '0000'

Declare @paths table (path nvarchar(1000), pNodeId int, cNodeId int)

Insert Into @paths
Select 
	path=right(@pad + convert(nvarchar,pNodeId),len(@pad))+';' + right(@pad + convert(nvarchar,cNodeId),len(@pad))+';',
	pNodeId,
	cNodeId
From @Tree where pNodeId=0

Update A
Set Processed = 1
From @Tree as A
Inner Join @paths as B
On A.pNodeId = B.pNodeId
and A.cNodeId = B.cNodeId

While exists(Select * From @tree Where Processed = 0)
Begin
	Insert Into @paths
	Select 
		path=path + case when B.cNodeId is not null then right(@pad + convert(nvarchar,B.cNodeId),len(@pad))+';' else '' end,
		B.pNodeId,
		B.cNodeId
	From @Paths as A
	Left Join @Tree as B
	On A.cNodeId = B.pNodeId
	Where B.Processed = 0

	Update A
	Set Processed = 1
	From @Tree as A
	Inner Join @paths as B
	On A.pNodeId = B.pNodeId
	and A.cNodeId = B.cNodeId
	Where A.Processed = 0
End

--All paths to each nodeId
Select path, cNodeId From @paths Where cNodeId is not null Order By Path


Declare @nodeId int,
	@pNodeId int,
	@cNodeId int


/*********************
Create Node #13
*********************/
Set @nodeId = 13
Insert Into @nodes Values(@nodeId)

/*********************
Map Node #13 to Node #8
*********************/
Set @pNodeId = 8
Insert Into @tree Values(@pNodeId,@nodeId,1)

Insert Into @Paths 
Select 
	Path = path + Right(@pad + convert(varchar,@nodeId),len(@pad))+';', 
	pNodeId = @pNodeId, 
	cNodeId = @nodeId
From @paths 
where cNodeId = @pNodeId

Select path, cNodeId From @paths Where cNodeId is not null Order By Path

/*********************
Map Node #2 to Node #13
*********************/
Set @NodeId = 2
Set @pNodeId = 13
Insert Into @tree Values(@pNodeId,@nodeId,1)

Insert Into @Paths 
Select 
	Path = path + Right(@pad + convert(varchar,@nodeId),len(@pad))+';', 
	pNodeId = @pNodeId, 
	cNodeId = @nodeId
From @paths 
where cNodeId = @pNodeId

Insert Into @Paths 
Select 
	Path = (Select path From @paths Where pNodeId = @pNodeId and cNodeId = @NodeId) + Right(path,len(path)-charindex(Right(@pad + convert(varchar,@nodeId),len(@pad))+';',path)-len(@pad)-1), 
	pNodeId, 
	cNodeId
From @paths 
where Path like ('%' + Right(@pad + convert(varchar,@nodeId),len(@pad))+';%')
and cNodeId <> @nodeId

Select path, cNodeId From @paths Where cNodeId is not null Order By Path


/*********************
Remove Map Node #3 to Node #6
*********************/
Set @NodeId = 6
Set @pNodeId = 3

Delete From @paths where path like (Select path+'%' From @paths Where pNodeId = @pNodeId and cNodeId = @nodeId)
Delete From @tree where pNodeId = @pNodeId and cNodeId = @nodeId

Select path, cNodeId From @paths Where cNodeId is not null Order By Path

/*********************
Move Node #13 From Node #8 to Node #9
*********************/
Set @NodeId = 9
Set @pNodeId = 8
Set @cNodeId = 13

Update @tree Set pNodeId = @nodeId From @tree where pNodeId = @pNodeId and cNodeId = @cNodeId

Update @paths
Set 
	path=replace(path,Right(@pad + convert(varchar,@pNodeId),len(@pad))+';'+Right(@pad + convert(varchar,@cNodeId),len(@pad))+';',Right(@pad + convert(varchar,@NodeId),len(@pad))+';'+Right(@pad + convert(varchar,@cNodeId),len(@pad))+';'),
	pNodeId = case when pNodeId = @pNodeId then @nodeId else pNodeId end
From @paths where path like ('%'+Right(@pad + convert(varchar,@pNodeId),len(@pad))+';'+Right(@pad + convert(varchar,@cNodeId),len(@pad))+';%')

Select path, cNodeId From @paths Where cNodeId is not null Order By Path

EDIT: I'm a Mathematics major, so I guess thats one reason I enjoy this topic so much!!

Corey

portman
Starting Member

11 Posts

Posted - 2004-08-26 : 15:38:14

Ok, I'm with you now. Point is that you do _not_ have to rebuild the @paths table whenever you modify the graph. Depending on what sort of operation it was, you can get away with updating some subset of all the @paths rows.

Helpful to think about the two extreme cases:
- The founder of a company (currently President and CEO) decides the time is ripe to hire a CEO. This CEO will be the new root of the hierarchy, and the President will report to him. You are creating one new vertex and one new edge, however, the affect is that the path for everyone in the organization changes.
- Someone who never had anyone reporting to them (and who only reported to one person) now gets an assistant. You are again creating one new vertex and one new edge, however, no current employees paths' will change.

So what's needed is an exhaustive set of use cases for graph transformations, and a procedure to edit the @paths table for each one. I'll speak in nodes, parents, and children because it's easier to visualize.
1. Add isolated node.
2. Add child to node with no children.
3. Add parent to node with no parents.
4. Make an existing node a parent of another existing node.
5. Remove isolated node.
6. Remove node with no children.
7. Remove a root node (will disassemble graph).
8. Remove a parent-to-child relationships without removing either of the nodes.

If you have @path-editing procedures for 1-8, you should be able to handle anything, right? And the only time that the _whole_ table needs to be rebuilt is the very special case where there is only one root and that root gets a new parent. Otherwise you're only dealing with subsets. #4 and #8 are the ugly ones, but also the most powerful (they handle what was called earlier in this thread a 'move branch' operation). It's high time for some pictures...

portman
Starting Member

11 Posts

Posted - 2004-08-26 : 15:53:29

Sorry to get 'real-world' on you, but here's a particular example hierarchy from Rwanda.

The two pictures below show a 'complex' operation: moving a branch. In graph-theoretic terms, it's a super simple operation: replacing edge c*h with f*h. But in terms of full paths, several nodes are affected (H, L, and P specifically).

[url]http://www.dropstone.org/4corey/graph1.gif[/url]
[url]http://www.dropstone.org/4corey/graph2.gif[/url]

To perform this, we need to update one record of @tree, delete five records of @path, update three records of @path, and keep two records of @path the same.


PATH         ACTION
A/C/H        - delete
A/C/H/L      - delete
A/C/H/L/P    - delete
A/C/I/L      - delete
A/C/I/L/P    - delete
B/C/H        - change to B/F/H
B/C/H/L      - change to B/F/H/L
B/C/H/L/P    - change to B/F/H/L/P
B/C/I/L      - stays same
B/C/I/L/P    - stays same

If there were two procedures (remove edge, add edge) which corrected the appropriate rows from @path then we would be on our way. I'm going to take what you have and create those procedures.

Thanks for the to-and-fro on this, it's been extremely helpful. I'll check back in tomorrow.

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-08-26 : 15:59:24

That sounds about right. I'm not sure that the transformations you have listed are quite correct, but the idea is accurate. I would also implement a QC type system such that the entire structure is built periodically and compared to the updated structure and differences, if found, are addressed.

I would think the only operations would be:
1) Create node (no relationships)
2) Delete node (no relationships)
3) Create relationship
4) Delete relationship
5) Move Node (From parent1 to parent2)

Also, your example about president/CEO => president + CEO:
Use #1 (Create Node) to create the CEO node
Use #5 (Move Node) to move all of the children of president to CEO
Use #3 (Create Relationship) to add president reporting to CEO

Second example, you are creating a new vertex and new edge, but because that 'parent' could show up in multiple places, you just need to insert copys of the parent's path records with the new edge.

Make some pictures, and when I get a chance, I'll build a sample set that exhibits each type of action. I don't think it will be nearly as bad as I had originally thought.

Corey

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-08-26 : 16:07:52

quote:
Originally posted by portman

Sorry to get 'real-world' on you, but here's a particular example hierarchy from Rwanda.

The two pictures below show a 'complex' operation: moving a branch. In graph-theoretic terms, it's a super simple operation: replacing edge c*h with f*h. But in terms of full paths, several nodes are affected (H, L, and P specifically).

[url]http://www.dropstone.org/4corey/graph1.gif[/url]
[url]http://www.dropstone.org/4corey/graph2.gif[/url]

To perform this, we need to update one record of @tree, delete five records of @path, update three records of @path, and keep two records of @path the same.

PATH ACTION
A/C/H - delete
A/C/H/L - delete
A/C/H/L/P - delete
A/C/I/L - delete
A/C/I/L/P - delete
B/C/H - change to B/F/H
B/C/H/L - change to B/F/H/L
B/C/H/L/P - change to B/F/H/L/P
B/C/I/L - stays same
B/C/I/L/P - stays same

If there were two procedures (remove edge, add edge) which corrected the appropriate rows from @path then we would be on our way. I'm going to take what you have and create those procedures.

Thanks for the to-and-fro on this, it's been extremely helpful. I'll check back in tomorrow.

When you get back, I think you'll find that a 'remove + add' is not gong to work well. It would be better to do an update. If you remove the relationship (and it was only linked once) then you have lost the children section of the path. Since you are only trying to move it, updating the relationship to the new relationship maintains the child section.

Nice graphs by the way...

Corey

portman
Starting Member

11 Posts

Posted - 2004-08-30 : 10:22:49

Corey-
Ran some tests over the weekend with large hierarchies. I realized that rebuilding the paths table is actually really bloody fast, even with lots of verticies and edges. The while clause will only loop N times, where N is the length of the longest open walk from a SOURCE vetex to a SINK vertex. (Source vertex is defined as a vertex with indegree = 0; Sink vertex is outdegree = 0.) In many of the real-world complex hierarchies that I am modelling, N is still less than 10, even when the number of vertices is in the tens of thousands. So I'm going to stick with rebuilding the paths table whenever the heirarchy is changed. Thanks again for the help.

portman
Starting Member

11 Posts

Posted - 2004-08-30 : 10:37:24

Oh, also, I don't think that the code you have in your WHILE loop handles all directed graphs. Take this example, for instance:


/********************************
        1 ___
       / \         2   3   4   5
     / \ /     \ /
    6   7       8
  / | \        /  9  10 11     12 13
         \   /
	  \ /
           14
********************************/
Declare @nodes table (NodeId int)
Insert Into @nodes Values(1)
Insert Into @nodes Values(2)
Insert Into @nodes Values(3)
Insert Into @nodes Values(4)
Insert Into @nodes Values(5)
Insert Into @nodes Values(6)
Insert Into @nodes Values(7)
Insert Into @nodes Values(8)
Insert Into @nodes Values(9)
Insert Into @nodes Values(10)
Insert Into @nodes Values(11)
Insert Into @nodes Values(12)
Insert Into @nodes Values(13)
Insert Into @nodes Values(14)


Declare @tree table (pNodeId int, cNodeId int, processed bit default(0))
Insert Into @tree (pNodeId, cNodeId) Values (1,2)
Insert Into @tree (pNodeId, cNodeId) Values (1,3)
Insert Into @tree (pNodeId, cNodeId) Values (1,4)
Insert Into @tree (pNodeId, cNodeId) Values (2,6)
Insert Into @tree (pNodeId, cNodeId) Values (2,7)
Insert Into @tree (pNodeId, cNodeId) Values (3,7)
Insert Into @tree (pNodeId, cNodeId) Values (4,8)
Insert Into @tree (pNodeId, cNodeId) Values (5,8)
Insert Into @tree (pNodeId, cNodeId) Values (6,9)
Insert Into @tree (pNodeId, cNodeId) Values (6,10)
Insert Into @tree (pNodeId, cNodeId) Values (6,11)
Insert Into @tree (pNodeId, cNodeId) Values (8,12)
Insert Into @tree (pNodeId, cNodeId) Values (8,13)
Insert Into @tree (pNodeId, cNodeId) Values (11,14)
Insert Into @tree (pNodeId, cNodeId) Values (12,14)

portman
Starting Member

11 Posts

Posted - 2004-08-30 : 10:40:11

Sorry, let me expand on that a bit: specifically, the process of using a PROCESSED flag doesn't work for all graphs. Some edges need to be processed more than once in order to enumerate all of the paths. For instance, in my above example, node 14 has three distinct paths to it:

1->2->6->11->14
1->4->8->12->14
5->8->12->14

However, not all of them show up. I am using a different logic -- more resource expensive, unfortunately -- that I will post as soon as I clean it up.

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-08-30 : 10:56:31

Hey! Your example doesn't show that 5 is a 'entry point'

for example, to see the structure you expected, I would have expected either a root point '0' with mappings of (0:1) & (0:5)
or
in the following query (which starts off the tree), you should have listed your entry points (1 & 5)

Insert Into @paths
Select
path=right(@pad + convert(nvarchar,pNodeId),len(@pad))+';' + right(@pad + convert(nvarchar,cNodeId),len(@pad))+';',
pNodeId,
cNodeId
From @Tree where pNodeId in (1,5) --Instead of pNodeId=0

Corey

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources