Article: More Trees & Hierarchies in SQL

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

Site Related Forums

Article Discussion

Article: More Trees & Hierarchies in SQL

Author

Topic

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2002-05-01 : 17:37:25

Hierarchies are sometimes difficult to store in SQL tables...things like trees, threaded forums, org charts and the like...and it's usually even harder to retrieve the hierarchy once you do store it. Here's a method that's easy to understand and maintain, and gives you the full hierarchy (or any piece of it) very quickly and easily.

Article Link.

Onamuji
Aged Yak Warrior

504 Posts

Posted - 2002-05-01 : 18:02:27

it seems everytime i post something on the more fun topics rob always comes out with an article a few days later ;-) heh seems like it at least for the last 3 articles :-p he always gives a superior technique that makes me very excited and want to .... never mind ... good artcile because i use hierarchies all over the place ...

rrb
SQLTeam Poet Laureate

1479 Posts

Posted - 2002-05-01 : 20:15:53

Thanks Rob. I too found Joe's stuff a little daunting...I think I like this approach and can't wait for your next installment. Will you post a link here when you're done so we can subscribe "in advance"?

Ta

--
I hope that when I die someone will say of me "That guy sure owed me a lot of money"

AjarnMark
SQL Slashing Gunting Master

3246 Posts

Posted - 2002-05-01 : 20:30:22

GREAT Article, Rob! I was intrigued by Joe Celko's technique, but I too was a bit daunted by it. Learned some good things from it, though.

Just about to get started on an org chart type of project, and definitely like what you've got here. Looking forward to part 2, too! Haven't really done anything with XML. This should be fun!

hello_ash
Starting Member

10 Posts

Posted - 2002-05-02 : 00:34:52

Great article, I faced same problem but i solved using recursion but now i got very new idea.
Tahnx Rob.

Lavos
Posting Yak Master

200 Posts

Posted - 2002-05-02 : 02:55:52

While this is a usable technique, it still falls down when you need to represent a graph instead of a tree.

The classical example is a bill of materials where you have 4 different parts that use the same bolt (or share a common component. This is something that I've had to do a short while ago.), but you can still run into it with an org chart.

For instance, a little over month ago I was put "on loan" to the IS department. I had one person who was nominally in charge of the programming work I was doing (but was too busy to really look at it, so I had free reign to design and implement), the IS manager, and my normal manager who I still reported to.

Now that I'm back doing my "secretary" work, I still have one immediate supervisor and nominally report to the second shift superintendant also. (at the same time, when my immediate supervisor is gone, I'm in charge of the department, even though I have no authority. lol.)

In those cases, you're still back to an iterative process, and storing the relationships in another table to maintain normalization. (I think that is also an adjacency model, but I could be wrong on the exact terminology.)

Don't get me wrong, there are some nice aspects of this approach, but it's not something that I've been able to use in any of my projects. (I haven't dug into the nested set model for the same reason.)

----------------------
"O Theos mou! Echo ten labrida en te mou kephale!"

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-05-02 : 08:56:36

A-ha! Actually, you can create any number of independent trees with this structure, that's why I spun off the data into a Tree table. You could simply create a "Boss2" column, then run the lineage UPDATE for Boss2 instead of Boss/Boss1. Or, add a new node with the same EmployeeID, with the appropriate parent node and lineage. That's why splitting the employee info from the node info is a good idea; it lets you do things like this.

I'll elaborate on this a little more in Part 2, but oh yeah, you can definitely have two bosses with this setup, you just have to do a little more work with it.

Lavos
Posting Yak Master

200 Posts

Posted - 2002-05-02 : 11:24:33

I noticed the independant tree feature, but things still break down for graphs.

I'm going to move back to the bill of materials example.

Part Foo is made up of a Bar and a Thinga. A Thinga is made up of a processed Majob.

You have a part FooX, that's made up of the brand new Barzac, but also still uses the same Thinga.

What happens when you re-engineer the Thinga, and make it out of a processed SomethingMaRether instead? You've got it in several trees so you have to alter each tree.

Granted, that is still relatively easy to do, but my normalization instincts are popping in to tell me it's bad. (Unfortunately, I'm still too new at SQL to know that some denormalization is perfectly acceptable ;)

----------------------
"O Theos mou! Echo ten labrida en te mou kephale!"

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-05-02 : 11:53:38

I agree that the lineage method is probably not as ideal for nested sets/assemblies/BOM structures as Joe's nested set model is. I personally haven't made the mental leap that treats trees and nested sets/BOM as the same thing. The diagrams can be made to look the same, but in reality they do not function the same way.

The bill of materials structure models something a little different from an org chart. Each part is part of a larger assembly, but that part cannot just be wantonly moved or removed from the assembly. You can't take a bolt in an automobile engine and promote it to be the parent assembly of the entire engine.

On the other hand, a person in a corporate organization can be transferred, promoted or removed anywhere in the organization, and may or may not inherit subtrees of that organization. This was what I was concerned with when putting this method together. Since the hierarchy is truly arbitrary, it was intended to support any kind of node manipulation without regard to its current relationship to the hierarchy.

Regarding the example you gave, a "Thinga" is actually a subtree, not a single node. It is a bit harder to manipulate subtrees using lineage vs. nested sets. If subtrees are more important than single nodes, I would recommend nested sets. If nodes are more important, lineage will probably work better for you.

dsdeming

479 Posts

Posted - 2002-05-02 : 14:22:37

This is an interesting method. I can't wait to see the next installment.

I only have one problem: whenever I tried to enter the drummer's name into the table, the data just disappeared. It's like it spontaneously combusted or something.

Lavos
Posting Yak Master

200 Posts

Posted - 2002-05-02 : 18:18:33

Ahh, but the nodes and the sub-trees are equally important in a bill of materials (As an example, we track the coil steel that we buy, as well as the parts that we stamp out of the steel that later get welded/bolted/sold), and there is promotion and demotion. For instance, we could buy a component from a vendor (a node) and then decide that we can make that component ourselves, and so it would then be a subtree. (or vice versa.) Or, it's possible to re-engineer the production line so that instead of assembling components at another line, they are added to the finished part directly (and thus cutting out middle management? Or at least the intervening compl's)

I think the BOM is conceptually identical to an organizational chart. If I just change the nouns and verbs, it would descibe the same thing.

That said, I did just notice a feature of this approach.

With the multiple independant trees, you don't _have_ to have the same children in each tree. (an example, a supervisor has two different managers. Some of the people he supervises should fall under one manager, while the rest would be under the other.) This has a distinct advantage (IMHO) over an adjacency model (Where you would have to store a "Department" code in the relationship table, and I've thought of a few other pieces of ugliness that this introduces.)

So, my main question would be how to maintain the independant trees, when there _are_ interelations? (In the above example, one of the supervisor's employees falls underneath both manager's departments, or there is another manager who is over everyone.)

----------------------
"O Theos mou! Echo ten labrida en te mou kephale!"

rrb
SQLTeam Poet Laureate

1479 Posts

Posted - 2002-05-02 : 19:28:20

quote:

I only have one problem: whenever I tried to enter the drummer's name into the table, the data just disappeared. It's like it spontaneously combusted or something.

That's because you didn't do
insert into tap (drummer)
select name from gardeners where methodofdeath="bizarre accident"

PS - it was cocaine...

--
I hope that when I die someone will say of me "That guy sure owed me a lot of money"

jcelko
Esteemed SQL Purist

547 Posts

Posted - 2002-05-06 : 13:44:23

>> However (you knew this was coming!) one of the issues I have with nested sets is the complexity required to do relatively simple tasks, like adding, deleting, or moving nodes in the tree. Even finding an employee's immediate supervisor or subordinates requires 3 self-joins AND a subquery! <<

I havea book on nothing but trees in SQL in the works right now. over the years, people have found a bunch of neat tricks with the nested set model.

To insert a new node, G1, under part G. We can insert one node at a time like this:

BEGIN ATOMIC
DECLARE right_most_sibling INTEGER;

SET right_most_sibling
= (SELECT rgt
FROM Frammis
WHERE part = 'G');
UPDATE Frammis
SET lft = CASE WHEN lft > right_most_sibling
THEN lft + 2
ELSE lft END,
rgt = CASE WHEN rgt >= right_most_sibling
THEN rgt + 2
ELSE rgt END
WHERE rgt >= right_most_sibling;

INSERT INTO Frammis (part, qty, wgt, lft, rgt)
VALUES ('G1', 3, 4, parent, (parent + 1));
COMMIT WORK;
END;

The idea is to spread the lft and rgt numbers after the youngest child of the parent, G in this case, over by two to make room for the new addition, G1. This procedure will add the new node to the rightmost child position, which helps to preserve the idea of an age order among the siblings.

To convert a nested sets model into an adjacency list model, which is the same as finding the immediate subodinates:

SELECT B.emp AS boss, P.emp
FROM OrgChart AS P
LEFT OUTER JOIN
OrgChart AS B
ON B.lft
= (SELECT MAX(lft)
FROM OrgChart AS S
WHERE P.lft > S.lft
AND P.lft < S.rgt);

--CELKO--
Joe Celko, SQL Guru

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-05-08 : 16:51:24

does anyone have an example of a good method for converting an adjacency model hierarchy table into a nested-set model table?


create table #adjacency (
    parent varchar(20) not null,
    child  varchar(20) not null )

insert #adjacency
select 'mona','lisa' union
select 'mona','joe' union
select 'joe','timmy'

create table #nestedset (
    item varchar(20),
    lft int,
    rgt int )

insert #nestedset (item,lft,rgt)
select ?????

I have an adjacency-model type table with several hundred thousand records and 12 levels of depth . . . a set-based conversion method would be best.

<O>

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-05-08 : 16:58:02

In the article, there are several links to Joe's articles on nested sets, one of them has the code for doing the conversion.

jcelko
Esteemed SQL Purist

547 Posts

Posted - 2002-05-09 : 13:51:16

>>does anyone have an example of a good method for converting an adjacency model hierarchy table into a nested-set model table? ... I have an adjacency-model type table with several hundred thousand records and 12 levels of depth . . . a set-based conversion method would be best. <<

I never found a pure set based method. To convert an adjacency list model into a nested set model, use a push down stack algorithm. Assume that we have these tables:

-- Tree holds the adjacency model
CREATE TABLE Tree
(emp CHAR(10) NOT NULL,
boss CHAR(10));

INSERT INTO Tree
SELECT emp, boss FROM Personnel;

-- Stack starts empty, will holds the nested set model
CREATE TABLE Stack
(stack_top INTEGER NOT NULL,
emp CHAR(10) NOT NULL,
lft INTEGER,
rgt INTEGER);

BEGIN ATOMIC
DECLARE counter INTEGER;
DECLARE max_counter INTEGER;
DECLARE current_top INTEGER;

SET counter = 2;
SET max_counter = 2 * (SELECT COUNT(*) FROM Tree);
SET current_top = 1;

INSERT INTO Stack
SELECT 1, emp, 1, NULL
FROM Tree
WHERE boss IS NULL;

DELETE FROM Tree
WHERE boss IS NULL;

WHILE counter <= (max_counter - 2)
LOOP IF EXISTS (SELECT *
FROM Stack AS S1, Tree AS T1
WHERE S1.emp = T1.boss
AND S1.stack_top = current_top)
THEN
BEGIN -- push when top has subordinates and set lft value
INSERT INTO Stack
SELECT (current_top + 1), MIN(T1.emp), counter, NULL
FROM Stack AS S1, Tree AS T1
WHERE S1.emp = T1.boss
AND S1.stack_top = current_top;

DELETE FROM Tree
WHERE emp = (SELECT emp
FROM Stack
WHERE stack_top = current_top + 1);

SET counter = counter + 1;
SET current_top = current_top + 1;
END
ELSE
BEGIN -- pop the stack and set rgt value
UPDATE Stack
SET rgt = counter,
stack_top = -stack_top -- pops the stack
WHERE stack_top = current_top
SET counter = counter + 1;
SET current_top = current_top - 1;
END IF;
END LOOP;
END;

You wil need to translate my SQL/PSM into whatever target SQL 4GL you have, or put it into a host language program.

--CELKO--
Joe Celko, SQL Guru

jcelko
Esteemed SQL Purist

547 Posts

Posted - 2002-05-09 : 13:54:38

Arrgh! I copied the old file:

INSERT INTO Stack
SELECT 1, emp, 1, max_counter -- changed!
FROM Tree
WHERE boss IS NULL;

DELETE FROM Tree
WHERE boss IS NULL;

WHILE counter < max_counter -- changed !
LOOP ...

--CELKO--
Joe Celko, SQL Guru

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-05-10 : 08:17:23

Sir Celko, I appreciate your response. I actually have been working with the push down stack model code that I adapted from another article of yours.

I am having difficultiy with one aspect. What happens when your tree data looks like this...


create table tree (
    emp char (10) not null,
    boss char (10) )

insert tree values ('Jerry',NULL)
insert tree values ('Bert','Jerry')
insert tree values ('Chuck','Jerry')
insert tree values ('Donna','Chuck')
insert tree values ('Eddie','Chuck')
insert tree values ('Fred','Chuck')
insert tree values ('Page47',NULL)
insert tree values ('Sloan','Page47')

...Such that there are multiple trees in the forest, ie. multiple roots? I am assuming that I would want tree to look like....


emp        lft         rgt         
---------- ----------- ----------- 
Jerry      1           12
Bert       2           3
Chuck      4           11
Donna      5           6
Eddie      7           8
Fred       9           10
Page47    13           16
Sloan     14           15

I am trying to work through the allpairs and leftmostpairs views and the TreeMerge procedure described here. Is this the best method for my situation or should I work towards adapting the stack algorithm for multiple roots?

quote:

The woods are lovely, dark and deep.
But I have promises to keep,
And miles to go before I sleep,
And miles to go before I sleep.

Thanks...

<O>

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-05-10 : 16:09:41

I thought I might include my SQL Server translation of the adjacency to nested set conversion code as adapted from the Celko article. This seems to deal with a multi-root adjacency model pretty well....

First the DDL . . .


create table adjmodel (
    emp char (10) not null,
    boss char (10) )
go

create table setmodel (
    emp char(10),
    boss char(10),
    lft integer,
    rgt integer )
go

create view allpairs
as
select
    t.boss as superior,
    t.emp as subordinate
from
    setmodel t
where
    exists ( select 1
             from setmodel
             where boss = t.emp) and
    t.boss <> t.emp
go

create view leftmostpairs
as
select distinct
    superior,
    (select
         min(subordinate)
     from
         allpairs
     where
         a.superior = superior) as subordinate
from
    allpairs a
where
    superior = (select min(superior)
                from allpairs)
go

create index idx_adjmodel_boss on adjmodel(boss)
create clustered index idx_adjmodel_emp on adjmodel(emp)
create index idx_setmodel_empboss on setmodel(emp,boss)
create index idx_setmodel_lftrgt on setmodel(lft,rgt)
go

...here is some sample data...


insert adjmodel values ('Jerry',NULL)
insert adjmodel values ('Bert','Jerry')
insert adjmodel values ('Chuck','Jerry')
insert adjmodel values ('Page47',NULL)
insert adjmodel values ('Donna','Chuck')
insert adjmodel values ('Eddie','Chuck')
insert adjmodel values ('Fred','Chuck')
insert adjmodel values ('Sloan','Page47')
go

...here is a proc needed...


create proc adjmodelMerge
    @superior char(10),
    @subordinate char(10)
as

declare
    @size int,
    @insert_point int

select
    @size = 2 * ( select
                      count(*)
                  from
                      setmodel
                  where
                      emp = @subordinate ),
    @insert_point = ( select
                          min (lft)
                      from
                          setmodel
                      where
                          emp = @subordinate and
                          boss = @superior ) - 1

update
    setmodel
set
    boss = case
               when boss = @subordinate then case
                                                 when emp = @subordinate then null
                                                 else @superior
                                             end
               else boss
           end,
    lft = case
              when (boss = @superior and lft > @size) then lft + @size
              else case
                       when (boss = @subordinate and emp <> @subordinate) then lft + @insert_point
                       else lft
                   end
          end,
    rgt = case
              when (boss = @superior and rgt > @size) then rgt + @size + 2
              else case
                       when (boss = @subordinate and emp <> @subordinate) then rgt + @insert_point
                       else rgt
                   end
              end
where
    boss in (@superior, @subordinate)

delete
    setmodel
where
    boss is null or
    emp is null
go

...and finally the conversion dml...


insert setmodel
select distinct
    t.boss,
    t.boss,
    1,
    2*(select count(*) + 1 from adjmodel where t.boss = boss)
from
    adjmodel t

insert setmodel
select distinct
    t.emp,
    t.boss,
    2*(select count(distinct emp) from adjmodel where emp <= t.emp and t.boss = boss), --in (emp,boss)),
    2*(select count(distinct emp) from adjmodel where emp <= t.emp and t.boss = boss) + 1 --in (emp,boss)) + 1
from
    adjmodel t

delete setmodel
where boss is null or emp is null

declare @boss char(10), @emp char(10)
while exists (
        select 1
        from
            leftmostpairs)
begin
    select
        @boss = superior,
        @emp = subordinate
    from
        leftmostpairs

    exec adjmodelmerge
        @superior = @boss,
        @subordinate = @emp

end

--and the proof
select * from setmodel order by boss,lft

The boss column is the tree 'owner'; that is to say the emp at the root of that particular tree. The lft/rgt values refer to the node position in the boss's tree only.
Now, I was not able to make the Page47 tree use the 13/16 lft/rgt values, but boss column seems to square that away.

I would love input from any intereste party.

(Rob, I like your model too and I don't mean to hi-jack this thread . . . hope you understand :) )

<O>

Dready
Starting Member

1 Post

Posted - 2002-05-23 : 10:16:14

It's funny... I used your 'lineage' tip in my little web project: http://yesi.dread.ws/ . This is a MySQL webapp, but in an SQL approach I took the method you explain.
The only thing I don't understand is why you keep a 'depth' field in your SQL Tree table: your 'lineage' field contains the way (in terms of node ids) to the root node, so the 'lineage' field already contains the 'depth' information.
Anyway nice article !

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-05-23 : 12:06:32

Thanks! It's a convenience really, you don't need it, but it takes up almost no space (you can make it a tinyint - 1 byte for up to 255 levels!) In case you need to know the depth, or count back or forward in levels, it's handy. You'd have to do something like this to calculate it from the lineage column:

SELECT Len(Lineage)-Len(Replace(Lineage,'/','')) AS Depth FROM Tree

Depending on your SQL product, the Len() and Replace() functions may not exist.

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources