Re: Is There a Java Class for this Kind of Data Structure?

From:

"Oliver Wong" <owong@castortech.com>

Newsgroups:

comp.lang.java.programmer

Date:

Fri, 13 Jul 2007 16:44:35 -0400

Message-ID:

<TuRli.4153$Fa7.130288@wagner.videotron.net>

"Hal Vaughan" <hal@thresholddigital.com> wrote in message
news:D82dnSqdhrZdTArbnZ2dnUVZ_tadnZ2d@comcast.com...

Oliver Wong wrote:

Trees
typically don't lend themselves well to modeling things where
uncle/aunt
relationships are important. Are you using the tree structure for other
reasons than this updating-algorithm?

I was looking into a tree because I thought that would be the easiest
way to
list all the tables by level or generation and step through them.
Remember, I'm self taught.

    Picking the right data structure for a given problem is one of those
things you just acquire through experience, I think. It's difficult for me
to say for sure, because I don't know what you're doing with your tables,
but based on what I heard so far, an acyclic directed graph sounds like
the the data structure I'd used to model the dependencies between your
tables.

    Presumably, your program is trying to "do something": model global
warming, schedule airplane flights, track inventory in a warehouse, etc.
This "thing" that it is trying to do exists independently of your program;
that is, you could do it manually if you needed to, but it might be
painful, boring, repetitive, etc. This "thing" that is independent of your
program is your "problem domain".

    So what I'm wondering is whether the entities in your problem domain
naturally have a tree-like structure to them. If so, then it might be
worth it to stick with trees in your program that reflect the structure of
the problem domain, and use the queue-based algorithm mentioned below.
Otherwise, it might be worth chucking the trees in favour of the acyclic
directed graph (http://en.wikipedia.org/wiki/Acyclic_directed_graph).

A "directed graph" basically means that edges between nodes have
directions associated with them. Here's a non-directed graph:

a---b----c
| |
| |
+--------+

And here's a directed graph:

a-->b<-->c
| ^
| |
+--------+

An acyclic directed graph is simply a direct graph in which there are no
cycles. Or to put it another way, it's a graph in which when you follow
the directions of the arrows, no matter where you start, it's impossible
to end up in the same location you've been to previously.

The above directed graph is not acyclic, because you can start at C, then
go to B, then go to C again. If you make this minor change, the graph
becomes an acyclic one:

a-->b<---c
| ^
| |
+--------+

From A, you can go to B or C. At B, you're stuck. And C, you can only go
to B. There's no way for you to visit a node twice while following the
edges.

If not, you might instead want to
look into using a directed acyclic graph, where each edge in the graph
represents a dependency. That way, when a change in one table occurs,
it
can notify all of its dependents (which may be children, nephew/nieces)
who in turn notify their dependents and so on.

Actually, that's part of how I was thinking of using the tree. Whenever
the
table data was changed, update() was called and would step through all
the
dependent tables. The problem, as I've said, is I didn't have them
listed
by level, but just by following a branch.

    Dependencies are usually best modelled as an acyclic graph. To take
your earlier example, with nodes A, B, C, D, E, F, G, the graph would look
like:

          ,-<-B<-.
E<-\ / \
    *-<-*---<-C<---*-<-A
F<-/ \ /
          `-<-D<-'

That is, A has 3 edges, one pointing to B, one pointing to C and one
pointing to D. B, C and D each have 2 edges, one pointing to E, and one
pointing to F.

When you do your updates, you just follow all possible paths along the
graph. So if you update B, you know you also need to update E and F. When
you update A, you know you need to update the whole graph.

The reason the graph needs to be acyclic is to avoid infinite update
looks. Let's say you had the following cyclic graph:

A<-->B

And you wanted to update A. This would trigger an update in B, which would
then trigger an update in A, and so on back and forth forever.

An easy way to implement this is to have each node (or table) keep a list
of all of the other nodes it needs to notify when it gets updated. So A
would keep B, C and D in its list. B would keep E and F in its list, and
so on.

Otherwise, if the tree structure is important, you could modify
your
updating-algorithm so that it uses a breadth-first traversal instead of
a
depth-first traversal. Usually this means making your method
non-recursive, and instead having a queue with a list of nodes that
need
to be updated, and you iterate over the queue until it's empty (meaning
all your work is done). A breadth-first traversal means that all the
nodes
at level N will be updated before all the nodes at level N+1.

Right now I'm experimenting with a loop that I wouldn't actually label
as
recursive but is close. It starts with a table, makes that table the
first
node in the list, then the loop starts. It gets the children of that
table
and makes a list with them as nodes and that new sublist becomes the
next
node in the main list. Then it looks at all the grandchild tables and
makes another sublist of them and that becomes the 3rd node on the main
list. I end up with a main list and each node is a sublist. The first
sublist is only 1 item long, containing the initial table. The 2nd
node/sublist is all the children, the 3rd node/sublist is the
grandchildren
and so on.

I looked into it several ways since I don't like writing a loop to
gather
data, then another to go through that same data, but found if I tried
one
loop to process, there were times I'd have to step back one level, so I
decided to list it all. The only actual recursion at this point is
getting
a list of tables from one table, then doing the same from each table in
the
list.

    What you've written is essentially what I described as being a
breadth-first traversal using a queue (except you need not make these
sublists). I think generally, it would not be considered a form of
recursion. I think (but am not sure) that the technical definition of
recursion requires a method to call itself either directly or indirectly.

    To clarify a potential confusion: when you're using this breadth-first
traversal algorithm you need to be working with the tree data structure,
and not the acyclic directed graph data structure.

    Recall that the tree was:

     A
     |
   +-+-+
   B C D
   | |
   E F

with A as the root, B, C, D as its children, and E the child of B, F the
child of C.

    If you wanted to update A, for example, you'd add A to the queue. Now,
you take the first element out of the queue (which is "A", leaving the
queue now empty), and then you do whatever updates you need to do on "A",
and then add all of its children in the queue (B, C and D). If the queue
is empty, you're done. But it's not empty, so you repeat: Take the first
element out of the queue (which is "B", leaving the queue as {C,D}),
update it, and add all of B's children to the queue (the queue is now
{C,D,E}). You keep repeating this until the queue is completely empty,
which should result in your processing each node exactly once.

    The drawback of this algorithm is that you must always process the
entire tree. That is, you cannot, for example, only update B and its
dependents, because adding B to the queue will eventually trigger the
processing of E, but you'll miss the processing of F.

    Having the dependency list as an acyclic graph allows you to only
process the parts of the trees that need to be processed.

    - Oliver