Re: Distance normalized TSP algorithm

From:

Joshua Cranmer <Pidgeot18@verizon.invalid>

Newsgroups:

comp.lang.java.programmer

Date:

Sun, 27 Jul 2008 02:56:18 GMT

Message-ID:

<mtRik.133$Ht4.106@trnddc01>

JSH wrote:
> That's a more general assumption than what she made and sounds good
> intuitively but you haven't even begun to give any kind of proof.
>
> But it is a route to a counterexample to this idea: use two "traps"
> and prove that the algorithm as described cannot give an optimal
> path.

In working this out in my head, I realized there are two cases where the
algorithm will generate correct pathing for one of these traps (or you
could view it as one case): either the trap node has to be the start
position, or both walkers have to hit the entrance nodes at the same
time, i.e., the trap node is the halfway-point in the path.

Let me do the best to explain the graph. All weights are assumed to be
100 if not explicitly stated. This is also going to be an undirected
graph, so all weights are symmetrical.

A "trap" is defined as a triad (1, 2, 3) where 1 has weight 10000 to all
other points (unless otherwise stated), 1000 to 2 and 3, and the 2-3
weight is 1.

The graph has two traps, (A, B, C), and (D, E, F), as well as two more
nodes G and H. The A-to-D connection has 30000. The connection between B
and E is 5, as is the connection between C and G, G and H, and H and F.
If I have appropriately assigned the weights, the best cycle is
ABEDFHGCA, which has a total cost of 4040.

Should the algorithm start at A, the travelers will move to B and C,
then E and G respectively. At that point, the travelers will move to F
and H, forcing them to take the expensive 2 * 10000 to get to G, which
is obviously much more than the first path. Starting at B or C produces
worse results, as the travelers forgo the cheap connections to A.
Starting at G will move the travelers to H and C, then F and B, forgoing
the cheap connections to both A and D. Graph reflection will show that
similar results occur for the other nodes D, E, F, and H.

As a good lesson for generating similar counterexamples, this is how I
generated the graph:

I first noticed the problem of the trap, aided by Patricia's posts. Your
solution to this was to start at the central node, so a counterexample
involving traps would need to find somewhere to force you to start
elsewhere, i.e., using two traps. This was the source of my "intuition."

To find a concrete algorithm, I visualized what the counterexample graph
should look like: two traps with a minimal number of nodes. I realized
at this point that putting the traps diametrically opposed would not
work, so I inserted a node on one side to force one traveler to get
there first.

Here, I realized that might not work, since the two travelers would then
be considering the node at the same time, and I wasn't sure I could
adjust node weights properly to get them to miss in the right way, so I
added another node on the longer side to ensure that the
simultaneously-considered node was not part of the cheap connection to
the trap. Note that arbitrary numbers of nodes could be added to avoid
longer lookahead loops.

At that point, I added the weights to complete the graph. Factors of
ten, for this small a graph, were sufficient. The conditions that make
up a trap are by now well-documented, so those weights were easy; to be
sure that no one would shortcut the traps I tripled the A-D line
probably unnecessarily. I forced the traveler onto the path by lowering
those connections (the factor of ten was probably unnecessary; I
originally used two, but added more to be on the safe side).

For simple algorithms, the qualitative analysis tends to be sufficient
(unless the constraints are such that I need to watch the balance of
weights); I included quantitative values mostly to be complete and to
head off any rebuttals that it wasn't sufficient.

So if I'm right, it easily solves TSP when cost matches well with
distance, and has problems if it does not, like with the "traps" you
mention.

If I restricted myself to Euclidean graphs, I could probably come up
with counterexamples. However, I am (at this point) still working with
only my innate visualization skills, which means I am heavily relying on
qualitative analysis to reach conclusions, whereas Euclidean
constraints, most notably the triangle inequality, require me to be more
precise with numbers, as I explained above.

I have suspicions for why your intuition is wrong in this case, but of
course, I want you to be wrong, so that doesn't say much.

Everyone would love for counterexamples to be wrong. :-)

Trouble with from scratch problem solving is it can take time, which
is why I sound differently this week than last, as last week, I'd just
come up with this approach. Kind of just came fully formed--why not
use two travelers with one going forward and one going backwards in
time?

Having the two travelers move in tandem doesn't seem to be gaining much
at this point; in fact, it's probably producing the same results as a
one-traveler algorithm except as O(N^4) instead of O(N^3). It certainly
seems to be acting as such in these heavily-massaged edge cases.

I'm not sure having two travelers will ever add much; the most common
case where an algorithm involves such two travelers is the Bidirectional
Search, where its improvement over BFS (or other corresponding
unidirectional searches) comes from the fact that the algorithm is
solved by local optimization, so having two steps toward the solution
improves time by a constant factor by not considering as many next
steps. TSP, however, needs to meet each node, so no benefit can be
derived in the same way by looking at only half (very rough estimate)
the nodes.

If you wish to continue on in these probings, my recommendations would
be to explore making the two interact in more dynamic ways. For example,
one could theoretically have one traveler somehow "warn" the other
traveler that what looks like a nice downhill path is actually the
steepest way up the mountain. This would probably mean dropping the
notion that one traveler is forwards in time and the other backwards, or
dropping that the two make moves at the same time.

So this entire algorithm is like a baby, and everything is fresh
ground as I slowly work my way through theories, implications, rough
guesses and complete b.s. in the hopes of getting an answer.

A good way to continue in probing algorithms is to look at the failures
and mull over why they fail. If P=NP, that would imply that the global
nature of the problem can be reduced to some statements about local
moves. That means that every class of graph that fails the greedy TSP
would have some step in the algorithm that redistributes expected
weights in such a way that the class succeeds.

The task, then, of proving P = NP in this manner would require us to
construct a list of all cases that fail the greedy (locally optimal)
TSP, construct a second graph whose edge weights are assigned in such a
manner that a greedy solution on /that/ graph solves the original graph,
and then prove that such an algorithm works for all failure cases and
that the list of failure cases is complete. This approach is similar to
how the four color theorem was proved.

So, the list to date:

Greedy TSP Failure Classes:
1. The smallest weight outgoing edge from node A goes to a node B, while
the smallest weight ingoing and outgoing edge from node C is node A and
node B, respectively, i.e., the "traps" we have repeatedly discussed.

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth