Re: Best class decompiler?
On 07/04/2010 07:12 PM, Tom Anderson wrote:
> On Sun, 4 Jul 2010, Joshua Cranmer wrote:
>> In any case, interest in decompiling has significantly waned over the
>> past decade or so. A project or two on sourceforge claim to support
>> Java 5 decompilation, but I haven't tested it in depth.
> I wonder if the driver of the fall of decompilation is the rise of open
> source, and perhaps also open standards. If your landscape consists of,
> say, the JDK, JBoss, Spring, and Hibernate, then there are easier and
> more reliable ways to get hold of source code than decompilation.
I think a better explanation is that it was never really a widespread
avenue of research to begin with. Academically, it consists of
disassembly [1], control structure identification, and typing and
variable analysis. The middle part is pretty much a solved problem, and
I'm reasonably sure that the type/variable analysis is also pretty well
solved. Disassembly has, by and large, remained difficult for native
code, although great strides have been made in the last 20 years or so.
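To make the "control structure identification" step a bit more concrete, here is a minimal sketch of one common ingredient: finding back edges in a control flow graph with a depth-first search, since a back edge marks a loop. This is not taken from any particular decompiler. The class name LoopFinder and the int[][] successor encoding are made up for illustration, and the approach assumes a reducible graph, which is what structured source normally compiles to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the class name and the graph encoding are illustrative only.
final class LoopFinder {
    // succ[i] lists the successor blocks of basic block i; block 0 is the entry.
    // Returns every back edge {from, to} found by a depth-first search.
    static List<int[]> backEdges(int[][] succ) {
        List<int[]> edges = new ArrayList<>();
        int[] state = new int[succ.length]; // 0 = unvisited, 1 = on the DFS stack, 2 = finished
        if (succ.length > 0) {
            dfs(0, succ, state, edges);
        }
        return edges;
    }

    private static void dfs(int node, int[][] succ, int[] state, List<int[]> edges) {
        state[node] = 1;
        for (int next : succ[node]) {
            if (state[next] == 1) {
                // The target is still on the DFS stack, so this edge closes a loop.
                edges.add(new int[] {node, next});
            } else if (state[next] == 0) {
                dfs(next, succ, state, edges);
            }
        }
        state[node] = 2;
    }

    public static void main(String[] args) {
        // CFG of "while (c) { body }": 0 = entry, 1 = loop test, 2 = body, 3 = exit
        int[][] cfg = { {1}, {2, 3}, {1}, {} };
        for (int[] e : backEdges(cfg)) {
            System.out.println("back edge " + e[0] + " -> " + e[1]);
        }
    }
}
```

For the while-loop graph in main, the only back edge is 2 -> 1, so block 1 is the loop header. A real decompiler would go on to classify the loop (while, do-while, for) and to structure the conditionals, which this sketch does not attempt.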
Since Java bytecode doesn't mash data and code together in the same
space, and given how much of the structure information is left in the
bytecode, Java set off a massive spurt of decompilers simply because it
was easy to decompile. I'm guessing this spurt was more of a proof-of-concept
than a full-blown branching out. Since fully automated disassembly is
the most unsolved portion of decompiling, Java is academically
uninteresting to decompile; furthermore, you don't need to go the full
decompiler route to showcase improvements in disassembly. On top of all
of this, one of the major problem classes for reverse engineering in
general is dealing with malware, which mostly exists in native code and
not bytecode languages. You can see that there are a handful of
decompilers, defunct or otherwise, for other bytecodes (I know of two or
three for both Python and .NET); the only two languages which have a
large number of decompilers are Java (because it was easier) and C
(because it was harder).
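As a rough illustration of why the disassembly step is easy for Java: a method's code array holds only instructions (constants live in the constant pool), so a plain linear sweep from offset 0 never mistakes data for code. The sketch below assumes a made-up class name, TinyDisassembler, and covers only a handful of real JVM opcodes; it is nowhere near a full disassembler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: handles only a few opcodes, not the full JVM instruction set.
final class TinyDisassembler {
    // Decodes the instructions in a method's code array by walking it front to back.
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int op = code[pc] & 0xff; // opcodes are unsigned bytes
            switch (op) {
                case 0x03: out.add(pc + ": iconst_0"); pc += 1; break;
                case 0x04: out.add(pc + ": iconst_1"); pc += 1; break;
                case 0x10: // bipush takes one signed operand byte
                    out.add(pc + ": bipush " + code[pc + 1]);
                    pc += 2;
                    break;
                case 0x1a: out.add(pc + ": iload_0"); pc += 1; break;
                case 0x1b: out.add(pc + ": iload_1"); pc += 1; break;
                case 0x60: out.add(pc + ": iadd"); pc += 1; break;
                case 0xa7: { // goto takes a signed 16-bit offset relative to its own pc
                    int offset = (short) (((code[pc + 1] & 0xff) << 8) | (code[pc + 2] & 0xff));
                    out.add(pc + ": goto " + (pc + offset));
                    pc += 3;
                    break;
                }
                case 0xac: out.add(pc + ": ireturn"); pc += 1; break;
                default:
                    throw new IllegalArgumentException(
                        "opcode not covered by this sketch: 0x" + Integer.toHexString(op));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Body of: static int add(int a, int b) { return a + b; }
        byte[] code = { 0x1a, 0x1b, 0x60, (byte) 0xac };
        for (String line : disassemble(code)) {
            System.out.println(line);
        }
    }
}
```

Running main prints iload_0, iload_1, iadd and ireturn at offsets 0 through 3. The same sweep over x86 would have to guess where instructions start and whether a given byte is code at all, which is where the real difficulty lies for native code.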
In short, academically, Java decompilers are effectively solved, but
maintaining an up-to-date decompiler for Java (or any other bytecode
language) is not something many people wish to do. This has probably
been true since before Java was created: what looks like a lack of
modern decompilers is probably better explained by the fading of an
abnormal burst of interest, generated by Java being the first major
bytecode language in existence.
For an open source project to survive, it needs a critical threshold of
developers. The Java decompiler market is already crowded with several
"good enough" solutions, C decompilers are effectively beyond the state
of the art [2], and the interest in other markets is generally
insufficient to sustain even a small operation. Perhaps a tool which
could become the "gcc" of decompilers (able to go from many source
architectures to many destination languages) might achieve this
threshold. But unless a tool achieves substantially better results, it
is probably not going to be successful as a project.
[1] I'm glossing over a lot of stuff here which is actually quite
difficult for native code, but many of the problems don't exist in Java.
[2] In the sense of fully-automated decompilation. x86 disassembly is a
royal pain in the butt; while there exist tools that can do this well
(IDA!), I'm not aware of anything that could be used in open-source
software [3].
[3] On reflection, I suppose LLVM is utilizing its x86 assembly
architecture for disassembly (for debugging purposes).
--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth