Re: does the following code function as expected?
Aryeh M. Friedman wrote:
Assuming standard Java naming conventions, does the following code (in
the general case) do the following:
1. List all imported packages and/or explicitly named dotted classes
2. List all simple class names
3. Not list keywords, literals, instance names
4. For anything matching items 1 and 2, list it only once in the
output
Without even looking at your source code, I can tell you that the answer
is almost definitely no (it happens to still be no even after looking at
the source code).
JLS 3 clearly states that Unicode-escape processing happens /before/
any other lexical processing, so a tool has to perform it in order to
handle every program correctly (however, having written source-level
analyzers myself, I can say that this requirement rarely matters in
practice).
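To make the pre-pass concrete, here is a minimal sketch of Unicode-escape translation as I read JLS 3.3. The class name UnicodeEscapes and the method translate are made up for this example, and malformed escapes are passed through unchanged, whereas a real compiler would reject them.

```java
public final class UnicodeEscapes {
    private UnicodeEscapes() {}

    /** Replaces backslash-u escapes with the characters they denote; all other text is copied as-is. */
    public static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
                continue;
            }
            // We only arrive here at a backslash preceded by an even number of
            // backslashes, which is the eligibility rule for starting an escape.
            int j = i + 1;
            while (j < n && src.charAt(j) == 'u') {
                j++; // one or more 'u' characters are allowed
            }
            int value = (j > i + 1) ? parseHex4(src, j) : -1;
            if (value >= 0) {
                out.append((char) value);
                i = j + 4;
            } else {
                // Not an escape: copy the backslash, and if another backslash
                // follows, copy it too so that it cannot start an escape.
                out.append('\\');
                i++;
                if (i < n && src.charAt(i) == '\\') {
                    out.append('\\');
                    i++;
                }
            }
        }
        return out.toString();
    }

    /** Returns the value of four hex digits starting at 'start', or -1 if they are not all hex digits. */
    private static int parseHex4(String src, int start) {
        if (start + 4 > src.length()) {
            return -1;
        }
        int value = 0;
        for (int k = 0; k < 4; k++) {
            char ch = src.charAt(start + k);
            int digit;
            if (ch >= '0' && ch <= '9') {
                digit = ch - '0';
            } else if (ch >= 'a' && ch <= 'f') {
                digit = ch - 'a' + 10;
            } else if (ch >= 'A' && ch <= 'F') {
                digit = ch - 'A' + 10;
            } else {
                return -1;
            }
            value = value * 16 + digit;
        }
        return value;
    }
}
```

The output of translate would then be fed to the tokenizer instead of the raw file contents.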
FileReader rd = new FileReader("Main.java");
StreamTokenizer st = new StreamTokenizer(rd);
The needed tokenizer is much more complex than StreamTokenizer. The JLS
provides an explicit description of the entire tokenization process, so
custom-writing a tokenizer is not terribly difficult (modulo Unicode
escapes).
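As a rough idea of what a hand-written scanner involves, the sketch below pulls out identifier-like tokens (keywords included) while skipping comments, string and character literals, and numeric literals. IdentifierScanner is a hypothetical name; the sketch assumes Unicode escapes were already translated and does not handle newer syntax such as text blocks, so it is nowhere near a complete JLS tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

public final class IdentifierScanner {
    private IdentifierScanner() {}

    /** Returns identifiers and keywords in source order, ignoring comments and literals. */
    public static List<String> identifiers(String src) {
        List<String> out = new ArrayList<>();
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // Line comment: skip to the end of the line.
                while (i < n && src.charAt(i) != '\n' && src.charAt(i) != '\r') {
                    i++;
                }
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // Block comment: skip to the closing delimiter (or end of input).
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else if (c == '"' || c == '\'') {
                i = skipQuoted(src, i, c);
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < n && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                out.add(src.substring(start, i));
            } else if (c >= '0' && c <= '9') {
                // Numeric literal: swallow letters too, so 0xFF or 10L is not
                // mistaken for an identifier.
                while (i < n && (Character.isLetterOrDigit(src.charAt(i))
                        || src.charAt(i) == '_' || src.charAt(i) == '.')) {
                    i++;
                }
            } else {
                i++; // whitespace, operators, separators
            }
        }
        return out;
    }

    /** Skips a string or character literal starting at index i; returns the index just past it. */
    private static int skipQuoted(String src, int i, char quote) {
        i++; // opening quote
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') {
                i += 2; // escaped character, e.g. an escaped quote
            } else if (c == quote || c == '\n') {
                return i + 1; // closing quote, or an unterminated literal
            } else {
                i++;
            }
        }
        return src.length();
    }
}
```

Filtering out keywords and literals such as true, false and null from the returned list is a separate step that this sketch leaves out.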
boolean endImports=false;
Set<String> out=new HashSet<String>();
Yes, a Set is probably sufficient.
[ cut parsing method ]
Glaring errors:
1. Your code does not appear to take into account names embedded in strings.
2. Ditto for names embedded in comments.
3. Proper resolution of the types behind identifiers requires semantic
analysis of the various expressions. Class names crop up in a
surprising number of places in the Java grammar.
4. I feel that your method for determining the end of imports is
incorrect. A compilation unit starts with an optional `package'
statement, followed by zero or more `import' statements; the first
statement after the optional `package' statement that does not begin
with `import' is the end of imports. Note that a file without a
`package' statement can still have imports.
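For the import section specifically, the check can be sketched over an already-tokenized file, assuming the JLS layout of a compilation unit (an optional package declaration, then zero or more import declarations). ImportSection and its token-list input are made up for this example; it ignores annotations on the package declaration and stray semicolons between imports.

```java
import java.util.ArrayList;
import java.util.List;

public final class ImportSection {
    private ImportSection() {}

    /** Collects the imported names, stopping at the first declaration that is not an import. */
    public static List<String> imports(List<String> tokens) {
        List<String> result = new ArrayList<>();
        int i = 0;
        int n = tokens.size();
        // Optional package declaration: skip everything up to and including its semicolon.
        if (i < n && tokens.get(i).equals("package")) {
            while (i < n && !tokens.get(i).equals(";")) {
                i++;
            }
            i++;
        }
        // Zero or more import declarations; anything else ends the import section.
        while (i < n && tokens.get(i).equals("import")) {
            i++;
            if (i < n && tokens.get(i).equals("static")) {
                i++; // static imports name a member, but the dotted name is recorded the same way
            }
            StringBuilder name = new StringBuilder();
            while (i < n && !tokens.get(i).equals(";")) {
                name.append(tokens.get(i));
                i++;
            }
            i++; // the semicolon
            result.add(name.toString());
        }
        return result;
    }
}
```

Because the loop stops at the first token that is not `import', a later occurrence of that word (which could only come from a comment or string the tokenizer failed to strip) is never mistaken for an import.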
In short, this can only be done with a real lexer and parser, ones that
accept every valid Java program.
Note on final applications: I want to write a tool that will
determine, from source only, what classes the current source file
depends on. After a little more processing the final output is a DAG
representing the order stuff would need to be compiled in for a non-
JIT compiler.
I am willing to bet that open-source Java dependency analysis programs
already exist that you could use. It is also likely that your problem
statement is not sufficient for this task: you need to generate the
fully-qualified class name for each used class. Bytecode analysis is
much easier at handling this, but it requires the compiled code to work
with.
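To show why bytecode is easier, here is a sketch that reads only the constant pool of a class file and returns the names in its CONSTANT_Class entries, which are already fully qualified. ClassRefs and referencedClasses are hypothetical names, and the tag sizes follow my reading of the class file format in the JVM specification. The result is incomplete as a dependency list: array types show up in descriptor form, and types mentioned only in field or method descriptors are not CONSTANT_Class entries.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;

public final class ClassRefs {
    private ClassRefs() {}

    /** Returns the dotted names found in the CONSTANT_Class entries of a class file. */
    public static Set<String> referencedClasses(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort(); // minor version
            in.readUnsignedShort(); // major version
            int count = in.readUnsignedShort();
            String[] utf8 = new String[count];
            int[] nameIndex = new int[count]; // 0 means "not a class entry"
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: // Utf8: same layout that readUTF expects
                        utf8[i] = in.readUTF();
                        break;
                    case 7: // Class: index of the Utf8 entry holding its name
                        nameIndex[i] = in.readUnsignedShort();
                        break;
                    case 8: case 16: case 19: case 20:
                        in.readFully(new byte[2]);
                        break;
                    case 15:
                        in.readFully(new byte[3]);
                        break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.readFully(new byte[4]);
                        break;
                    case 5: case 6: // Long and Double occupy two pool slots
                        in.readFully(new byte[8]);
                        i++;
                        break;
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            Set<String> names = new TreeSet<>();
            for (int idx : nameIndex) {
                if (idx != 0 && idx < count && utf8[idx] != null) {
                    names.add(utf8[idx].replace('/', '.'));
                }
            }
            return names;
        } catch (IOException e) {
            throw new IllegalArgumentException("truncated or malformed class file", e);
        }
    }
}
```

The bytes could come from something like Files.readAllBytes on a compiled Main.class; the returned set includes the class itself and its superclass alongside the classes it refers to.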
--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth