Re: multiple super types for generics
On Mar 26, 5:41 pm, "Aryeh M. Friedman" <Aryeh.Fried...@gmail.com>
wrote:
On Mar 26, 2:56 pm, Tom Hawtin <use...@tackline.plus.com> wrote:
Aryeh M. Friedman wrote:
now the question is how can I make it so sym[...] is T or State<T> the
only thing I can think of is something like the following syntex
(which I know is illegal):
public class State<T extends (T || State<T>)>
I'm not sure what you would do with such a thing. How would you do
anything to a reference to T, other than what you could do with an
Object? I think you want a little type hierarchy here so that you can
treat things uniformly:
class Terminal<T> extends State<T> {
Terminal(T t) {
...
class NonTerminal<T> extends State<T> {
NonTerminal(List<State<T>> symbols) {
...
You might also want to consider introducing a Builder.
Tom Hawtin
Thanks will look into using a Builder.... you almost nailed it but
something more like
public abstract class State<T>
{
public State()
{
symbols=new ArrayList<T>();
}
public void addSymbol(T sym) // this is = to State(T... syms)
I used in the orginal post but this is the actual implementation
{
states.add(t);
}
public abstract consume(List<T> in)
{
Queue<T> q=new
ArrayBackedBlockingQueue<T>(in.size(),true,in);
consume(q);
}
public void consume(T sym, Queue<T> q)
{
// this where I need to detect if it is T or State<T>
// psedo code
if symbols.get(sym) instanceof State<T>
symbols.get(sym).consume(q)
else
q.poll()
}
}
public class SingleState<T> extends State<T>
{
public void consume(Queue<T> q)
{
if(symbols.contains(q.peek())
consume(q.peek(),q);
}
}
I also subclass State to be RepeatState (consume until a symbol not in
symbols is encountered), SequenceState (consume N symbols only if they
are in a certain order... for example if we do:
SequenceState<Character> state=new SequenceState<Character>();
state.addSymbol('a')
state.addSymbol('b')
will accept input *ONLY* if the next two chars in the input is
"ab" (but not "ba" unlike RepeatState would))
and the final subclass is TerminalState where if the input is one of
the symbols then it returns the output state (i.e. the ctor is
TerminalState(State<T> outState) and if the next input
sym is in th alphabet then getState() returns outState else it returns
null).
Side question as far I can tell this plus some form output sink (which
the real code does have) is enough to parse any formal grammer (if a
state needs memory then it done by pushing back onto the input queue)
[i.e. I think this is all that is needed to make a UTM]
--Aryeh
I just looked at my copy of GoG and I do not see how a builder solves
the problem since I would still need to do the instanceof in the inner
State.consume(T sym,Queue<T> q). Besides if I am right that all the
states I listed+sink+mem via the queue is sufficient to make a UTM I
do not see any need for sub classes. Namely any subclasses would
actually be tokenizers (i.e. define some composite state machine
[normally a SequenceState where the members of the sequence are
various states]) and then pass the raw output to it (allow the output
sink to collect the token sequence). For example for the sake of
arguement I can define Java (at least at the top level)
class:package,(import),classProper;
package: "package", dotIdentifier
dorIdentifier: identifier,('.',idenetifier)*;
etc. (am I sorry if I am using adhoc EBNF but I don't know EBNF well
enough to do it from memory)
which means I do something like this in the tokenizer for class:
pubklic ClassMachine extends StateMachine
{
public classMachine()
{
machine=new SequenceMachine<Character>(); // T =
Character since that denotes the input and output sink it will create
machine.addSymbol(new PackageMachine(),true); // 2nd arg
states is if is manidtory
machine.addSymbol(new ImportMachine(),false);
machine.addSymbol(new ClassProperMachine(),true);
}
}
and to parse any Java source code I do:
public class StateMachine
{
...
public void consume(List<Character> text)
throws BadStateException
{
machine.consume(text); // will through an
BadStateException on syntax error (i.e. we arrive at a null state
before the end of input)
}
}
and I just continue recursivelly to define the state machine until I
get to a machine that can understand indivual characters (such as
DigitMachine or LetterMachine for example)
I know if I am only dealing with one formal grammer it is easier to
hard code for it and in general ANTLR or something like it is used to
build parsers.
The "big picture" motivation of wanting to parse this way:
Both approachs have one problem they completely loose the power of OO
in parsing... for example 90% of C/C++/Java/C# are identical so
instead of reinventing the wheel why not have them
inherit some C family parser and then just customize a few state
machines for the particular lang (expression (ptr vs. ptr], method
def, keyword list, etc.). If you take this to the logical extreme the
C family parser inherits the Algol family parser which inherits the
formal grammer parser which inherits the ascii parser which inherits
the unicode parser (which for completeness inherits the null parser).
Thus you can build a single parser frame work for all parsable langs
(all non-natural ones for the most part). One intresting side effect
of this "super" parser is I can intermix langs in the same logical
unit (file) and if the concrete parser knows how handle a subset of
the langs it can ignore input it doesn't understand. Now if it
understands 100% of the langs used in a logical unit and has a shared
symbol table between them we get a compilor/interpreter front end that
allows you to use multiple langs in the same class and/or compilation
unit depending on "master" lang orientation. The other thing it
let's us do is if we have a system wide rule that says all data has to
be in some data rep lang like XML then the partial understanding
hinted above can be extented all the way to the end-user OS level
(i.e. redirection, network streams and pipes where not all the input
data need be parsable by the recieving process). This combined with
some other ideas (persistent objects instead of files and using what
Mozart [a strange little lang] calls delayed binding values to control
process scheduling) that are not related to this directly is the
organizing prinicable behind a research OS I am working on.
--Aryeh