|
SYS-CON.TV Webcasts
Comments
Did you read today's front page stories & breaking news?
SYS-CON.TV
|
Top Links You Must Click On
General Java Design Parameters in a Java Interpreter
Design Parameters in a Java Interpreter
By: Gene Callahan; Brian Clark
Jan. 1, 1999 12:00 AM
This article describes our use of design patterns to create an interpreter in Java, and shows how it can be built in a "pure," object-oriented fashion. .The patterns we use are from Design Patterns: Elements of Reusable Object-Oriented Software by Gamma, Helm, Johnson and Vlissides, published by Addison Wesley in 1995. (We'll refer to this book henceforth as DP.)
What Did We Build? The project came into existence while we were working with Dr. Jo Anne Parikh of the Southern Connecticut State University Computer Science Department. The interpreter was originally written in C++ and later ported to Java. We found that the port was not difficult, and that for our purposes Java had several key advantages over C++. It made garbage collection a breeze, exception handling more elegant and allowed us to easily add features to the language, such as fetching the contents of a URL and creating objects from the Java AWT. And of course, its being able to run as an applet from a Web browser handled the problem of distribution. We plan to take further advantage of Java's built-in networking capabilities and make more AWT features available from Scheme.
The Patterns We extended the Interpreter pattern to include interpreter input, borrowing an idea from Bjarne Stroustrup's The C++ Programming Language (Addison Wesley 1993). Instead of the core logic being in a large state machine, as in a procedural interpreter, it is dispersed through a hierarchy of classes. Objects in HotScheme know how to construct themselves from an input stream, assemble themselves into a parse tree and return the value they represent. There is a trade-off here: a more traditional interpreter, like those generated by the combination of lex and yacc, will run faster, but they may be a little harder to understand. It was for that reason that we chose simplicity and clarity over performance - a pretty reasonable trade-off given that we viewed this as a tool for learning and experimentation, rather than for the purpose of creating production systems. We modeled Scheme data types directly as Java classes. For a Lisp dialect, capturing the data types as classes means capturing the whole language, as Lisp programs are Lisp data. This simplifies understanding the interpreter; its structure reflects the definition of Scheme, and can be understood by reading a Scheme manual. Lisp interpreters operate in an endless read-eval-print loop by first getting input from the user, evaluating it and then returning the result of the evaluation. We discuss the read phase of this loop in the Abstract Factory section below. Evaluating the parse tree is simply a matter of calling Eval() on the SchemeObject returned by the read phase. Leaf nodes in the tree (Scheme atoms, such as numbers, strings and symbols) return themselves as their value or, in the case of symbols, return the object they label. Calling Eval() on high-level nodes will trigger a recursive evaluation of all nodes lower in the tree, yielding a single return value - of a type descended from SchemeObject - for that portion of the tree. Similarly, this result is printed by calling Print() on the object returned by Eval(). If this object is a composite, it will call Print() on its components. This makes the top-level code so simple that the body of this loop, the heart of the interpreter, is a single statement:
// term is the current terminal, global_env the lexical environment: The meat of the evaluation phase is in the List.Eval()method (see Listing 1). This is because the most fundamental Scheme activity is function application. When a list is evaluated, its default behavior is to treat the first item in the list (the car of the list) as a function to be applied to the rest of the list (the cdr) - the arguments to the function. As Java lacks the varargs feature present in C and C++, having a list data structure directly available in Java greatly simplified writing Java functions that handle the variable-length argument lists required by Scheme. Every Java method that implements a Scheme function takes a list of SchemeObject elements as its single argument, and pulls apart its "actual" arguments itself. It's important to note that List.Eval() calls Evargs() (evaluate arguments) on the rest of the list before passing these arguments to the function. (We use an auxiliary function, Evargs(), rather than Eval() itself, because a second call to Eval() would treat the first member of the rest of the list as a function - not at all what we want!) For this reason certain Scheme forms, called syntax forms, can't be handled by an ordinary function application. Consider the case of an if statement. The ANSI Scheme specification for if says that it evaluates its first argument and, if true, returns its second argument; otherwise it returns its third. But the specification also states that whichever branch is not returned must not be evaluated. If we used an ordinary function application to evaluate if, it would be too late! We'd have evaluated the arguments before even calling if. Thus these syntax forms are handled by special cases in List.Eval(). The objects created to represent these forms in the parse tree are all descendants of the HotScheme class SyntacticalForm, in keeping with our use of classes to capture grammar.
Composite: Building the Parse Tree Lisp programs are built from Lisp's main compositional structure: the list. The fact that a Lisp program processes lists and is also made up of lists lends an elegant simplicity to a well-built Lisp interpreter. Lists are a basic data type, and can generally appear in the same places as atoms such as integers and strings. This allows us to use a Composite to represent the parse tree. We represent lists as SchemeObjects that hold references to other SchemeObjects, which can themselves be lists. A client calling an object's Eval() or Print() method need not worry about whether it is dealing with an atom such as an integer or a composite such as a list - the call is made the same way by the client, with the composite recursively passing the call on to its components, if necessary. For an example of how Composite simplifies handling aggregate entities, see the implementation of List.Print() in Listing 2.
Abstract Factory: A Common Ancestor Creates All Scheme
Data Types The pattern DP refers to as Abstract Factory is also described in James Coplien's Advanced C++ (Addison Wesley 1992), where it is called the Exemplar idiom. An Abstract Factory allows clients to create subclasses of a class without specifying which subclass to create. To achieve this, SchemeObject itself determines which Scheme data type we are reading. When its static make() pseudo-constructor passes a reference to a terminal for input and to an environment for interpretation, it will return a (properly subclassed) reference to whatever object type it finds waiting for it on the input stream (see Listing 3). SchemeObject asks the lexical analyzer for the next token. It looks at the type of token it receives to see which of its subclasses to instantiate. This can be done recursively so that when we find a composite object like a list on the stream, the object returned will have constructed the elements of the composite and will be holding references to them. Abstract Factory posits that clients will deal only with the abstract interface provided by the factory class, and not call subclass-specific methods. The SchemeObject is this abstract interface in HotScheme, and is the base class for all concrete Scheme data objects. This is important, because many Scheme functions, such as predicates like list? and number?, can operate on any type of Scheme object. Also, Scheme lists and vectors are heterogeneous collections, and require a common base type to hold references to. Because of this level of abstraction, clients don't need to know about new Scheme data types as we add them to the system. Having SchemeObject as an abstract base class also helped when it came to error handling. Because Scheme is not strongly typed, any function might pass any data type for any of its arguments. However, this doesn't mean that every function can handle any data type! Many combinations should produce a runtime error. For example, the Scheme command first makes sense only when its first argument is a list. It's meaningless to ask for the "first element" of an integer. HotScheme handles this by throwing exceptions in the base SchemeObject class for most methods. A call to SchemeObject.first(), by default, throws an exception that states that the object the method was called on is not a list. We overrode that method only in the List class. This eliminates the need to scatter type-checking code throughout the system - in this case a big gain in both simplicity and performance.
Command: Scheme Built-in Commands as "Functors"
FaÇade: Our Terminal Interface In addition to our GUI version we've implemented a version for a Java character terminal, and it would be trivial to make a version that, for instance, interpreted code coming in on a socket. Instead of having the interpreter attempt to deal directly with character terminal, AWT, Swing, socket, Accessibility and other interfaces, the LispTerminal class presents a few abstract operations - like reading and printing - that the interpreter needs. The interpreter is passed an object that is a descendant of LispTerminal, to which it will direct I/O requests. Thus it is the creator of the interpreter instance, not the interpreter itself, that decides how I/O will be performed. For our character terminal version, input is read straight from the terminal. GUILispTerminal buffers keyboard input from HotScheme's input field and sends it all to the interpreter once the "Evaluate" button is clicked. To perform output, the interpreter calls the Print() method of the terminal passed to it, and the different terminal types output the text correctly. LispTerminal also provides a pushback buffer when the tokenizer has to read "too far" to tell when it has completely captured a token. (For example, a tokenizer can't tell that it's done reading a number until it reads the first character that is not a digit - one too many! The tokenizer needs to "push back" this extra character onto the input stream so that the next call for a token will read it.) Since lower-level I/O objects may not provide this capability, the Faade abstraction again simplifies our interfaces. See Listing 5 for the definition of LispTerminal.
How Did Patterns Help? The use of patterns is also crucial in communicating the design. In our discussions it was immensely helpful to be able to give names to the ideas shaping our work. To say "We'll employ an Abstract Factory to create Scheme types" captured a large piece of design in a simple, succinct statement. Finally, by densely combining patterns in a small design space, we began to glimpse the poetic quality that Christopher Alexander, in A Pattern Language (Oxford University Press 1977), asserts this "compression" of patterns can produce. As we wove these patterns into our design, the program began to surprise even its authors in the way new features effortlessly emerged from the structures we had already created. And this sense of adventure and elegance is what can make our profession a fulfilling one to pursue. We welcome communication from anyone interested in contributing to this project, and from any computer science departments or other educators who would like to deploy HotScheme at their institution.
Resources - URLs Reader Feedback: Page 1 of 1
Enterprise Open Source Magazine Latest Stories . . .
Subscribe to the World's Most Powerful Newsletters
Subscribe to Our Rss Feeds & Get Your SYS-CON News Live!
|
SYS-CON Featured Whitepapers
Most Read This Week |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||