Introduction & Overview
The weka.core.expressionlanguage package provides functionality for easily creating simple languages. It does so by building an AST (abstract syntax tree) that can then be evaluated. At the heart of the AST is the Node interface. It is an empty interface that marks a type as an AST node. As a result, there are no real constraints on AST nodes, which gives them as much freedom as possible to reflect the abstractions of a program. To provide a common base to build upon, the Primitives class defines subinterfaces for the primitive boolean (Primitives.BooleanExpression), double (Primitives.DoubleExpression) and String (Primitives.StringExpression) types. It also provides implementations of constants and variables of those types.

Most extensibility is achieved by adding macros to a language. Macros allow for powerful meta-programming, since they work directly with AST nodes. The Macro interface defines what a macro looks like. Variable and macro lookup is done through the respective declaration interfaces (MacroDeclarations for macros), and both kinds of declarations can be combined through compositors (MacroDeclarationsCompositor for macros). This makes it straightforward to add built-in variables and powerful built-in functions to a language. Useful implementations are:
weka.core.expressionlanguage.parser package. However, the framework allows other means of constructing an AST if needed. Built-in operators (+, -, *, /, etc.) are a special case: they can be seen as macros, but they are also strongly tied to the parser. To separate the parser from these special macros, there is the Operators class, which the parser can use to delegate operator semantics elsewhere.
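To make the relationships described above concrete, here is a minimal, self-contained sketch in plain Java. It does not use Weka: every type in it (`Node`, `DoubleExpression`, `BooleanExpression`, `Macro`, `MacroDeclarations`, `MacroCompositor`, `Operators`) is a simplified, hypothetical stand-in that only mirrors the roles named in this overview, and the method names and signatures (`evaluate`, `hasMacro`, `getMacro`) are assumptions rather than the real weka.core.expressionlanguage API. It shows a marker `Node` interface, typed expression nodes, a macro that takes AST nodes and returns a new AST node, a compositor that chains macro declarations, and operator semantics kept in a separate class.

```java
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Hypothetical, simplified stand-ins for the roles described above (not the Weka API).
public class ExprSketch {

  // Marker interface: any type can be an AST node.
  interface Node {}

  // Typed expression nodes, in the spirit of the Primitives subinterfaces.
  interface DoubleExpression extends Node { double evaluate(); }
  interface BooleanExpression extends Node { boolean evaluate(); }

  // A macro receives AST nodes (not values) and returns a new AST node.
  interface Macro { Node evaluate(Node... params); }

  // Macro lookup by name.
  interface MacroDeclarations {
    boolean hasMacro(String name);
    Macro getMacro(String name);
  }

  // Combines several declarations; the first one that declares a name wins.
  static final class MacroCompositor implements MacroDeclarations {
    private final MacroDeclarations[] parts;

    MacroCompositor(MacroDeclarations... parts) { this.parts = parts; }

    public boolean hasMacro(String name) {
      for (MacroDeclarations d : parts) if (d.hasMacro(name)) return true;
      return false;
    }

    public Macro getMacro(String name) {
      for (MacroDeclarations d : parts) if (d.hasMacro(name)) return d.getMacro(name);
      throw new IllegalArgumentException("Unknown macro: " + name);
    }
  }

  // Declarations backed by a fixed map.
  static MacroDeclarations fromMap(Map<String, Macro> macros) {
    return new MacroDeclarations() {
      public boolean hasMacro(String name) { return macros.containsKey(name); }
      public Macro getMacro(String name) { return macros.get(name); }
    };
  }

  // Wraps a math function as a macro; the argument type is checked when the AST is built.
  static Macro unaryMath(DoubleUnaryOperator f) {
    return params -> {
      if (params.length != 1 || !(params[0] instanceof DoubleExpression))
        throw new IllegalArgumentException("expected one double argument");
      DoubleExpression arg = (DoubleExpression) params[0];
      return (DoubleExpression) () -> f.applyAsDouble(arg.evaluate());
    };
  }

  // Operator semantics kept outside any parser; a parser would call these for '+' and '<'.
  static final class Operators {
    static DoubleExpression plus(Node l, Node r) {
      DoubleExpression a = asDouble(l), b = asDouble(r);
      return () -> a.evaluate() + b.evaluate();
    }

    static BooleanExpression lessThan(Node l, Node r) {
      DoubleExpression a = asDouble(l), b = asDouble(r);
      return () -> a.evaluate() < b.evaluate();
    }

    private static DoubleExpression asDouble(Node n) {
      if (!(n instanceof DoubleExpression))
        throw new IllegalArgumentException("expected a double expression");
      return (DoubleExpression) n;
    }
  }

  // Two independent declaration sets combined into one lookup.
  static MacroDeclarations builtins() {
    return new MacroCompositor(
        fromMap(Map.of("sqrt", unaryMath(Math::sqrt))),
        fromMap(Map.of("abs", unaryMath(Math::abs))));
  }

  // Builds the AST for "sqrt(16) + 1" by hand, as a parser would, then evaluates it.
  static double demo() {
    DoubleExpression sixteen = () -> 16.0;
    DoubleExpression one = () -> 1.0;
    Node sqrt = builtins().getMacro("sqrt").evaluate(sixteen);
    return Operators.plus(sqrt, one).evaluate();
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints 5.0
  }
}
```

Because the macro sees the argument node rather than its value, it can reject ill-typed arguments while the AST is being built, before anything is evaluated.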

A word on parsers
Currently, the parser is generated with the CUP parser generator and the JFlex lexer generator. While parser generators are powerful tools, they suffer from some unfortunate drawbacks:
- The parsers are generated, so there is an additional layer of indirection between the grammar file (used for parser generation) and the generated code.
- The grammar files usually have their own syntax, which may be quite different from the programming language otherwise used in a project.
- In more complex grammars, it is easy to introduce ambiguities and unintended valid syntax.
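These drawbacks are one reason it matters that, as noted above, the framework allows other means of constructing an AST. The following is a hypothetical hand-written recursive-descent parser for a tiny grammar (numbers, `+`, `*` and parentheses). It is not Weka's CUP/JFlex parser, and the names `TinyParser`, `Node`, `DoubleExpression` and `Operators` are illustrative stand-ins. The point is that the grammar lives in ordinary Java methods, one per rule, while operator semantics are still delegated to a separate `Operators` class.

```java
// Hypothetical hand-written recursive-descent parser (not Weka's CUP/JFlex parser).
// Grammar: expr   := term ('+' term)*
//          term   := factor ('*' factor)*
//          factor := NUMBER | '(' expr ')'
public class TinyParser {

  interface Node {}
  interface DoubleExpression extends Node { double evaluate(); }

  // Operator semantics are kept out of the parser, as with the Operators class.
  static final class Operators {
    static DoubleExpression plus(DoubleExpression a, DoubleExpression b) {
      return () -> a.evaluate() + b.evaluate();
    }

    static DoubleExpression times(DoubleExpression a, DoubleExpression b) {
      return () -> a.evaluate() * b.evaluate();
    }
  }

  private final String src;
  private int pos = 0;

  private TinyParser(String src) { this.src = src; }

  // Parses the whole input into an AST, rejecting trailing garbage.
  static DoubleExpression parse(String src) {
    TinyParser p = new TinyParser(src);
    DoubleExpression e = p.expr();
    p.skipSpaces();
    if (p.pos != src.length())
      throw new IllegalArgumentException("Unexpected input at " + p.pos);
    return e;
  }

  private DoubleExpression expr() {
    DoubleExpression left = term();
    while (accept('+')) left = Operators.plus(left, term());
    return left;
  }

  private DoubleExpression term() {
    DoubleExpression left = factor();
    while (accept('*')) left = Operators.times(left, factor());
    return left;
  }

  private DoubleExpression factor() {
    if (accept('(')) {
      DoubleExpression inner = expr();
      if (!accept(')')) throw new IllegalArgumentException("Expected ')' at " + pos);
      return inner;
    }
    skipSpaces();
    int start = pos;
    while (pos < src.length() && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.'))
      pos++;
    if (start == pos) throw new IllegalArgumentException("Expected number at " + pos);
    double value = Double.parseDouble(src.substring(start, pos));
    return () -> value;
  }

  // Consumes c (after optional whitespace) if it is the next character.
  private boolean accept(char c) {
    skipSpaces();
    if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
    return false;
  }

  private void skipSpaces() {
    while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
  }

  public static void main(String[] args) {
    System.out.println(parse("2 + 3 * (4 + 1)").evaluate()); // prints 17.0
  }
}
```

Operator precedence falls out of the call structure (`term` binds tighter than `expr`), so no separate grammar file or generation step is involved; the trade-off is that ambiguity checking, which a generator performs, now has to be done by hand.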

Summary
A flexible AST structure is given by the
Macro interface allows for powerful meta-programming, which is an important part of the extensibility features. The Primitives class gives a good basis for the primitive boolean, double and String types. The parser is responsible for building up the AST structure. It delegates operator semantics to Operators. Symbol lookup is done through the MacroDeclarations interfaces, which can be combined with the

Usage
With the described framework, it is possible to create languages in a declarative way. Examples can be found in SubsetByExpression. A commonly used language is:
// exposes instance values and 'ismissing' macro
InstancesHelper instancesHelper = new InstancesHelper(dataset);

// creates the AST
Node node = Parser.parse(
  expression,      // textual representation of the program
  instancesHelper, // variable declarations (assumed: the instance values)
  instancesHelper  // macro declarations (assumed: e.g. 'ismissing')
);

// type checking is necessary, but allows for greater flexibility
if (!(node instanceof DoubleExpression))
  throw new Exception("Expression must be of double type!");

DoubleExpression program = (DoubleExpression) node;
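The snippet above stops once the program has been type-checked. The sketch below is a self-contained analog of what typically follows, under the assumption (suggested by the comment "exposes instance values") that the variable source is rebound to each data row while the AST is built only once and then reused. Nothing here is Weka code: `RowFilterSketch`, `RowVariables`, `setRow` and `getVariable` are hypothetical names, the AST for "x > 2" is built by hand instead of by a parser, and the program is required to be boolean because it is used as a row filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, self-contained analog of the usage pattern above (not the Weka API).
// The AST is built once and then evaluated for each row of a small in-memory table.
public class RowFilterSketch {

  interface Node {}
  interface DoubleExpression extends Node { double evaluate(); }
  interface BooleanExpression extends Node { boolean evaluate(); }

  // Hypothetical stand-in for a helper that exposes the current row's values as variables.
  static final class RowVariables {
    private Map<String, Double> current = Map.of();

    void setRow(Map<String, Double> row) { current = row; }

    // The returned node reads the value lazily, so it sees whichever row is current
    // when it is evaluated. A missing name fails with a NullPointerException on unboxing.
    DoubleExpression getVariable(String name) {
      return () -> current.get(name);
    }
  }

  // Same type-checking idea as the snippet above, but requiring a boolean program.
  static BooleanExpression requireBoolean(Node node) {
    if (!(node instanceof BooleanExpression))
      throw new IllegalArgumentException("Expression must be of boolean type!");
    return (BooleanExpression) node;
  }

  // Keeps the rows whose "x" value is greater than 2.
  static List<Map<String, Double>> filter(List<Map<String, Double>> rows) {
    RowVariables vars = new RowVariables();
    DoubleExpression x = vars.getVariable("x");
    // Hand-built AST for "x > 2", standing in for the Node a parser would return.
    Node node = (BooleanExpression) () -> x.evaluate() > 2;
    BooleanExpression program = requireBoolean(node);

    List<Map<String, Double>> kept = new ArrayList<>();
    for (Map<String, Double> row : rows) {
      vars.setRow(row); // rebind the variables; the AST itself is reused
      if (program.evaluate()) kept.add(row);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Map<String, Double>> rows =
        List.of(Map.of("x", 1.0), Map.of("x", 3.0), Map.of("x", 5.0));
    System.out.println(filter(rows).size()); // prints 2
  }
}
```

Reading variables lazily is what lets one parsed program serve every row; the type check happens once, before the loop, rather than on every evaluation.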

History
Previously, there were three very similar languages in the weka.core.AttributeExpression class and the weka.filters.unsupervised.instance.subsetbyexpression package. Because of their similarities, it was decided to unify them into a single expression language. However, backwards compatibility was an important goal, which is why the language contains some redundant parts (e.g. both 'and' and '&' are operators for logical and).
- $Revision: 1000 $
- Benjamin Weber ( benweber at student dot ethz dot ch )