Package weka.core.expressionlanguage
Package for a framework for simple, flexible and performant expression
languages
Introduction & Overview
The weka.core.expressionlanguage package provides functionality to
easily create simple languages.
It does so by building an AST (abstract syntax tree) that can then be
evaluated.
At the heart of the AST is the Node interface. It is an empty interface
that marks a type as an AST node.
There are therefore no real constraints on AST nodes, which leaves them as
much freedom as possible to reflect abstractions of programs.
To give a common base to build upon, the Primitives class provides
subinterfaces for the primitive boolean (Primitives.BooleanExpression),
double (Primitives.DoubleExpression) and String (Primitives.StringExpression)
types.
It furthermore provides implementations of constants and variables of those
types.
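As a rough illustration of this design, the sketch below uses stand-in types modeled on the description above. They are not the Weka classes themselves, and the names evaluate, DoubleConstant and Sum are assumptions made for the example.

```java
// Self-contained sketch; all types here are stand-ins, not the actual Weka classes.
class PrimitivesSketch {

    /** Empty marker interface: any type implementing it counts as an AST node. */
    interface Node {}

    /** Typed subinterface for nodes that yield a double (method name is assumed). */
    interface DoubleExpression extends Node {
        double evaluate();
    }

    /** A constant leaf node. */
    static final class DoubleConstant implements DoubleExpression {
        private final double value;

        DoubleConstant(double value) {
            this.value = value;
        }

        @Override
        public double evaluate() {
            return value;
        }
    }

    /** An inner node that adds two double subexpressions. */
    static final class Sum implements DoubleExpression {
        private final DoubleExpression left;
        private final DoubleExpression right;

        Sum(DoubleExpression left, DoubleExpression right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public double evaluate() {
            return left.evaluate() + right.evaluate();
        }
    }

    static double example() {
        Node ast = new Sum(new DoubleConstant(1.5), new DoubleConstant(2.0));
        // The marker interface carries no behavior, so the type must be checked.
        if (!(ast instanceof DoubleExpression)) {
            throw new IllegalStateException("not a double expression");
        }
        return ((DoubleExpression) ast).evaluate();
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints 3.5
    }
}
```

Because Node itself declares nothing, code that receives a Node has to check for the typed subinterface before evaluating it.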
Most extensibility is achieved through adding macros to a language. Macros
allow for powerful meta-programming since they directly work with AST nodes.
The Macro interface defines what a macro looks like.
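To make the idea of macros working on AST nodes more concrete, here is a minimal sketch with stand-in types. The shape of the Macro interface (taking nodes and returning a node), the SquareMacro class and the use of IllegalArgumentException are assumptions for illustration, not the framework's actual definitions.

```java
// Self-contained sketch; all types here are stand-ins, not the actual Weka classes.
class MacroSketch {

    interface Node {}

    interface DoubleExpression extends Node {
        double evaluate();
    }

    /** Assumed shape of a macro: it receives AST nodes and returns a new AST node. */
    interface Macro {
        Node evaluate(Node... params);
    }

    /** Hypothetical macro that rewrites square(x) into a node computing x * x. */
    static final class SquareMacro implements Macro {
        @Override
        public Node evaluate(Node... params) {
            // Arity and type checks happen while the AST is built,
            // before any value has been computed.
            if (params.length != 1) {
                throw new IllegalArgumentException("square takes exactly one argument");
            }
            if (!(params[0] instanceof DoubleExpression)) {
                throw new IllegalArgumentException("square requires a double argument");
            }
            final DoubleExpression arg = (DoubleExpression) params[0];
            DoubleExpression result = () -> {
                double v = arg.evaluate();
                return v * v;
            };
            return result;
        }
    }

    static double example() {
        DoubleExpression four = () -> 4.0;
        Node squared = new SquareMacro().evaluate(four);
        return ((DoubleExpression) squared).evaluate();
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints 16.0
    }
}
```

The macro sees the unevaluated argument node, so it can inspect its type or decide whether and when to evaluate it.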
Variable lookup is done through VariableDeclarations and macro lookup
through MacroDeclarations. Several declarations of the same kind can be
combined through VariableDeclarationsCompositor and
MacroDeclarationsCompositor respectively.
This makes it possible to add built-in variables and powerful built-in
functions to a language.
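The lookup and combination described here can be pictured roughly as follows. The sketch uses stand-in types; the method names hasVariable and getVariable, the MapDeclarations class and the first-match resolution in the compositor are assumptions made for the example and may differ from the real classes.

```java
import java.util.HashMap;
import java.util.Map;

// Self-contained sketch; all types here are stand-ins, not the actual Weka classes.
class DeclarationsSketch {

    interface Node {}

    interface DoubleExpression extends Node {
        double evaluate();
    }

    /** Assumed lookup interface: resolves a variable name to an AST node. */
    interface VariableDeclarations {
        boolean hasVariable(String name);
        Node getVariable(String name);
    }

    /** Hypothetical map-backed set of declarations. */
    static final class MapDeclarations implements VariableDeclarations {
        private final Map<String, Node> variables = new HashMap<>();

        MapDeclarations declare(String name, double value) {
            DoubleExpression constant = () -> value;
            variables.put(name, constant);
            return this;
        }

        @Override
        public boolean hasVariable(String name) {
            return variables.containsKey(name);
        }

        @Override
        public Node getVariable(String name) {
            return variables.get(name);
        }
    }

    /** Combines several declarations; the first one that knows the name wins. */
    static final class Compositor implements VariableDeclarations {
        private final VariableDeclarations[] parts;

        Compositor(VariableDeclarations... parts) {
            this.parts = parts;
        }

        @Override
        public boolean hasVariable(String name) {
            for (VariableDeclarations part : parts) {
                if (part.hasVariable(name)) {
                    return true;
                }
            }
            return false;
        }

        @Override
        public Node getVariable(String name) {
            for (VariableDeclarations part : parts) {
                if (part.hasVariable(name)) {
                    return part.getVariable(name);
                }
            }
            throw new IllegalArgumentException("Undeclared variable: " + name);
        }
    }

    static double lookup(String name) {
        VariableDeclarations builtIns = new MapDeclarations().declare("e", Math.E);
        VariableDeclarations user = new MapDeclarations().declare("x", 2.0);
        VariableDeclarations all = new Compositor(builtIns, user);
        return ((DoubleExpression) all.getVariable(name)).evaluate();
    }

    public static void main(String[] args) {
        System.out.println(lookup("x")); // prints 2.0
    }
}
```

A language author could supply built-ins in one declarations object and user-defined symbols in another, then hand the combined object to whatever builds the AST.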
Useful implementations are:
- SimpleVariableDeclarations
- MathFunctions
- IfElseMacro
- JavaMacro
- NoVariables
- NoMacros
- InstancesHelper
- StatsHelper
A parser that builds the AST from the textual representation of a program
is provided in the weka.core.expressionlanguage.parser package.
However, the framework allows for other means of constructing an AST if needed.
Built-in operators (+, -, *, / etc.) are a special case: they can be seen
as macros, but they are also strongly connected to the parser.
To separate the parser from these special macros there is the Operators
class, which the parser can use to delegate operator semantics elsewhere.
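One way to picture this delegation is sketched below with stand-in types. The Ops class, its plus method and the choice to overload '+' for strings are hypothetical and only illustrate how a parser action could hand operator semantics to a separate class.

```java
// Self-contained sketch; all types here are stand-ins, not the actual Weka classes.
class OperatorsSketch {

    interface Node {}

    interface DoubleExpression extends Node {
        double evaluate();
    }

    interface StringExpression extends Node {
        String evaluate();
    }

    /** Hypothetical operator semantics, kept apart from any grammar rules. */
    static final class Ops {
        static Node plus(Node left, Node right) {
            if (left instanceof DoubleExpression && right instanceof DoubleExpression) {
                final DoubleExpression l = (DoubleExpression) left;
                final DoubleExpression r = (DoubleExpression) right;
                DoubleExpression sum = () -> l.evaluate() + r.evaluate();
                return sum;
            }
            if (left instanceof StringExpression && right instanceof StringExpression) {
                final StringExpression l = (StringExpression) left;
                final StringExpression r = (StringExpression) right;
                StringExpression concatenation = () -> l.evaluate() + r.evaluate();
                return concatenation;
            }
            throw new IllegalArgumentException("'+' is not defined for these operand types");
        }
    }

    /** What a parser action for "a + b" might do: only delegate. */
    static double addDoubles(double a, double b) {
        DoubleExpression left = () -> a;
        DoubleExpression right = () -> b;
        return ((DoubleExpression) Ops.plus(left, right)).evaluate();
    }

    static String concat(String a, String b) {
        StringExpression left = () -> a;
        StringExpression right = () -> b;
        return ((StringExpression) Ops.plus(left, right)).evaluate();
    }

    public static void main(String[] args) {
        System.out.println(addDoubles(1.0, 2.0)); // prints 3.0
        System.out.println(concat("a", "b"));     // prints ab
    }
}
```

With this split, the grammar only decides that a '+' occurred; what '+' means for a given pair of operand types lives in one place outside the parser.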
A word on parsers
Currently the parser is generated with the CUP parser generator and the JFlex lexer generator. While parser generators are powerful tools, they suffer from some unfortunate drawbacks:
- The parsers are generated, so there is an additional indirection between the grammar file (used for parser generation) and the generated code.
- The grammar files usually have their own syntax, which may be quite different from the programming language otherwise used in a project.
- In more complex grammars it's easy to introduce ambiguities and unwanted valid syntax.
Summary
A flexible AST structure is given by the Node interface. The Macro
interface allows for powerful meta-programming, which is an important part
of the extensibility features. The Primitives class gives a good basis for
the primitive boolean, double and String types.
The parser is responsible for building up the AST structure. It delegates
operator semantics to Operators.
Symbol lookup is done through the VariableDeclarations and
MacroDeclarations interfaces, which can be combined with the
VariableDeclarationsCompositor and MacroDeclarationsCompositor classes
respectively.
Usage
With the described framework it's possible to create languages in a declarative way. Examples can be found in MathExpression, AddExpression and SubsetByExpression.
A commonly used language is:
    // exposes instance values and 'ismissing' macro
    InstancesHelper instancesHelper = new InstancesHelper(dataset);

    // creates the AST
    Node node = Parser.parse(
        // expression
        expression, // textual representation of the program
        // variables
        instancesHelper,
        // macros
        new MacroDeclarationsCompositor(
            instancesHelper,
            new MathFunctions(),
            new IfElseMacro(),
            new JavaMacro()
        )
    );

    // type checking is necessary, but allows for greater flexibility
    if (!(node instanceof DoubleExpression))
        throw new Exception("Expression must be of double type!");

    DoubleExpression program = (DoubleExpression) node;
History
Previously there were three very similar languages, found in the
weka.core.mathematicalexpression package, the weka.core.AttributeExpression
class and the weka.filters.unsupervised.instance.subsetbyexpression package.
Due to their similarities it was decided to unify them into one expression language.
However, backwards compatibility was an important goal, which is why there
are some quite redundant parts in the language (e.g. both 'and' and '&' are
operators for logical and).
Version: $Revision: 1000 $
Author: Benjamin Weber ( benweber at student dot ethz dot ch )