Rook
March 5th, 2007Introduction
Rook is an interpreted, statically typed functional language with semantics similar to ML & Scheme. The language is in its infancy, so the available data types and built-in operations are limited.
Rook is not yet publicly available. Although everything outlined in this document is already implemented, there are a few additional features I intend to include before it would be worthwhile to play with. This document covers the language in its current state, and includes a listing of upcoming additions at the end.
In the code examples in this document, the symbol ‘=>’ occasionally appears in order to show the value of an expression. ‘=>’ is not a part of the language, nor are the values and comments that appear on the right hand side of that symbol. For example, to say that the expression 1+2 evaluates to the integer 3, this document would show the example as:
1+2 => 3
Syntax
The formal grammar of Rook is expressed with the PEG notation. Nonterminals appear as CamelCased names, and terminals appear as single-quoted strings or as lower-case names.
Keywords appear as terminals in the grammar: true, false, if, else, let, fn, def.
The terminal identifier is any alphanumeric sequence of characters whose first character is alphabetic, excluding keywords, as well as any of the following parenthesized operators: (*) (/) (+) (-) (<=) (<) (>=) (>) (==) (!=) (!). For instance, the token ‘!’ is not an identifier, but the token ‘(!)’ is an identifier.
The terminal number-literal includes any continuous series of digits 0 through 9. The grammar assumes that whitespace is automatically consumed between tokens.
Rook Grammar
Program := (’def’ identifier ‘(’ ParameterList ‘)’ Expression ‘;’)* Expression ‘;’
Expression :=
‘{’ Let ‘}’
/ ‘if’ ‘(’ Expression ‘)’ Expression ‘else’ Expression
/ ‘fn’ ‘(’ ParameterList ‘)’ Expression
/ Or
Let := (’let’ identifier ‘=’ Expression ‘;’)* Expression ‘;’
ParameterList := &’)’ / identifier (’,’ identifier)*
Or := And (’||’ And)*
And := Equality (’&&’ Equality)*
Equality := Relational ((’==’ / ‘!=’) Relational)*
Relational := Additive ((’<’ / ‘<=’ / ‘>’ / ‘>=’) Additive)*
Additive := Multiplicative ((’+’ / ‘-’) Multiplicative)*
Multiplicative := Unary ((’*’ / ‘/’) Unary)*
Unary := (’-’ / ‘!’)? Value
Value := ‘true’ / ‘false’
/ number-literal
/ ‘(’ Expression ‘)’ (’(’ ArgumentList ‘)’)+
/ ‘(’ Expression ‘)’
/ identifier (’(’ ArgumentList ‘)’)+
/ identifier
ArgumentList := &’)’ / Expression (’,’ Expression)*
Semantics
In Rook, everything is an expression. Everything, including compound expressions spanning multiple lines, has a value. Using a type inference algorithm similar to the core of ML, all identifiers have statically-determined data types. However, these statically-determined data types take part in parametric polymorphism. In other words, we can statically determine whether an entire program makes valid use of types, but we can also write functions that work against multiple types. An example of this is presented later in this document (see the identity function). Types include booleans (true, false), numbers (integers on the range -2,147,483,648 through 2,147,483,647), and function types. Function types include information about the type of each argument as well as the return type. A function that accepts two numbers and returns a boolean has the type (Number, Number) => Boolean.
Operators
Several operators are available by default. The && and || operators (’and’ and ‘or’) perform short-circuiting logic: the right hand side is evaluated only when the left hand side’s value demands it. The following table lists all of the operators in order from highest-precedence to lowest:
| ( ) | Grouping and Function Calls |
| ! - | Unary Not and Negation |
| * / | Multiplicative |
| + - | Additive |
| < <= > >= | Relational |
| == != | Equality |
| && | Logical And |
| || | Logical Or |
If Expressions
If expressions consist of the keyword ‘if’, a boolean condition, an expression to evaluate when the condition is true, and an ‘else’ expression to evaluate when the condition is false. The value of the if expression is the value of the executed subexpression.
if (true) 1 else 0 => 1
if (false) 1 else 0 => 0
if (1==0) 1 else 0=> 0
Anonymous Functions
Also know as Lambda Expressions, anonymous function expressions can be declared with the keyword ‘fn’, a possibly-empty list of parameter names, and a body expression. When called, argument values are bound to the parameters in order, and the body expression is evaluated. The value of a function call is the value of the function’s body expression. In order to declare a function and call it in the same expression, the function definition must be surrounded in parentheses.
Function parameters are bound to whatever values are passed in. Parameters shadow any variables in the surrounding scope. Functions are first-class objects in Rook, so they can be returned as values from other functions, passed as arguments to other functions, etc. When the value of a call is also a function, that result can also be called in the same expression.
(fn () 5)() => 5
(fn (x) x*x)(2) => 4
(fn (x) x*x)(true) => Static typing error, * expects two numbers.
(fn (x, y) x==y)(5, 5) => true
(fn (x, y) (fn (x) x*y))(1, 2)(3) => 6
Compound Expressions
Similar to the let* special form in Scheme, compound expressions in Rook allow for a series of local variables to be declared and given values within a new scope, so that another expression can be evaluated in terms of those variables. In other words, braces { } allow you to introduce short-lived variables that can be used throughout the body expression found at the end of the block. Variables declared within this block temporarily shadow any variables in the outer scope that happen to have the same name.
Each variable declaration, called a binding, must include a value and must be terminated by a semicolon. Bindings are introduced with the ‘let’ keyword.
A compound expression can have zero or more bindings followed by one body expression. Body expressions are terminated by a semicolon too. Bindings are evaluated in order, affecting the local scope sequentially. The same variable cannot be bound twice inside the same compound expression. Once it has a value, it retains that value for the remainder of the block.
The value of a compound statement is the value of the terminating body expression. Compound expressions can be found anywhere that any expression can be found, so they can be nested, composed with if-expressions, form the bodies of functions, etc.
{
let x = 1;
{
let x = 2;
let y = x;
let square = fn(x) x*x;
square(y);
};
}
=> 4
Although the ‘fn’ syntax for declaring small functions inline can be convenient, it becomes a little awkward when defining a program with many functions. A Rook program is basically one large compound statement, but with a few key syntactic differences. A Rook program is a series of function definitions, introduced with the ‘def’ keyword, followed by an expression. The def keyword provides a shorthand for “let x = fn(…) …;” and is only allowed at the outermost scope. Each of these functions may or may not contain regular compound expressions (using the let syntax), and the program’s body expression may be a compound statement itself. Another difference between compound statements with ‘let’ and programs with ‘def’ is that identifiers defined with ‘def’ are allowed to be mutually recursive, while identifiers introduced with ‘let’ cannot be recursive.
When a program is executed inside the Rook editor (a simple IDE of sorts), additional expressions can be evaluated at the interactive REPL, and these expressions are considered to be a part of the same scope as the program’s function definitions. In other words, within the Rook editor, a program might contain only function definitions. After ‘running’ that program, the user could then evaluate calls to those functions on the REPL interface. If programs were exactly like compound statements, these function definitions would no longer exist by the time the user tried to call them from the REPL.
Recursion
The type inference algorithm is able to recognize the type of recursive and mutually recursive functions. Consider the following examples of both simple recursion and mutual recursion:
def factorial(n)
if (n==0)
1
else
n * factorial(n-1);
def even(n)
if (n==0)
true
else
odd(n-1);
def odd(n)
if (n==0)
false
else
even(n-1);
Rook correctly identifies the type of factorial to be (Number) => Number, and identifies both even and odd to be of type (Number) => Boolean. Note that we gain most of the benefits of static typing with the brevity of dynamic typing.
Closure
As mentioned above, the value of a function call is the value of that function’s body. In fact, the value of a function call is the value of that function’s body, using the environment that was in effect at the moment the function definition was encountered. In other words, the value of a lambda expression itself is actually a closure, or a pairing of the code for the function along with the environment in effect at the time the definition is evaluated. Then, when the definition/environment pair is called, the function body is evaluated with the more relevant scope, rather than the scope in which the call is made. For example, the value of the following program is 10, not 1007.
def main()
{
let x = 3;
let addX = fn (y) x+y;
{
let x = 1000;
addX(7);
};
};
main();
=> 10
Operator/Function Duality
Several of the built-in operators can be optionally treated as functions. To treat an operator symbol as a function identifier, simply surround it with parentheses. This notation is similar to Haskell. The operators which take part in this feature are: *, /, +, -, <=, <, >=, >, ==, !=, and !. Note that && and || cannot be treated as functions, because of their short-circuit behavior.
def apply(func, x, y)
func(x, y);
def first(x, y)
x;
def second(x, y)
y;
def identity(x)
x;
apply(first, 1, 2) => 1
apply(second, 1, 2) => 2
apply(+, 1, 2) => Error: "Invalid token near '+'; expected identifier."
apply((+), 1, 2) => 3
identity(1) => 1
identity(false) => false
identity((+)) => A Rook.Closure with type (Number, Number) => Number.
apply(first, identity(1), identity(2)) => 1
apply(second, identity(1), identity(true)) => true
apply((+), identity(1), identity(true)) => Error: Type mismatch between Number and Boolean.
When a function is written to work with multiple types, that function’s type uses lower-case type names as a standin for whatever concrete types will actually be sent to it. The types of the four functions declared above are:
apply: ((a, b) => c, a, b) => c
first: (a, b) => a
second: (a, b) => b
identity: (a) => a
Future Enhancements
Before Rook is made available to the public, I hope to implement the following enhancements:
- Function overloading. This may be based solely on the number of arguments, or may take into account optional parameter type declarations.
- Native string data types, with 2-byte characters as in .NET and Java.
- User-defined types, using a purely-functional variation on .NET/Java classes.
- Generators similar to Python and C#’s
yieldkeywords.