Formalities
March 19th, 2007I’ve added a new article on the Formalities page. It’s a rewrite of some things I wrote shortly after college. It’s a tad long, don’t say I didn’t warn you.
I’ve added a new article on the Formalities page. It’s a rewrite of some things I wrote shortly after college. It’s a tad long, don’t say I didn’t warn you.
When I started writing Rook in January, I had a general idea of how things would work. I’d start with a grammar, then I’d write a recursive descent parser to turn the input source code into an abstract syntax tree. These steps aren’t trivial, but they’re also not as tough as they sound. It’s easy to find examples of these techniques.
A few weeks back I was armed with this totally awesome syntax tree, and didn’t quite know what to do with it next. I knew it was the right tool for the job, but I wasn’t 100% sure what that job was. The trouble is, this is the point where most of my sources just plain had nothing to say about the next step. What I wanted to be able to do was take a reference to the root expression in my syntax tree and ask it, “What is your value, after you evaluate your own subexpressions?”
This is a tricky question to answer, since some expressions introduce new scope, and some variables in one scope may “shadow” variables in the parent scope. A special data structure called a Spaghetti Stack helps us keep track of all these variations in scope. I’ll get to the specifics of spaghetti stacks in a moment, but the bottom line is that this data structure lets you keep track of which variables are in scope at any particular point in the input source code. My particular spaghetti stack also stores the runtime value associated with each identifier in each scope.
Note to self: Data structures named after food are more likely to be remembered. This fact has been demonstrated time and time again: see Spaghetti Stacks & Hash Tables. Commence work on Hamburger Queue immediately.
A spaghetti stack is really just a plain old tree structure, but there are two special details worth noting. First, all of the references point “up” toward the root rather than the other way around. Second, all of these references are immutable; once a node is created it will always point at the same parent, until it gets garbage collected. The nifty result of these restrictions is that from the point of view of any one node, that node is the top of a singly-linked stack. For instance, the colored nodes in the example do form a stack, with the “top” of the stack located at the bottom of the tree. Nodes have no sense of gravity, so it doesn’t matter that it’s upside down: it’s a stack. To sum up, we have a collection of stacks that are sharing nodes whenever they can, and every node thinks that it is at the top of the stack.
In this diagram, we see the state of a spaghetti stack after evaluation of a program has started. The root node represents the global namespace, and contains things like primitive operator definitions (Rook treats operators as functions, so the operator symbols are just identifiers). Node 1 represents a new scope that introduces three local variables, such as a function with 3 parameters. Node 3 represents a scope that shadows the “x” in the parent scope, but can see the parent scope’s “y” unhindered. Some expressions that can provide new items in the new scope don’t always need to, such as a function with zero parameters, and this case results in uninteresting nodes like node 6. All in all, this is just a tree representation of namespaces, and we probably already build these in our heads while we read code, so it isn’t really anything new.
In Rook, each node in the spaghetti stack is an instance of a class named Environment, which contains a name/object dictionary and a reference to the parent Environment. When lookup attempts fail, we defer to the parent before deciding whether the identifier exists:
public object this[string identifier]
{
get
{
if (_locals.ContainsKey(identifier))
return _locals[identifier];
if (_parent != null)
return _parent[identifier];
throw new UndefinedIdentifierException(identifier);
}
set
{
if (_locals.ContainsKey(identifier))
throw new DuplicateIdentifierException(identifier);
_locals[identifier] = value;
}
}
Each node in the accompanying abstract syntax tree is an Expression whose value depends on an Environment instance:
public interface Expression
{
object Value(Environment environment);
}
To run a Rook program, we just pass the root environment to the Value method of the root expression. We let the Value method “push” a new Environment whenever the expression node introduces new scope. Simple expression types like integer and boolean literals just ignore the Environment, since they can determine their own values without it:
//IntegerLiteral Expressions
//Example: 1234
public override object Value(Environment environment)
{
return _value;
}
For an Identifier expression, the value of the expression is the value found by looking up the identifier in the Environment’s dictionary:
//Identifier Expressions:
//Example: x
public override object Value(Environment environment)
{
return environment[_identifier];
}
For If expressions, we also don’t need to introduce new scope. Rather, we just pass the given Environment along to each subexpression:
//If Expressions:
//Example: if (x>1) y else z
public override object Value(Environment environment)
{
if ((bool)_condition.Value(environment))
return _whenTrue.Value(environment);
if (_whenFalse == null)
return null;
return _whenFalse.Value(environment);
}
Compound expressions introduce new scope, so they “push” a new Environment onto the stack, add the identifiers for each new variable binding, and then use that new Environment to evaluate subexpressions:
//Compound Expressions:
//Example: { let a = 5; def square(x) x*x; square(a); }
public override object Value(Environment environment)
{
Environment withBindings = new Environment(environment);
foreach (Binding binding in _bindings)
withBindings[binding.Identifier] = binding.Value(withBindings);
object result = null;
foreach (Expression expression in _expressions)
result = expression.Value(withBindings);
return result;
}
We never explicitly pop from one of these stacks. Each implementation of Value may or may not push a new Environment onto a stack, and will use that Environment, but the caller never gains a direct reference to it. We let the garbage collector do our “pops” for us. In fact, there are cases where we will deliberately preserve some references to these otherwise-temporary Environments, preventing them from being garbage collected. This makes implementing closures painfully simple, but that is a discussion for another day:
//Lambda Expressions:
//Example: fn(x, y) x*y
public override object Value(Environment environment)
{
return new Closure(this, environment);
}
For the past few weeks, I’ve been documenting my progress on the development of a new programming language. It is mostly an experiment to see whether or not I could do it, rather than an attempt to actually gain widespread use. Also, it’s giving me an opportunity to learn more about how functional languages work. In fact, I’m not even releasing the source code yet. I think it’ll be worth releasing after a few more key features are implemented. Still, I’ve made a lot of progress in the past week, and it’s worth presenting the first draft of the language’s documentation as well as some examples.
It’s called Rook. If Guido van Rossum can name a language after his favorite comedy troupe, I can name mine after my favorite member of the passerine order of birds.
In this post I’ll just dive right in with some examples. If something seems fishy, I’d recommend skimming the documentation linked to above. The bottom line is that it behaves a lot like a subset of Scheme, but its syntax is supposed to be more familiar to people who use C-derived languages. I’ve even got a little editor with an interactive REPL based loosely on DrScheme.
In the example screenshot, I start by defining a recursive function (factorial) as well as two mutually-recursive functions (even and odd). The ‘def’ expressions define functions. Rook is dynamically typed, so if we tried to pass a boolean value to the factorial function, it wouldn’t let you know something was wrong until it actually tried to evaluate the function body. Also, the identifiers inside method bodies aren’t checked until they are actually used, so we can define even(n) before odd(n) actually exists, and all is well. By being lazy about these checks, recursion works with no need for special-case code in the interpreter.
Next, the screenshot shows a function named apply, which takes in a function and two additional arguments, and calls that function with those arguments. first(a,b) and second(a,b) return the respective argument, ignoring the other one. In the REPL, you can see that apply(first, 1, 2) evaluates to 1 and apply(second, 1, 2) evaluates to 2. Also, note that we can treat operators as identifiers, passing in the plus operator so that apply((+), 1, 2) evaluates to 3.
The more exhaustive documentation linked to above goes into more about scope, nested scope, local variables, closures, and the like, but more examples won’t be very interesting until I have support for some collection data types like linked lists.
Powered by WordPress