Grammar 3: Function Definitons

February 7th, 2007

This is part of a series of articles that started with Test-Driving a Compiler.

Grammar 3 introduces functions, identifiers, and programs composed of multiple function definitions. Here’s the grammar, with changes from Grammar 2 in bold:

Program := Function*
Function := identifier '(' ParameterList ')' Expression
ParameterList := &')'
                  / Parameter (',' Parameter)*
Parameter := identifier identifier?
Expression := IfExpression
IfExpression := 'if' Expression Expression 'else' Expression
              / 'if' Expression Expression
              / OrExpression
OrExpression := AndExpression ('||' AndExpression)*
AndExpression := EqualityExpression ('&&' EqualityExpression)*
EqualityExpression := RelationalExpression (('==' / '!=') RelationalExpression)*
RelationalExpression := AdditiveExpression (('<' / '<=' / '>' / '>=') AdditiveExpression)*
AdditiveExpression := MultiplicativeExpression (('+' / '-') MultiplicativeExpression)*
MultiplicativeExpression := Value (('*' / '/') Value)*
Value := boolean-literal / integer-literal / identifier / '(' Expression ')'

In English (more or less), this says that a program is zero or more functions. A function is an identifier, a possibly-empty parameter list, and a body composed of a single expression. Items in the parameter list are each specified by one or two identifiers (a name, or a name with an explicit type identifier).

My intention is to use type inference with optional explicit type declarations, so that’s why parameters don’t require the explicit type identifier. Also, since the entire function body is just one expression, I don’t need curly braces { } to mark the start and end, and I don’t need an explicit ‘return’ keyword. Multi-statement functions with curly braces and return statements will likely be included later, but I like having a shorthand for one-liner functions.

In the ParameterList grammar rule, I start by testing &')'. This is PEG notation for peeking ahead. &')' will match when the next token is a ‘)’, and will fail otherwise. Upon failure, we try the next part of the rule. Whether the test for ‘)’ passes or fails, none of the input is actually consumed by the test. So, when a parameter list is empty, this lets us avoid looking for any parameters while protecting the fact that the Function rule will expect to consume ‘)’. As I’ve stated before, I have one parsing method per grammar rule. In the implementation of ParameterList, I perform the &')' part by calling the Peek method, which returns true if the next token matches the given string, without consuming any of the input:

// ParameterList := &')' / Parameter (',' Parameter)*
private ParameterListNode ParameterList()
{
	List<ParameterNode> parameters = new List<ParameterNode>();

	if (_tokenizer.Peek(")"))
		return new ParameterListNode(parameters);

	parameters.Add(Parameter());
	while (_tokenizer.TryConsume(","))
		parameters.Add(Parameter());

	return new ParameterListNode(parameters);
}

Filed under: Language Design

Powered by WordPress