Grammar 4: Statements & Function Calls
February 14th, 2007This is part of a series of articles that started with Test-Driving a Compiler.
Grammar 3 introduced function definitions. Since then, I’ve introduced some unary operators, statements, and function call expressions. First, I added the unary-minus and unary-not operators with a higher priority than multiplication, using tests to confirm the basic parsing routines as well as the relative precedence compared to multiplication/division and parenthetical expressions:
MultiplicativeExpression := UnaryExpression (('*' / '/') UnaryExpression)*
UnaryExpression := ('-' / '!')? Value
Next, I introduced function calls. I expect to have higher-order functions, so function calls can be made against an identifier, as well as parenthetical expressions:
FunctionName(arg1, arg2);
(SomeExpressionThatIsCallable)(arg1, arg2);
To deal with the complexity of the grammar for these cases, I took advantage of the fact that PEG grammars have ordered sub-rules that allow backtracking, as I already discussed for Grammar 2:
Value := boolean-literal
/ integer-literal
/ '(' Expression ')' '(' ArgumentList ')'
/ '(' Expression ')'
/ identifier '(' ArgumentList ')'
/ identifier
Function definitions are already set up for a special shorthand: whenever the entire body is an expression whose value should be returned, curly-braces and an explicit ‘return’ can be omitted. Now, I added the more common scenario for multi-statement function bodies.
Function := identifier '(' ParameterList ')' &'{' CompoundStatement
/ identifier '(' ParameterList ')' Expression
Lastly, I introduced the usual statements available in imperative languages. Here’s the complete grammar to date:
Program := Function*
Function := identifier '(' ParameterList ')' &'{' CompoundStatement
/ identifier '(' ParameterList ')' Expression
ParameterList := &')' / Parameter (',' Parameter)*
Parameter := identifier identifier?
Statement := &'{' CompoundStatement
/ &'if' IfStatement
/ &'for' ForStatement
/ &'while' WhileStatement
/ &'return' ReturnStatement
/ &'yield' YieldStatement
/ identifier '=' Expression ';'
/ Expression ';'
CompoundStatement := '{' Statement* '}'
IfStatement := 'if' '(' Expression ')' Statement ('else' Statement)?
ForStatement := 'for' '(' identifier 'in' Expression ')' Statement
WhileStatement := 'while' '(' Expression ')' Statement
ReturnStatement := 'return' (Expression)? ';'
YieldStatement := 'yield' Expression ';'
Expression := IfExpression
IfExpression := 'if' '(' Expression ')' Expression 'else' Expression
/ OrExpression
OrExpression := AndExpression ('||' AndExpression)*
AndExpression := EqualityExpression ('&&' EqualityExpression)*
EqualityExpression := RelationalExpression (('==' / '!=') RelationalExpression)*
RelationalExpression := AdditiveExpression (('<' / '<=' / '>' / '>=') AdditiveExpression)*
AdditiveExpression := MultiplicativeExpression (('+' / '-') MultiplicativeExpression)*
MultiplicativeExpression := UnaryExpression (('*' / '/') UnaryExpression)*
UnaryExpression := ('-' / '!')? Value
Value := boolean-literal
/ integer-literal
/ '(' Expression ')' '(' ArgumentList ')'
/ '(' Expression ')'
/ identifier '(' ArgumentList ')'
/ identifier
ArgumentList := &')' / Expression (',' Expression)*
I’ve noticed a pattern while developing each of these changes to the grammar. Usually, when I decide on a single rule change or rule addition, that change suggests a change to the tokenizer, such as recognizing a new keyword or symbol. The new rule often suggests a new AST node type as well. So, I have been tending toward the following cycle:
- Write a test for the new tokens to be recognized.
- Update the tokenizer to pass the test (this has become a smaller and smaller task and is now trivial).
- Write a stub implemention of the new AST node type, as well as a test for the new class, asserting that ToString yields an expected S-Expression. The stub ToString returns null so that the test initially fails.
- Flesh out the AST node type to pass the test.
- Write a test for the Recursive Descent Parser class, for the new grammar rule.
- Change the RDP to match the grammar and pass the test.
For nontrivial rules, or rules that need to be composed with other rules, I’ll repeat the last two steps until I’m satisfied that all the relevant variations have been tested.