Test-Driving a Compiler
January 25th, 2007For some time now, I’ve been interested in making a programming language. Whenever I’ve attempted this in the past, I instantly ran into resistance because I was too concerned with making a Big Design Up Front. I would try to think about the entire language all at once, and put that into a grammar. Of course, that’s way too much to fit in your head all at once, so I would start with a grammar for an existing language, like Python or C#, and whittle away at it until I liked what I was looking at. Half-way through whittling, the house of cards that were my thoughts would come tumbling down.
At work, I’ve been Unit Testing a great deal for about a year now, but I have only been able to seriously write test-first code a fraction of the time. I’d like to attempt to create a programming language while strictly adhering to the TDD mantra “Red, Green, Refactor”, and I’ll be posting my progress here. This is a learning opportunity for me, so I’ll be posting my missteps as well as my successes.
About two years ago, I attended a conference where Robert C. Martin lead a session. Although the session wasn’t supposed to be about TDD, it quickly took that turn. I’m glad that it did, because I hadn’t seen TDD in action like that before. During this session, he started out by drawing a UML diagram to collect his thoughts and get the big picture down. But, he made a point of saying that now that we had a design document, we should hide it in a drawer. Although the diagram seemed appropriate and full of OOG (Object-Oriented Goodness; look it up), the end product that came out of the TDD style was nothing like that original diagram. Had we stuck with the diagram, we would have made something overly complicated and just plane wrong.Following that example, I’ll give a very high-level design for the language, but will then shove it in a drawer lest it get in the way. In future posts, I’ll actually show the code as it develops. Early on, I’ll show every minute step that I take, but after I get on a roll I’ll begin to give less tedious summaries of my progress.
A Back-of-a-Napkin Design
People have been writing compilers for quite some time now, and it only makes sense to base the design on those successes.
So, I know that I’ll start with a grammar using one of the standard notations, I’ll build a parser from that grammar which
turns source code into an Abstract Syntax Tree (AST), and then I’ll use the tree to validate the program and generate code for the target platform.
For the parsing, I’ll be using the Recursive Descent technique, which is less efficient than bottom-up parsing, but is much easier to fit in my head. It turns out that there is a special grammar notation that is perfect for recursive descent, called a Parsing Expression Grammar. These references on RD and PEGs are my primary background materials as I start this project, and I’ll consider them a prerequisite to anyone who wants to follow my progress.
The target platform may be MSIL, Java byte code, Parrot, or something else; maybe even another high-level language. If I target a high level language, then I’ll benefit from all the brilliant people who already wrote slick, optimized compilers. Imagine that VB.NET were compiled by first translating to C#, and then sending that translation to the C# compiler.
It is quite likely that at some point I’ll want to replace the top-down, recursive descent parser with a bottom-up technique, since those are more sophisticated. I’ll keep that in mind as a possible change to be made in the future, but I won’t be over-engineering things now just to make that potential-change easier.
I suspect that I’ll eventually need to use the Visitor Design Pattern for inspecting the AST. Most discussions of the Visitor pattern use ASTs as the prime example. I’ll keep this pattern in mind once I start needing to do varying actions against the tree, but I won’t actually include this pattern in my code until I really feel the need for it. When things start to hurt in a way that Visitor will help with, I’ll start to use it.
Now that I know where I’m going, I’ll put this 100,000 foot view off to the side, and later I’ll see how close it is to what the TDD process produces.