Turn it up to 11
February 26th, 2007
I started running into a little trouble with my TestFixture classes. The glaring example was with the TokenizerTest fixture, which tested every nook and cranny of the Tokenizer class. The tokenizer is responsible for advancing through the input source code, chopping it up into meaningful tokens, such as operators, keywords, identifiers, and the like. One of the features of the tokenizer is the ability to ‘peek ahead’ to the next token without consuming it. Having a separate tokenizing step before the actual parsing starts makes things easier; I get to write the grammar and parser at a higher level than I would otherwise. It’s a separation of concerns.
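To make the peek-ahead idea concrete, here is a minimal Python sketch of a tokenizer that exposes peek, consume, and try_consume. The class, its method names, and the token pattern are all hypothetical: the post doesn't show the real Tokenizer's API, and its peek appears to test for a token kind and return a boolean rather than return the token. This version simply tokenizes eagerly and keeps an index into the resulting list, so peeking never moves the cursor.

```python
import re
from typing import List, Optional

# Hypothetical token grammar: integers, identifiers, and a few operators.
# Alternation is tried left to right, so two-character operators win over
# their one-character prefixes (greedy operator matching).
TOKEN_RE = re.compile(r"\d+|[A-Za-z_]\w*|==|!=|<=|>=|&&|\|\||[-+*/%()<>!]")


class Tokenizer:
    """Sketch: split the source into tokens up front, then hand them out."""

    def __init__(self, source: str) -> None:
        self._tokens: List[str] = []
        pos = 0
        while pos < len(source):
            if source[pos].isspace():  # skip whitespace between tokens
                pos += 1
                continue
            match = TOKEN_RE.match(source, pos)
            if match is None:  # garbage input raises immediately
                raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
            self._tokens.append(match.group())
            pos = match.end()
        self._index = 0

    def peek(self) -> Optional[str]:
        """Return the next token without consuming it (None at end of input)."""
        if self._index < len(self._tokens):
            return self._tokens[self._index]
        return None

    def consume(self) -> Optional[str]:
        """Return the next token and advance past it (None at end of input)."""
        token = self.peek()
        if token is not None:
            self._index += 1
        return token

    def try_consume(self, expected: str) -> bool:
        """Advance only if the next token equals `expected`; report success."""
        if self.peek() == expected:
            self._index += 1
            return True
        return False
```

Because peek() only reads the token at the current index, calling it repeatedly returns the same token; the cursor moves only in consume() or in a successful try_consume().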
Well, even with these concerns separated, the tokenizer has a great many state changes and edge cases. At any moment, there are many different kinds of tokens that might come next, and whitespace needs to be skipped, and there are a few unexpected situations that raise exceptions, and…
As a result of some naive test case writing, the TokenizerTest fixture started to feel its age. In order to handle the many different states, I had to “set up” (an unfortunate choice of words that should have been a clue early on) some state inside each test method before making an assertion. This seemed reasonable at the time, since each test was still isolated from all the others, and the thing being tested just plain needed a special initial state.
Around the time this got unwieldy, I read “Specs vs. Tests”, which pointed me toward RSpec, one of Ruby’s answers to xUnit. That article got me interested in the idea of using unit tests as a form of automatic documentation, and also reminded me that each TestFixture should really deal with one set of state to be tested (within reason). If different features require notably different setup state, then they belong in different TestFixtures, each with its own SetUp method. Having lots of test methods, each with its own internal setup code, is usually a code smell.
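Here is a rough Python unittest sketch of the one-context-per-fixture layout, where setUp holds the only setup code in each class and every test method just exercises and asserts. The TestCase classes play the role of NUnit TestFixtures, and SplitTokenizer is a hypothetical whitespace-splitting stand-in, not the post's Tokenizer.

```python
import unittest


class SplitTokenizer:
    """Hypothetical stand-in: tokens are just whitespace-separated words."""

    def __init__(self, source):
        self._tokens = source.split()

    def peek(self):
        return self._tokens[0] if self._tokens else None

    def consume(self):
        return self._tokens.pop(0) if self._tokens else None


class AnEmptyTokenizer(unittest.TestCase):
    """An empty tokenizer"""

    def setUp(self):
        # The single shared state for every spec in this context.
        self.tokenizer = SplitTokenizer("")

    def test_returns_none_upon_consume_attempts(self):
        self.assertIsNone(self.tokenizer.consume())

    def test_returns_none_upon_peek_attempts(self):
        self.assertIsNone(self.tokenizer.peek())


class ATokenizerWithWhiteSpace(unittest.TestCase):
    """A tokenizer with white space"""

    def setUp(self):
        # A different context gets its own fixture class and its own setUp.
        self.tokenizer = SplitTokenizer("  let   x  ")

    def test_separates_tokens_by_white_space(self):
        consumed = [self.tokenizer.consume(), self.tokenizer.consume()]
        self.assertEqual(consumed, ["let", "x"])

    def test_does_not_advance_on_peek(self):
        self.tokenizer.peek()
        self.assertEqual(self.tokenizer.consume(), "let")
```

Run it with `python -m unittest`. Adding a new state means adding a new class rather than another block of setup code inside an existing test method.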
None of this discussion is really news to those who have read much about unit tests; the gap between what I had been reading and what I had actually been doing came down to how disciplined I was being. When you deliberately write your tests with the intention of generating documentation of behavior in various contexts, you just plain write better tests.
I applied this mostly superficial change to my tokenizer tests, and now they read a whole lot better than before. It also didn’t take much to make a tool that transforms a TestFixture into documentation similar to what RSpec generates. I applied this technique (or rather, turned the existing technique up to 11) to several fixtures, and noticed that it made a much bigger difference for classes that deal with many states, or at least have more complex behaviors, than for classes that don’t really do much yet. I also found it much easier to stick close to one assertion per test method, although I deliberately don’t follow that rule 100% of the time.
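The post doesn't show the documentation tool itself, so here is a minimal sketch of the idea in Python, under stated assumptions: each fixture is a unittest.TestCase whose class docstring stands in for NUnit's Description property, and test method names are CamelCase after the `test` prefix (snake_case is accepted too). The helpers `humanize` and `describe` are hypothetical names.

```python
import re
import unittest


def humanize(method_name: str) -> str:
    """Turn 'testCamelCaseMethodName' into 'camel case method name'."""
    # Drop the 'test' prefix that unittest uses to discover test methods.
    name = method_name[len("test"):] if method_name.startswith("test") else method_name
    # Insert a space at each lowercase-or-digit to uppercase boundary.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Also accept snake_case names, then normalize spacing and case.
    return " ".join(name.replace("_", " ").split()).lower()


def describe(*fixtures: type) -> str:
    """Render each fixture as a context line followed by its spec lines."""
    lines = []
    for fixture in fixtures:
        # Context line: the class docstring, falling back to the class name.
        lines.append((fixture.__doc__ or fixture.__name__).strip())
        # vars() preserves definition order, so specs appear as written.
        for name, member in vars(fixture).items():
            if name.startswith("test") and callable(member):
                lines.append("- " + humanize(name))
    return "\n".join(lines)


class AnEmptyTokenizer(unittest.TestCase):
    """An empty tokenizer"""

    # Placeholder bodies; only the names matter for the generated docs.
    def testReturnsNullUponConsumeAttempts(self):
        pass

    def testCannotConsumeIdentifiers(self):
        pass


if __name__ == "__main__":
    print(describe(AnEmptyTokenizer))
```

Running the module prints the “An empty tokenizer” context followed by its two spec lines. Reading the docstring and method names by reflection means the documentation can never drift from the tests that define it.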
For those who love word games, the distinction being made is between Test Driven Development and Behavior Driven Development, but all this really means is that from the very beginning of the xUnit libraries, it was misleading to even call them “tests”. Now that “tests” has been beaten into our heads, it’s kind of hard to avoid using the word, so making the new buzzword “BDD” isn’t entirely lame or pointless. It forces your mind to take notice long enough to hear the real lesson: it has always been about designing meaningful interfaces, rather than just verifying that an interface produces expected results. I still call them tests, and that doesn’t matter. What matters is that when I write them I can use a more appropriate point of view.
There’s more to RSpec than writing smaller fixtures with smaller tests and more appropriate setup methods. RSpec assertions are written a little differently, too, making them read more like English. For the .NET world, we’ll need to wait for C# 3 and its extension methods before we can do such nifty things as obj.ShouldBeNull().
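C# 3 extension methods make obj.ShouldBeNull() possible by letting a static method be called as if it were an instance method of obj. Python has no extension methods and won't let you add attributes to built-in types such as object, so this rough sketch wraps the value instead. The `should` helper and its method names are hypothetical, loosely imitating the English-like style rather than any real library's API.

```python
from typing import Any


class Should:
    """Hypothetical fluent wrapper that makes assertions read like English."""

    def __init__(self, actual: Any) -> None:
        self.actual = actual

    def _check(self, ok: bool, message: str) -> "Should":
        # Raise explicitly so the check survives `python -O`, which strips assert.
        if not ok:
            raise AssertionError(message)
        return self  # returning self allows chained expectations

    def be_none(self) -> "Should":
        return self._check(self.actual is None, f"expected None, got {self.actual!r}")

    def equal(self, expected: Any) -> "Should":
        return self._check(self.actual == expected, f"expected {expected!r}, got {self.actual!r}")

    def be_true(self) -> "Should":
        return self._check(self.actual is True, f"expected True, got {self.actual!r}")


def should(actual: Any) -> Should:
    """Entry point: should(value).be_none() reads like obj.ShouldBeNull()."""
    return Should(actual)
```

For example, `should(tokenizer.consume()).be_none()` reads close to the English spec line it verifies; a failed expectation raises AssertionError, which unittest reports as an ordinary test failure.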
Here’s the documentation generated for the Tokenizer and Parser classes. Each context line came from the Description property on a TestFixture attribute, and each specification line (the indented rules) came from turning “CamelCaseMethodName” into “camel case method name”.
An empty tokenizer
- returns null upon consume attempts
- returns false upon all peek attempts
- returns false upon try consume
- cannot consume expected tokens
- cannot consume identifiers
A tokenizer with garbage
- cannot consume
A single-integer tokenizer
- consumes entire integer
- returns true upon peek integer
- returns false upon peek identifier
- cannot consume identifiers
A tokenizer with white space
- separates tokens by white space
- can consume expected token
- does nothing when try consume fails
- advances when try consume succeeds
- can peek with expected tokens
A tokenizer with operators
- consumes operators greedily
A tokenizer with identifiers
- distinguishes identifiers from other tokens
A tokenizer with parenthesized operators
- treats parenthesized operators as identifiers
A tokenizer with keywords
- can peek with expected tokens
- distinguishes keywords from identifiers
- cannot consume keyword as identifier
An expression parser
- parses boolean literals
- parses integer literals
- parses parenthetical expression
- demands balanced parentheses
- parses function calls
- parses unary negation
- parses unary not
- ranks parentheses before unary operators
- ranks function calls before unary operators
- parses multiplicative operations
- treats binary operations as left associative
- ranks unary before multiplicative operators
- parses additive operations
- ranks multiplicative before additive operators
- parses relational operations
- ranks additive before relational operators
- parses equality operations
- ranks relational before equality operators
- parses logical and operations
- ranks equality before logical and operator
- parses logical or operations
- ranks logical and before logical or operator
- parses if expression
- associates else with nearest preceding if
- requires parentheses for if expression conditions
- requires else part for if expressions
- requires expression bodies inside if expressions
A statement parser
- parses let statements only in compound statements
- parses let statements only at start of compound statements
- parses let statements in compound statements
- parses return statements
- parses yield statements
- parses if statements
- associates else with nearest preceding if
- requires parentheses for if statement conditions
- parses for statements
- requires parentheses on for statement declaration
- parses while statements
- requires parentheses on while statement conditions
- parses expression statements
- cannot parse if expressions as expression statements
A function parser
- allows expression as body with implied return
- cannot redefine keywords
- allows compound statement bodies
- parses untyped arguments
- parses typed arguments
- parses multiple arguments
- parses operator overloads
A program parser
- parses zero or more functions