Formal Systems
March 19th, 2007Being in the Wrong Mindset
During my junior year at college, I took a course called “Principles of Theoretical Computer Science”. This course seemed very different from all my other CS and math courses, though it seemed to have elements of both. It didn’t quite feel like math because the things we were manipulating didn’t look like numbers, and it wasn’t even really about computers, unless you were squinting really hard when you looked at the whiteboard. Still, it had a math-vibe and a CS-vibe going on anyway. We approached a discipline that I never knew existed, from a few bare first principles, and it was intriguing to see so much arise from the idea of pushing symbols around.
The course was difficult, but not in the usual sense. It seemed so different from everything I had learned before. In retrospect, it shouldn’t have been difficult, and in fact it should have been trivial to keep up with, but my earlier education had not prepared me enough to realize that the course material was just one more example of a process that had been at the root of all of my math classes.
You say Girdle, I say Gödel
The following year, I was a peer tutor for a First Year Seminar course on modern mathematics. One of the course texts was Gödel, Escher, Bach: An Eternal Golden Braid. Hands down, it is the best non-fiction I’ve read, and will likely hold that title for some time to come. Unfortunately, the title makes it hard to slip into conversation. It’s not exactly good getting-to-know-you material:
Sally: So, what’s your favorite book?
Bob: Goedlesherbok.
Uncomfortable Silence
Sally: Um, what’s your second favorite book?
Bob: Cryptonomicon.
Sally: I think we should see other people.
GEB starts by introducing the notion of a formal system. I’ll get into the details later on in this document, but the general idea is that you start with a few facts that you will accept as true without proof such as, “0 is a number,” and a set of rules that let you turn facts into other facts, and then you try to prove new things by only using the facts and rules you started with.
GEB uses this discussion to explain Gödel’s incompleteness theorem. Gödel proved that number theory, which can be expressed as a formal system, is flawed as a result of its own power, and there is no way to render it flawless. It turns out that there are statements in number theory that are true but cannot be proven using number theory.
This flaw is problematic, because there is value in being able to prove something while working within the confines of a formal system such as number theory. We start with as few assumptions as possible, assumptions that seem like they just plain have to be true, and we still run into trouble. Also, the concept of an unreachable truth is not easy to grasp, and much of GEB is devoted to building up the required knowledge to appreciate Gödel’s work. In fact, it wasn’t easy for other mathematicians to grasp at the time either. Prior to Gödel, a mathematician could feel that no matter how hard a problem was, it was just a matter of time and effort before a solution could be found. Now, we know that isn’t the case. You might work on a proof your whole life with no chance of actually figuring it out.
GEB isn’t nearly as dry as I’m making it sound. It’s actually hard to put down, and contains both a talking turtle and an infinite collection of nested Genies. I can honestly promise you that it’s the most whimsical treatise of the incompleteness theorem you’ll ever read, with the possible exception of Dr. Seuss’ “Horton Hears a Fundamental Flaw in Number Theory”.
I found GEB’s discussion to be very reminiscent of much of what we learned in Theoretical CS. In fact, Turing’s work on the halting problem is one example of the “undecidable” statements Gödel was working on [source].
In thinking about connections between my Theoretical CS class and the First Year Seminar, GEB helped me to see more clearly why Theoretical was difficult. The problem was that every one of us is taught the wrong way to think about math from an early age. I had strong grades in both math and computer science, but I got into the habit of treating each new mathematical subject as a completely new discipline. After all, what could algebra and geometry possibly have in common? I didn’t see one rhombus in algebra, and my teachers didn’t use the word “proof” regularly outside of geometry. It turns out that the way we learned geometry was the closest we ever got to formal systems in my public education, even though they form the foundation for all math. Your tax dollars at work.
State of the Mathematical Union
We first learn to count by seeing examples of counting. Then we’re exposed to addition by referring to collections of physical objects that we can count together. Next, we’re shown how to write down a sum using symbols scratched onto paper. After knowing that symbols on paper could represent more complex ideas, we could develop an intuition for what a number is in the abstract, without having to attach them to apples, puppy dogs, pieces of peanut brittle, and the like.
Repeated addition is common and inefficient, so we were introduced to the multiplication symbol: an abstract symbol defined in terms of the simpler symbol we’d already built up an intuition for.
After dealing with more complex arithmetic, we hit another efficiency barrier, so we were shown yet another abstraction in the form of functions and variables. Now we could talk about a whole slew of expressions in one fell swoop.
We hit yet another efficiency barrier when trying to describe the behavior of functions, so we were shown an abstraction in the form of derivatives, integrals, and the like.
There is a pattern going on here that we’re never explicitly told about: when confronted with a barrier to our ability to efficiently talk about something abstract, we come up with a new batch of symbols which act as shorthand. We see how the symbols correspond with doing things “the hard way”, and we memorize the rules governing the way we push these symbols around on paper. The new symbols become a stand-in for the more verbose discussions we’re trying to avoid repeating.
After the official definition of the “derivative operator” is written down for safe keeping, we can skip that and flat-out resort to symbol manipulation to find the derivative of a function. We did the same thing with addition: after doing addition via brute-force counting, we were shown that we can add large numbers through symbol manipulation too. In both cases, once we have an intuition for the underlying operation, and discover patterns in the results, we can throw away brute-force action and stick with shuffling symbols around on paper instead.
The problem with the current education system is that we’re never told about the process that’s going on here. It’s all going on in the background. Sitting in calculus, I felt like derivatives and the like were an entirely new thing, instead of simply yet another application of the same process that had been going on since kindergarten: identifying a reusable operation, naming it, and discovering symbol-manipulation that achieves the same end. I had to wait until reading GEB to truly see what power comes from formal symbol juggling. That realization should have come much, much sooner. Say, around the third grade.
The result of being kept in the dark about what was really going on is that I didn’t get to appreciate the use of strict formalism. We’re encouraged to be sloppy in algebra because we aren’t aware of what’s really going on. We get confused when the teacher writes down the next step of a problem differently than we would. We stumble intellectually because we haven’t been encouraged to think consistently, so we end up treating each new concept as a new discipline.
We have been taught to blindly memorize the results of the process, and leave the details up to the big gray-bearded mathematicians in the sky. If our education from day one more closely resembled GEB, we wouldn’t have this problem. We’d be explicitly taught about the underlying process, we would know the value of strict formalism, and we’d be able to use those teachings to keep us in the proper mindset whenever something “new” like calculus or regular languages or lambda calculus or Turing machines come along.
To sum up, our education system is flawed because it starts with counting, neglects a focus on formal systems, and encourages
rote memorization as a substitute for independent thought. Starting with counting puts us squarely on the Arithmetic -> Algebra -> Calculus path, skipping over a huge number of really interesting and relevant subjects that would otherwise be within our grasp at an early age. Imagine how many of us might be programming in functional languages right now if we had an intuition for the lambda calculus by the 9th grade. We’re given an inferior toolbox. The students’ claim, “I can’t waste time learning about Peano numbers and propositional logic, I’m too busy learning math,” is exactly the same as saying, “I can’t sharpen my axe, I’m too busy chopping all this wood!”
Simple Formal Systems
So far, I’ve ranted about what’s missing in the education system. Now, I’ll begin to present an alternative. I can’t quite decide at what age this first collection of examples should apply to. I think anyone capable of adding and subtracting multiple-digit numbers should be able to handle everything in this section. Somewhere in the range of the first or second grade, we need to be taught about formal systems.
Formal systems are simple enough. You have 3 pieces of information to start with: a set of symbols, a set of axioms, and a set of derivation rules. You use these building blocks to build strings of symbols called theorems. Of course, we can’t start throwing big words at first graders. So we can’t call them formal systems, we can’t say theorem, or axiom, or derivation, etc. But we can very easily deal with deliberately-simplistic formal systems as if they were games or puzzles, without any need for big words.
Axioms are strings of symbols that you get to start with. Rules tell you how to write down new strings based on the strings you already have. You can keep applying rules to generate new strings. We can easily turn this process into puzzles:
Symbol: O
Start With: O
Rule: You can replace O with OO
Try to make OOO
Answer:
1. O
2. OO
3. OOO
After enough practice, we could make more tricky problems, where there is more than one possible direction to take at each step,
which will encourage some important reasoning about the system instead of blindly jumping in:
Symbols: O, ~
Start With: O~
Rules:
a. You can replace O with ~~
b. You can replace ~ with O
Try to make O~O
Answer:
1. O~
2. ~~~
3. O~~
4. O~O
At each step above, there is a choice to be made. There might be more than one way to arrive at a solution, and there might be some dead-ends, in that we could choose to apply one rule that sends us along a path that never yields the goal. We have to start out with brute force, so trial-and-error will be the main tool available at first. That’s not such a bad thing. At this point we’re really only trying to hammer home the process: you start with x, you can do y, so just turn the crank and watch all the crazy strings pop out.
We should start with symbols and rules that have no apparent meaning. Because the strings don’t mean anything, we are dealing 100% with the process of a formal system, and not muddying our minds with pesky lessons on how to count apples, puppy dogs, and pieces of peanut brittle.
We can always try to be clever and attach meaning to these examples. The first system represents all of the natural numbers, if you want it to. But meaning should come later. We need to start completely in the abstract: that makes it feel more like a game than like work, and it keeps everyone from building up the bad habit of attaching meaning too early.
Euclid’s Tunnel Vision
What does it mean to attach meaning too early, and why is that so bad? After all, if all this symbol stuff is ultimately about math, why not say that the first example above is a system for counting?
Euclid created a formal system for geometry, and produced a large number of proofs starting with just a few axioms. His process was very formal, and is a prime example of a real formal system in action. Although his work is clearly valuable, he assumed without proof the following axiom:
“If a straight line crossing two straight lines makes the interior angles on the same side less than two right angles, the two straight lines, if extended indefinitely, meet on that side on which are the angles less than the two right angles.”
[Euclid’s Fifth Postulate]
In English, he assumed that non-parallel lines intersect, and in turn defined parallel lines in the 9th-grade geometry sense. But this axiom is markedly more complex than the other four he started with, and it seems fishy to assume as true something that had to be described in complex terms. This is not what we want to start with for our own axioms, if we can avoid it. If the axioms are simple, then we can feel safer about our results. The above statement would ideally be the result of using the other four axioms as a starting point on their own, eventually proving this fact about parallel lines after many steps.
Euclid was trying to “clean up” geometry by starting from scratch, assuming only the bare essentials. People tried to prove the fifth postulate in terms of the other four to no avail. After much frustration and gnashing of teeth, some clever individuals decided to pretend that the fifth postulate was false, with the intent of using it until a contradiction was reached, thus proving the original fifth postulate to be true. But no such contradiction was found there, either. You could start out with Euclids’ first four postulates, as well as the negation of his fifth postulate, and use those to generate all sorts of statements about shapes. The statements produced are weird, but no matter how many of those statements they tried to create, they could not come up with two statements that contradict each other. So, was the fifth postulate true, or false? It had to be one or the other, right? After all, a line is a line is a line…
The problem with that train of thought is that we’re reasoning about the formalism from outside the system. We’re approaching the fifth postulate with a preconceived notion of what a point and line are! If you throw out all of that, and come up with a new image of what a point and line are, you can see more clearly why it doesn’t “hurt” to assume the fifth postulate is false. Both systems (that with the fifth, and that with the negation of the fifth) are separate and internally consistent. Neither one is more right than the other. Each simply describes its own little “universe” of shapestuff.
Earlier, I talked about how we develop formalisms after dealing with real-world ideas, in order to be efficient in our thinking about those ideas, but that is not the only point in having formal systems. If we force ourselves to think even more abstractly than normal, we can use formal systems to uncover whole new ways of thinking (ie. Non-Euclidean Geometry) that are every bit as real. As we’ll see a little later, it isn’t even fair to assume what a number is!
GEB’s author demands that we not attach meaning to our symbols before using a formal system to act on them. He calls this the “Requirement of Formality”. In reality, we often do have such meaning attached in the back of our minds, but we must push that aside while using the system, and only bring meaning into play when using the results of turning the system’s crank.
I had to make many a formal proof in my math classes over the years, but I didn’t learn about Euclid’s fifth postulate and the Requirement of Formality until I was a senior in college taking a course that was not widely offered. Something is very wrong with that.
More Examples
Note that in the earlier examples, it is assumed that each new string has to be made from the one right above it. That isn’t actually a requirement of formal systems, but it’s that extra bit of simplicity that will make this more teachable at an early age.
Once they’ve been through enough of this stuff, we can teach them how to use all of the available strings: multiple axioms, if given, as well as all the strings built along the way. That will help to keep them from having to repeat themselves down the road. Motivation for the new approach would come in the form of a single formal system applying to several problems:
Symbols: O, ~
Start With: O~ or ~~O~~
Rules:
a. You can replace O with ~~
b. You can replace ~ with O
Problem 1: Try to make O~O
Answer:
1. O~
2. ~~~
3. O~~
4. O~O
Problem 2: Try to make O~~O
Answer:
1. O~
2. ~~~
3. O~~
4. O~O
5. ~~~O
6. O~~O
We can point out that there is a pointless re-working of problem 1 in order to arrive at problem 2’s solution. Since it’s obvious that repetition might come up a lot, we can finally let them in on the fact that you don’t have to only work on the most recently-built string. Instead, they can use any string that they have ever made for the given system, in any problem about that system.
This modification also means that a single derivation that involves a lot of trial-and-error might become hard to grade, so we’ll also introduce a little more notation so that the student’s line of reasoning is clearer. At each step, we can add a little note saying which rule was used, and which string that rule was applied to. We can also note that Problem 1 basically defines a new rule, and we can just add that as a third rule. We could redo problem 2 like so:
Symbols: O, ~
Start With: O~
Rules:
a. You can replace O with ~~
b. You can replace ~ with O
c. You can replace O~ with O~O
Try to make: O~~O
Answer:
1. O~ [Start]
2. O~O [1, c]
3. ~~~O [2, a]
4. O~~O [3, b]
We start with a bucket that contains all of the axioms (the “Start With” strings), and the rules are like a machine: we stick a string into it, and out pops the original string as well as a new one. We put both of these into the bucket and repeat.
This is a pretty big concept for a youngster. We did a lot of work, and we don’t want to waste time doing it again, so we named it and added it to our set of available tools. The same thing happened when someone proved that the quadratic equation solves a whole slew of problems. Instead of re-inventing the quadratic equation every time you want to use it, you already have it as an available named tool. It’s important to be able to use the tools others have created, but it’s just as important to know how to produce your own tools on new problems.
At this point, we can start giving more useful terminology. Axiom and Theorem are still inaccessible to young students, so we could just call them Facts. A formal system could just be called a system. A system is a list of facts, as well as rules that turn facts into new facts.
Meaningful Formal Systems
Jumping into arithmetic forces intellectual tunnel vision upon us. It’s worth it to push that back in order to learn formal systems first. In this section, I’ll get more into the proper terminology that we’d have to hide from first graders, and make the connection between all these pointless puzzles and something more concrete.
As I mentioned before, a formal system is composed of a set of symbols, a set of axioms, and a set of derivation rules. A string is an ordered list of symbols taken from the given set. Basically, a string is anything that you could surround in quotation marks, like “#$%^^*” or “()~” or “elephant pants”. Axioms are strings that you get to start with. Derivation rules describe how to turn any string that is a part of the system into a new string that is also a part of the system. Our collection of strings starts with the axioms. We apply rules to generate new strings that become a part of the collection. These generated strings are called theorems. This setup directly parallels the more common usage of “axiom” and “theorem”. In the common usage, axioms are what you accept as true without proof, and we apply known laws to axioms in order to find new theorems, which we can finally accept as true in addition to the axioms.
The derivation rules can take forms much more complex than the toy examples I gave before. We can define rules that require us to chop up a string into all of its possible substrings, recognize and compare substrings, copy & paste or erase substrings, etc. For convenience, the empty string, the string composed of zero symbols, is considered a substring of any string. You can imagine that the empty string is wedged between every single symbol in a string. Our rules will often deal with the empty string implicitly. Also, for convenience, we can imagine that all of our formal systems include two “invisible” symbols: the start-of-string marker, and the end-of-string marker. Our rules will also often deal with these special symbols implicitly. They are implied in our set of symbols, they are implied in our axioms and theorems, and they are implied in our rules. If we ever felt like we were kidding ourselves, we could be explicit about it, but then we’d have to question our sanity.
When I write a rule that refers to a symbol that is not part of the system, that letter stands for any particular substring of the string in question. For instance, if a rule says that “if you have a string that ends in x, you may erase the x part”, we’re really saying that you are allowed to chop off any sized chunk from the right half of a theorem, which is an example of how the end-of-string marker can come into play.
In some systems, there may be too many axioms to write down explicitly. In that case, it is sufficient to just describe what all of the axioms look like, such as “All strings that start with o$! are axioms of this system.” We can call this description the axiom template.
Below is a formal system that looks a lot like the ones I talked about previously. The rules are a little more complex, but it is every bit as pointless as the ones before:
Symbols: o, #, /
Rules: (Let x stand for any substring composed of 1 or more o's)
Axiom Template:
Any string that is composed of one or more #terms is an axiom, where
a #term is defined as zero-or-more o's, followed by a /, followed by
zero-or-more o's. Multiple #terms are always joined with the # symbol.
(ie. oo/, oo/ooo#/o, /#/#/, etc...)
1.x/x may be replaced with /
2./#x may be replaced with x/#
3.x#/ may be replaced with #/x
4./#/ may be replaced with /
5.x#x may be replaced with #
6.If a string ends with #/, you can remove that #/
7.If a string starts with /#, you can remove that /#
A sample problem for this system could be “If xyz is the only axiom, find the shortest theorem of the system.” If our given axiom is ooo/#oo/, for instance, we can find the shortest string like so:
ooo/#oo/ [Axiom]
ooooo/#/ [1, Rule 2]
ooooo/ [2, Rule 6]
No more rules apply at this point. How can we be sure this is the shortest theorem possible? Perhaps we made a choice that sent us to a “dead end”, and if we had made different choices we might have found something shorter. It turns out that this will be easier to prove a little later.
Now, let’s redo all of the above, replacing that ugly # sign with the familiar +, for absolutely no reason at all (wink).
Symbols: o, +, /
Rules: (Let x stand for any substring composed of 1 or more o's)
Axiom Template:
Any string that is composed of one or more +terms is an axiom, where
a +term is defined as zero-or-more o's, followed by a /, followed by
zero-or-more o's. Multiple +terms are always joined with the + symbol.
(ie. oo/, oo/ooo+/o, /+/+/, etc...)
1. x/x may be replaced with /
2. /+x may be replaced with x/+
3. x+/ may be replaced with +/x
4. /+/ may be replaced with /
5. x+x may be replaced with +
6. If a string ends with +/, you can remove that +/
7. If a string starts with /+, you can remove that /+
Here’s a few more examples, now using + instead of #. Assume that the first line of each example is the given axiom, and we still wish to find the shortest theorem:
ooo/+oo/
ooooo/+/
ooooo/
/ooo+/oo
/+/ooooo
/ooooo
ooo/+/oo
ooo/oo
o/
/ooo+oo/
/o+/
/o
All of that still looks like garbage. Try out a few problems. Be sure that you come up with axioms that are well-formed in this system, as described in the axiom template. Also, don’t make silly mistakes by trying to guess what meaning any of this might have. Don’t let the + sign fool you, as a normal notion of addition doesn’t jive (see rule 5, for instance). A pattern will emerge. It is as if the rules let us “collect” the o’s into groups under certain conditions, and then “cancel them out” using Rule 1. Rules 6 and 7 seem to be cleaning up after us when +terms lose all of their o’s to other, more greedy +terms.
We can apply meaning to a formal system by establishing a one-to-one mapping between each aspect of the system and some object/idea in the real world. For example, we attach meaning to the obscure symbol 12 by saying that it corresponds with the number of items in a typical carton of eggs. There is nothing about 12 that demands that relationship. We use symbols entirely by convention. Here is one possible mapping between the o/+ system and the real world:
ooo/ = 3
oo/ = 2
o/ = 1
/ = zero
/o = -1
/oo = -2
/ooo = -3
Etc...
+ == addition as we know it
Maybe now that we’ve attached meaning to the o’s and /’s, the bizarre rules involving + don’t seem so arbitrary.
Our 1-1 mapping with the real world applies to the rules too:
Rule 1 == cancellation within a term
Rule 2 == grouping of positive terms (for pos + pos)
Rule 3 == grouping of negative terms (for neg + neg)
Rule 4 == cancellation across terms (for pos + neg)
Rule 5 == cancellation across terms (for neg + pos)
Rule 6 & 7 == additive identity
Now it might be clearer how the rules always yield the shortest theorem. All well-formed axioms happen to represent an integer sum, which can only possibly have a single integer as its value. A single integer is written with the fewest characters. A sum that adds up to the number n is written most economically with n+1 symbols (one slash, and one o for 1..n). Since our mapping between integers and the formal system is complete and consistent, a proof in terms of integer sums applies to the system as well. Applying the rules always results in a single integer in reduced form.
Formal systems could be defined that perform algebra, where something 5x+2=12 is a given axiom, and x=2 is the goal string to produce from the rules. Algebra could be taught as a completely mechanical process that just happens to always be the same as solving a problem in the normal sense. By having a rigorous process, you won’t have 10 kids each solving the same problem using their own half-hearted version of the notation.
Also, we haven’t limited ourselves to plain-old arithmetic and algebra. Plain-old arithmetic and algebra would be just one system among many at our disposal. Why wait until college to learn about the lambda calculus or propositional logic when you have all of the tools necessary to understand them both by the 8th grade?
The idea of this lesson was to present a system that has no apparent meaning, but can be blessed with meaning by mapping it to real concepts in a consistent way. Every axiom, every rule, and every theorem generated maps to integer sums. The “pointless” system gives the same result every single damn time we try to think of a real world collection of objects to sum up. We’re well on our way to arithmetic here, but we haven’t made the mistake of limiting ourselves to arithmetic as the foundation. Instead, we have formal systems as a foundation, which brings many more concepts well within reach besides arithmetic.
Surreal Numbers
Earlier I mentioned that we shouldn’t even assume what a we mean when we say “number”. An example of a complex formal system is given in Surreal Numbers by Donald E. Knuth. This book covers a formal system created by John H Conway, that guy who made the Game of Life. At a glance, the axioms and rules appear to describe the real numbers as we know them. The two characters start producing theorems left and right, eventually proving such things as 1+1=2. Eventually, though, they discover that there exist numbers in this system that follow all the rules, but are not themselves real numbers. The discussion then gets into oddball ideas like having different sizes of infinity. Anyway, it’s a great example of how everything, even the concept of a number, is open to interpretation from one system to the next. It’s challenging, but it’s also short, and is written in a very accessible style much like GEB. Two thumbs up.
