Cargo Cult Programming

December 09, 2011

During World War II, Japanese and American troops had to transport a great deal of cargo through the Pacific islands, setting up many short-lived landing strips along the way. Members of tribal societies in the area, who had not seen landing strips and planes before, didn't quite understand the how or the why, only the fact that the presence of landing strips and planes coincided with the arrival of supplies. After the war, they would build imitation landing strips and planes in the hope that this would cause new cargo to arrive.

This is an example of a cargo cult. The term has since been applied to a wide variety of other topics, when we supposedly-more-advanced people make the exact same mistake. When someone superficially mimics all the trappings of a legitimate scientist in order to support ideas with no scientific leg to stand on, they're engaging in cargo cult science. Similarly, when programmers write unhelpful code because it looks or feels like something they ought to be doing, they're engaging in cargo cult programming.

Years ago, I was working on a project in which the architects required the team to code the layers of our application according to a very strict pattern. It had all the trappings of a Good Layered Architecture, but had the opposite effect in practice.

At the bottom, we had a Data Access Layer responsible for making database calls. Above that we had a Business Layer, which our ASP.NET code-behinds depended on. A general fear of exceptions led to a lot of rules about try/catch blocks and logging. It turned out that very little business logic was actually going on in these methods. They largely just forwarded arguments passed from the UI layer down to the Data Access Layer, amounting to hundreds of methods with the following structure:
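(The original code sample is not shown here; the following is a minimal reconstruction from the description above. Names like OrderBusiness, SaveOrder, and the orderDal field are illustrative, not from the project.)

```csharp
/// <summary>Saves the order.</summary>
/// <param name="order">The order.</param>
public void SaveOrder(Order order)
{
    try
    {
        // No actual business logic: just forward the arguments
        // straight down to the Data Access Layer.
        orderDal.SaveOrder(order);
    }
    catch (Exception exception)
    {
        Handle(exception);
    }
}
```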

The Handle method would serialize the exception to a log file, and then rethrow the exception after wrapping it in a BusinessException devoted to this layer.
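A sketch of what such a Handle method might have looked like (the logging call and message are hypothetical):

```csharp
private static void Handle(Exception exception)
{
    // Serialize the exception to a log file (hypothetical logger).
    Logger.WriteToFile(exception);

    // Wrap and rethrow with this layer's dedicated exception type.
    throw new BusinessException("Error in the Business Layer.", exception);
}
```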

Furthermore, each of these methods had to have full XML documentation in order to pass code review. Monotonous, you say? No worries, because everyone could install a plugin to automatically write those comments by inspecting the code.

In an elevator pitch, these might sound like good practices. "You don't want your UI layer making database calls, right? You want to handle your exceptions, right? You want to log your exceptions, right? You need documentation, right?"

Unfortunately, these decisions were exceptionally costly in several ways. The initial costs came in development time. Imagine how often developers would reach for their copy and paste shortcuts while developing hundreds of these methods, effectively one for each database call. Imagine creating and then tracking down the inevitable copy/paste bugs. Imagine how silly I felt writing unit tests for these.

The documentation was almost always generated by a plugin, meaning that we all very quickly learned to skip over these comments while reading through the code. Whenever an important comment was actually written, it would therefore never be seen. A comment that tells me exactly what my code says is not a comment.

The costs were magnified when it came to troubleshooting production issues. Because every exception was caught, wrapped, and rethrown, stack traces became quite garbled and misleading. A similar try/catch appeared in the Data Access Layer, so when you tried to read the log file for a failed database call, the complexity was compounded. Instead of seeing a stack trace that showed the steps from the incoming request down to the actual point of failure, you had to walk through nested stack traces sprinkled with the untrue "failure" of each call to Handle().

The costs were further magnified in those cases where we actually did want some specific exception types to bubble up to the caller. Here, in order to follow the pattern and pass code review, our try/catches had to get extra clever:
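(Again a reconstruction, not the original code; ValidationException stands in for whatever specific type needed to escape.)

```csharp
public void SaveOrder(Order order)
{
    try
    {
        orderDal.SaveOrder(order);
    }
    catch (ValidationException)
    {
        // The one exception type we actually want callers to see,
        // rethrown as-is to dodge the wrapping pattern.
        throw;
    }
    catch (Exception exception)
    {
        // Everything else still gets funneled through the pattern.
        Handle(exception);
    }
}
```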

This all walked like exception handling, and quacked like exception handling, but it was really just a duck.

Handle()'s contributions were the logging of exceptions and the introduction of a layer-specific exception wrapper. Logging is useful, but could be performed by a logging framework higher up the call chain so that we don't have to keep on remembering to include it. The layer-specific exception instance served no purpose other than to remind us that we did in fact have a layer named "Business". Removing the exception handling pattern would leave us with the following:
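Stripped of the try/catch ceremony, the hypothetical method above reduces to a bare pass-through:

```csharp
public void SaveOrder(Order order)
{
    // All that remains: forward the call to the Data Access Layer.
    orderDal.SaveOrder(order);
}
```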

Harrumph. This entire layer's contribution, now, is to provide a namespace containing the word "Business". It can be removed entirely. Hide the remaining Data Access layer behind some repository interfaces and you're well on the way to an architecture that solves problems. To see where this train of thought eventually leads you, see the Onion Architecture.

Setting up conventions and patterns for your team to follow can be extremely valuable, but those conventions and patterns always come with a cost. You have to weigh the cost of including a pattern against the savings it provides. You have to know what pain you're solving. Even a great pattern, followed without understanding, can lead to waste. You've got to trust your team members to learn and understand not only what to do, but why to do it.