Geek Sublime
With his emphasis on programmer happiness, Matz makes explicit his allegiance to Donald Knuth’s literate programming. He writes:
Programs share some attributes with essays. For essays, the most important question readers ask is, “What is it about?” For programs, the main question is, “What does it do?” In fact, the purpose should be sufficiently clear that neither question ever needs to be uttered … Both essays and lines of code are meant—before all else—to be read and understood by human beings.3
The trouble of course is that as software programs grow bigger and more complex, the code they comprise tends to become unreadable and incomprehensible to human beings. Programmers like to point out that if each line of code, or even each logical statement (which may spread over more than one physical line), is understood to be a component, software systems are the most complicated things that humans have ever built: the Lucent 5ESS switch, used in telephone exchanges, derives its functionality from a hundred million lines of code; the 2008 Fedora 9 distribution of Linux comprises over two hundred million lines of code.4 No temple, no cathedral has ever contained as many moving parts. So if you’ve ever written code, you understand in your bones the truth of Donald Knuth’s assertion, “Software is hard. It’s harder than anything else I’ve ever had to do.”5 If you’ve ever written code, the fact that so much software works so much of the time can seem profoundly miraculous.
Software is complicated because it tries to model the irreducible complexity of the world. Even a simple software requirement for a small company that, say, provides secretarial services for the medical insurance industry—“We need an application that makes it easier for our scribes to write up reports from doctors’ examinations of insurance claimants”—will always reveal a swirling hodgepodge of exceptions and special cases. Some of the doctors will have two addresses on file, some will have three, and this one was on Broadway until January 22 and in the East Village afterwards. A report will always begin with a summary of the patient’s claimed condition, unless it’s being written for Company X, which wants a narration of the doctor’s exam up front. There are four main types of boilerplate text for the doctor’s conclusions, but there needs to be a “freeform” option, and room for other templates, but creation of new templates needs to be restricted to certain users. And so on, and on, and on.
The program you create in response to these requirements must reduce repetitive labor, automate the work that must be done each time, yet remain flexible enough to allow variation. The business practices that can be formalized into sets of procedures, into Grace Hopper’s dinner recipes—first do “a,” then “b,” then “c”—are easy to convert into code. In fact, the software practice that I learned in the eighties was called “procedural programming”—you wrote a program as a series of procedures that you called in sequence. Your job as a programmer was to “chunk” the system you were trying to model into clean, self-contained actions, and then construct more complex parts out of these simple elements. So, if you want to write a new report, what you do is: retrieve the doctor’s address, retrieve the patient’s information, create a new file. Or, in (pseudo) code: RetrieveDoctorsAddress(), RetrievePatientInfo(), CreateNewFile(). And these procedures would be called from BeginNewReport(). Which might be called from ShowApplicationMainMenu().
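A minimal sketch of that shape, rendered here in Java (keeping the hypothetical report-writing application and the method names as written above), might look something like this:

class ReportApplication {
    public static void main(String[] args) {
        ShowApplicationMainMenu();
    }

    static void ShowApplicationMainMenu() {
        // Display the menu, read the user's choice; assume "new report" was selected.
        BeginNewReport();
    }

    static void BeginNewReport() {
        RetrieveDoctorsAddress();
        RetrievePatientInfo();
        CreateNewFile();
    }

    static void RetrieveDoctorsAddress() { /* look up the doctor's address on file */ }
    static void RetrievePatientInfo()    { /* look up the claimant's details */ }
    static void CreateNewFile()          { /* open a blank report document */ }
}

Each procedure is a self-contained chunk of work, and the more complex procedures are simply sequences of calls to the simpler ones.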
To engage in this kind of assemblage of functionality is bracingly logical and orderly; you feel like you are making a perfect little machine, a clean, comprehensible automaton that you have set in motion. But soon, as you adapt your procedural engine for the exceptions, for all the variations that exist in the real world, you find yourself snarled in squirming thickets of if-then-else constructs, each of which contains yet other if-elseif and switch-case monsters, and you find that you have to break out of your beautiful Report-Main-Body loop and backtrack to other reports to retrieve history, and then, inevitably your procedures become more complex and start doing two things instead of one, RetrievePatientInfo() is now doing the retrieving but is also checking for valid addresses, you know that functionality should be somewhere else but you don’t have the time to bother, the users ask for a new feature and you patch it in, and of course you mean to come back later and clean everything up, but then, before you know it, you are trapped inside an unwholesome, uncontrollable atrocity, a Big Ball of Mud: “[a] haphazardly structured, sprawling, sloppy, duct-tape and bailing wire, spaghetti code jungle.”6
Often, it is not the lack of programming skill that leads to the emergence of a Big Ball of Mud, but something akin to the time-honored Indian practice of jugaad. Jugaad is Hindi for a creative workaround, a working improvisation that is built in the absence of resources and under pressure of time (from the Sanskrit yukti, trick, combination, concatenation). There can be something heroic about jugaad, as in the strange-looking trucks one sees bumping down country roads in rural India, which on closer examination turn out to be carts with diesel irrigation pumps strapped on; or the amphibious bicycles buoyed by improvised air floats and powered by blades taken from ceiling fans. Jugaad makes do, it gets work done, it maneuvers around uncooperative bureaucracies, it hacks. In recent years, jugaad has been recognized as down-to-earth creativity, as a prized national resource, and has acquired the dignifying sobriquet of “frugal engineering.”
In software, repeated applications of excessively frugal engineering by a series of programmers lead to a scheme that has no discernible structure, within which components use each other’s functionality promiscuously, so that the logic of the program becomes hard or impossible to follow. In a Big Ball of Mud—and yes, it is a technical term of art—effects flow across boundaries, so that introducing a small change to one piece of code results in unpredictable behavior in distant parts of the system. Software needs maintenance: bugs need to be fixed, new features are demanded by users. If you are the programmer asked to go into the depths of a Big Ball of Mud, the prospect is terrifying. I mean that quite literally; as you poke and prod into the innards of a badly written program on which users depend, you are often beset by paralyzing dread. How can you fix something you can’t understand? What if your fix introduces new bugs that reveal themselves in some future disaster which corrupts and loses data? The impulse then is to rewrite the whole program from the bottom up, in accordance with hard-won principles of good program design. But—often there is no budget for a complete rewrite, there is no time, there isn’t enough manpower. So maybe you patch a bit here, work in a clumsy kludge there—jugaad! Hundreds of programmers may have worked on such a program over the years, each contributing a little to the mess. So you add your handful of mud to the Big Ball, or maybe you just back away carefully and leave the damn thing alone. There are some areas of code in running programs that may as well be marked Here Be Dragons, and there are some programs that have run for decades—at universities, corporations, banks—that cannot be efficiently maintained or enhanced because nobody completely understands how they work.
For instance: the US Pentagon’s Defense Finance and Accounting Service (DFAS) is known to make “widespread” errors in paying soldiers’ salaries, and is slow to correct these mistakes when challenged. The software that the Pentagon uses for payroll and accounting comprises about seven million lines of COBOL code, mostly written in the sixties. The system hasn’t been updated in more than a dozen years, and significant portions of the code have been “corrupted”—long-running systems can suffer from “software entropy” or “code rot,” a slow deterioration in functionality because of the changing hardware or software environments they run within. A retired Pentagon employee reported that the system is “nearly impossible” to update because its documentation disappeared long ago: “It’s hard to make a change to a program if you don’t know what’s in there.”7
When a situation like this becomes desperate enough, the powers-that-be may employ skilled “code archaeologists” to spelunk into the depths. Or pony up for a complete rewrite. Mostly, managers prefer to plug up the holes and leave the Big Balls of Mud to roll on. COBOL, a language first introduced in 1959 by Grace Hopper (“Grandma COBOL”), still processes 90 percent of the planet’s financial transactions, and 75 percent of all business data.8 You can make a comfortable living maintaining code in languages like COBOL, the computing equivalents of Mesopotamian cuneiform dialects. These ancient applications—too expensive to replace, sometimes too tangled to fix or improve—run on, serving up the data that appears on the chromed-up surface of your browser, which gives you the illusion that your bank and your local utility companies live on the technological cutting edge. But as always, the past lives on under the shiny surface of the present, and often, it is too densely tangled to comprehend.
The International Obfuscated C Code Contest annually awards recognition to the writer of “the most Obscure/Obfuscated C program”—that is, to the person who can produce the most incomprehensible working code in the language C.9 The stated pedagogical aim of the contest is “to show the importance of programming style, in an ironic way.”10 But it has always seemed to me that confronting unfathomable code is the programming equivalent of staring at the abject, of slowing down to peer into the carnage of a car wreck. This is the reason that programmers expend time and effort in designing esoteric, purposely difficult computer languages like the infamous “brainfuck”—that really is its official name, with the lowercase b—which was created as an exercise in writing the smallest possible compiler (240 bytes) that could run on the Amiga operating system. “Hello, world!” in brainfuck is:
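++++++++++[>+++++++>++++++++++>++++>+++<<<<-]>++.>+.+++++++..+++.>++++.>++.<<++++++++.--------.+++.------.--------.>>+.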
brainfuck is a “Turing tarpit,” which is to say it is a very small language in which you can write any program that you could write in C or Java; but attempting to do so would, well, fuck your brain, and therefore the delectable frisson of terror the code above induces in discerning code cognoscenti. brainfuck is venerable and famous, but my favorite esoteric language is Malbolge, which was designed solely to be the most outrageously difficult language to program in. It is named, appropriately, after the eighth circle of hell in Dante’s Inferno, Malebolge (“Evil ditches,” reserved for frauds). In the language Malbolge, the result of any instruction depends on where it is located in memory, which effectively means that what a specific piece of code does changes as the program runs. Code and data are very hard to reuse, and the constructs to control program flow are almost nonexistent.11 Malbolge inverts the sacred commandments of literate programming, and is so impenetrable that it took two years after the language was first released for the first working program to appear, and that program was not written by a human, but generated by a computerized search program that examined the space of all possible Malbolge programs and winnowed out one possibility. “Hello, world!” in Malbolge is:
And yet, this snippet doesn’t convey the true, titillating evil of Malbolge, which changes and quakes like quicksand. To contemplate Malbolge is to stare into the abyss in which machines speak their own tongues, indifferent to the human gaze; the programmer thereafter knows the pathos of her situation, and recognizes the costs of sacrilege. The coder’s quest is for functionality—“all computer programs are designed to accomplish some kind of task”—and the extension and maintenance of that functionality demands clarity and legibility. Illegibility, incomprehensibility—that way madness lies.
The desire for lucidity as well as power, and therefore the maximization of productivity and happiness among programmers, has—according to some accounts—created more than eight thousand computer languages over the last half-century.12 Each of these artificial, formal languages embodies a philosophy of human-computer interaction; whole family trees of dialects have evolved from the hacker’s eternal desire to improve, to implement better. Adherents of one language will criticize another’s choice with the fierce religiosity of those who are convinced that they are completely rational. The computer scientist Edsger Dijkstra, for instance, didn’t hold back his feelings in a famous 1975 memo: FORTRAN was an “infantile disorder”; PL/I was a “fatal disease”; programmers exposed to BASIC were “mentally mutilated beyond hope of regeneration”; use of COBOL “cripples the mind, its teaching should, therefore, be regarded as a criminal offence”; APL was a “mistake, carried through to perfection.”13
In any given year, there are a couple of languages that are cool—at the time of this writing, all the really hip kids are learning Clojure. And there are certain languages that smell of terminal uncoolness: users of Microsoft’s Visual Basic are regarded by many as the dorks of the computing world, the most Mort-ish of the Morts. Like all its predecessors in the long “Basic” lineage of languages, which goes back to 1964, Visual Basic has an English-like syntax. With its first release in 1991, Visual Basic provided a relatively easy way for non-experts to build programs for Windows; precisely for these reasons, it has attracted the contempt of the math-inclined Einsteins, who despise its verbosity and blame its accessibility for the plague of terrible programs that still afflicts Windows.
The evolution of computer languages also reflects the development of computer science and the craft of programming. Procedural programming called for the decomposition of complexity into simpler procedures which were then called sequentially. But this method was clearly inadequate: Big Balls of Mud were being built everywhere, software projects overshot budgets and failed at an alarming rate, there was an ever-present air of crisis. In the mid-eighties, a newly popular method, “Object Oriented Programming,” offered salvation. OOP—first conceived in the nineteen sixties—uses “objects,” code constructions that contain both data (attributes that describe the object) and behavior (methods or procedures that you can call to effect changes); objects interact by passing messages to each other. In procedural programming, information and functionality were scattered all over the place; a doctor’s addresses might be in a database, the knowledge of how to use these addresses in a procedure somewhere, and the ability to create new addresses in yet another procedure elsewhere. With OOP, the promise was that you could neatly encapsulate both data and behavior inside objects, and then use these objects to faithfully model the stuff of the real world, which of course is full of objects. Using OOP, you could write code like this to create an object representing a new doctor and store his phone number:
Doctor drKumar = new Doctor();
drKumar.PhoneNumber = "5105550568";
Here, “drKumar” is an object of the type “Doctor”; one of the data fields of this object is “PhoneNumber.” You may have another type called “Patient,” which might have a data field called “Diagnosis.” So with objects, when you needed the good Dr. Kumar’s address, you didn’t have to go searching through a master list of addresses. You could just retrieve the drKumar object and write something like the following to assign Dr. Kumar’s address to the “MailingAddress” field of the report the user wanted to print:
report.MailingAddress = drKumar.Address;
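Behind such statements sit class definitions. A minimal sketch in Java, keeping the field names from the examples above and adding one hypothetical method, HasValidAddress(), to show behavior living alongside data, might read:

class Doctor {
    String PhoneNumber;   // data: attributes that describe the object
    String Address;

    boolean HasValidAddress() {   // behavior: a method other code can call
        return Address != null && !Address.isEmpty();
    }
}

class Patient {
    String Diagnosis;
}

class Report {
    String MailingAddress;
}

The object that knows a doctor’s address is the same object that knows how to check it, which is the encapsulation that OOP promised.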
The dawn of OOP was a heady time. I imagined shiny metallic objects rising out of the primal Big Ball of Mud, pristine and clean, shooting messages at each other like bolts of energy. The rest of the programming world seemed equally excited: hundreds of books were written, OOP conferences were held, some programmers became superstar OOP gurus who charged large sums to teach the secrets of the OOP way of thinking. Every prominent programming language now acquired object orientation, if only as a possible method among others. And really, programming in this new idiom made it possible to solve certain problems with ease and a degree of elegance.
Yet—Big Balls of Mud continued to be born, to grow. Software projects continued to crash and burn, ruining budgets and careers. If OOP managed to avoid the inevitable snarls of procedural programming, it introduced new kinds of complexity. Even small systems now divided functionality into hundreds and thousands of objects. Thinking about all these pieces and layers and their interactions became increasingly difficult; subtle bugs arose from what felt like spooky action at a distance—entanglements between objects emerged from their whirling dance. In defense, programmers codified best practices into a smorgasbord of easily abbreviated rubrics: Separation of Concerns, Single Responsibility Principle, Don’t Repeat Yourself, Liskov Substitution Principle. Automated testing of each independent section of code became more crucial than ever, first to ensure that the code was actually doing what you wanted, and later to be sure that it was still doing what you wanted after you made changes in some other section of code—effects sometimes flowed unpredictably from one region to another.
In fact, adherents of Test-Driven Development (TDD) would have you write the tests before you ever write the code—that is, you write DoesMyAddTwoNumbersFunctionReallyReturnTheRightSum() before you write AddTwoNumbers(), thus forcing you to design AddTwoNumbers() to be easily testable. Usually, you write several tests for each section of code, checking for correct behavior under varying conditions, and so the lines of test code can easily outnumber the lines of program code by orders of magnitude. The open-source database SQLite, at the time of this writing, has 1,177 times as much test code as program code.14 Most non-programmers have never heard of SQLite, but it is the most widely deployed database in the world.15 SQLite is a tiny program. It runs within your Firefox browser, storing your bookmarks; it is used widely within the Mac operating system; it runs within each copy of Skype; it runs on your smartphone, storing contacts and appointments. SQLite’s vast suite of tests is an attempt to prevent bugs from creeping into a program that has become an essential, foundational component of the working memory of humanity.
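To make the test-first rhythm concrete, here is a minimal sketch in Java using the JUnit testing library; the Calculator class, its AddTwoNumbers() method, and the checks themselves are hypothetical, written in the order TDD prescribes, tests before implementation:

import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Written first: tests that state what AddTwoNumbers() must do.
public class AddTwoNumbersTest {
    @Test
    public void DoesMyAddTwoNumbersFunctionReallyReturnTheRightSum() {
        assertEquals(5, Calculator.AddTwoNumbers(2, 3));
    }

    @Test
    public void HandlesNegativeNumbers() {
        assertEquals(-1, Calculator.AddTwoNumbers(2, -3));
    }
}

// Written second: just enough code to make the tests pass.
class Calculator {
    static int AddTwoNumbers(int a, int b) {
        return a + b;
    }
}

Run the tests, watch them fail, write the code, watch them pass; every later change to the program is checked against the same tests.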