For example, in the sentence, “Apples are red,” the first word eliminated other kinds of fruit and all other objects in general. The second directs attention to some property or condition of apples, and the third eliminates other possible colors.…
The number of symbols available at any one selection obviously varies widely with the type of symbols used, with the particular communicators and with the degree of previous understanding existing between them.♦
Hartley had to admit that some symbols might convey more information, as the word was commonly understood, than others. “For example, the single word ‘yes’ or ‘no,’ when coming at the end of a protracted discussion, may have an extraordinarily great significance.” His listeners could think of their own examples. But the point was to subtract human knowledge from the equation. Telegraphs and telephones are, after all, stupid.
It seemed intuitively clear that the amount of information should be proportional to the number of symbols: twice as many symbols, twice as much information. But a dot or dash—a symbol in a set with just two members—carries less information than a letter of the alphabet and much less information than a word chosen from a thousand-word dictionary. The more possible symbols, the more information each selection carries. How much more? The equation, as Hartley wrote it, was this:
H = n log s
where H is the amount of information, n is the number of symbols transmitted, and s is the size of the alphabet. In a dot-dash system, s is just 2. A single Chinese character carries so much more weight than a Morse dot or dash; it is so much more valuable. In a system with a symbol for every word in a thousand-word dictionary, s would be 1,000.
The amount of information is not proportional to the alphabet size, however. That relationship is logarithmic: to double the amount of information carried by each symbol, the alphabet size must be squared. Hartley illustrated this in terms of a printing telegraph—one of the hodgepodge of devices, from obsolete to newfangled, being hooked up to electrical circuits. Such telegraphs used keypads arranged according to a system devised in France by Émile Baudot. The human operators used keypads, that is—the device translated these key presses, as usual, into the opening and closing of telegraph contacts. The Baudot code used five units to transmit each character, so the number of possible characters was 2⁵, or 32. In terms of information content, each such character was five times as valuable—not thirty-two times—as its basic binary units.
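To make the arithmetic concrete, Hartley's formula can be sketched in a few lines of Python. This is an illustration only; the function name and the examples are mine, not Hartley's.

```python
import math

def hartley_information(n_symbols, alphabet_size):
    """Hartley's measure H = n log s, here in bits (log base 2)."""
    return n_symbols * math.log2(alphabet_size)

# One Baudot character: an alphabet of 2**5 = 32 possibilities.
print(hartley_information(1, 32))      # 5.0 bits, five times a single dot or dash
print(hartley_information(1, 2))       # 1.0 bit, one binary unit

# Doubling the information per symbol requires squaring the alphabet size:
print(hartley_information(1, 32**2))   # 10.0 bits

# A symbol drawn from a thousand-word dictionary:
print(hartley_information(1, 1000))    # about 9.97 bits
```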
Telephones, meanwhile, were sending their human voices across the network in happy, curvaceous analog waves. Where were the symbols in those? How could they be counted?
Hartley followed Nyquist in arguing that the continuous curve should be thought of as the limit approached by a succession of discrete steps, and that the steps could be recovered, in effect, by sampling the waveform at intervals. That way telephony could be made subject to the same mathematical treatment as telegraphy. By a crude but convincing analysis, he showed that in both cases the total amount of information would depend on two factors: the time available for transmission and the bandwidth of the channel. Phonograph records and motion pictures could be analyzed the same way.
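A crude sketch of that idea (my own construction, not Hartley's or Nyquist's analysis, with illustrative parameter values): sample a continuous waveform at regular intervals, quantize each sample to one of a fixed set of levels, and the smooth curve becomes a string of countable symbols whose total information grows with both the duration and the sampling rate.

```python
import math

def quantize_waveform(waveform, duration, sample_rate, levels):
    """Turn a continuous function of time into discrete symbols.

    waveform: a function t -> amplitude in [-1, 1].
    Returns the list of level indices and the Hartley information in bits.
    """
    n_samples = int(duration * sample_rate)
    symbols = []
    for i in range(n_samples):
        amplitude = waveform(i / sample_rate)
        # Map [-1, 1] onto integer levels 0 .. levels - 1.
        symbols.append(min(levels - 1, int((amplitude + 1) / 2 * levels)))
    bits = n_samples * math.log2(levels)
    return symbols, bits

# One second of a simple tone, sampled 8,000 times a second at 8 levels.
tone = lambda t: math.sin(2 * math.pi * 440 * t)
samples, bits = quantize_waveform(tone, duration=1.0, sample_rate=8000, levels=8)
print(len(samples), bits)   # 8000 symbols, 24000 bits
```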
These odd papers by Nyquist and Hartley attracted little immediate attention. They were hardly suitable for any prestigious journal of mathematics or physics, but Bell Labs had its own, The Bell System Technical Journal, and Claude Shannon read them there. He absorbed the mathematical insights, sketchy though they were—first awkward steps toward a shadowy goal. He noted also the difficulties both men had in defining their terms. “By the speed of transmission of intelligence is meant the number of characters, representing different letters, figures, etc., which can be transmitted in a given length of time.”♦ Characters, letters, figures: hard to count. There were concepts, too, for which terms had yet to be invented: “the capacity of a system to transmit a particular sequence of symbols …”♦
THE BAUDOT CODE
Shannon felt the promise of unification. The communications engineers were talking not just about wires but also the air, the “ether,” and even punched tape. They were contemplating not just words but also sounds and images. They were representing the whole world as symbols, in electricity.
* * *
♦ In an evaluation forty years later the geneticist James F. Crow wrote: “It seems to have been written in complete isolation from the population genetics community.…[Shannon] discovered principles that were rediscovered later.… My regret is that [it] did not become widely known in 1940. It would have changed the history of the subject substantially, I think.”
♦ In standard English, as Russell noted, it is one hundred and eleven thousand seven hundred and seventy-seven.
7 | INFORMATION THEORY
(All I’m After Is Just a Mundane Brain)
Perhaps coming up with a theory of information and its processing is a bit like building a transcontinental railway. You can start in the east, trying to understand how agents can process anything, and head west. Or you can start in the west, with trying to understand what information is, and then head east. One hopes that these tracks will meet.
—Jon Barwise (1986)♦
AT THE HEIGHT OF THE WAR, in early 1943, two like-minded thinkers, Claude Shannon and Alan Turing, met daily at teatime in the Bell Labs cafeteria and said nothing to each other about their work, because it was secret.♦ Both men had become cryptanalysts. Even Turing’s presence at Bell Labs was a sort of secret. He had come over on the Queen Elizabeth, zigzagging to elude U-boats, after a clandestine triumph at Bletchley Park in deciphering Enigma, the code used by the German military for its critical communication (including signals to the U-boats). Shannon was working on the X System, used for encrypting voice conversations between Franklin D. Roosevelt at the Pentagon and Winston Churchill in his War Rooms. It operated by sampling the analog voice signal fifty times a second—“quantizing” or “digitizing” it—and masking it by applying a random key, which happened to bear a strong resemblance to the circuit noise with which the engineers were so familiar. Shannon did not design the system; he was assigned to analyze it theoretically and—it was hoped—prove it to be unbreakable. He accomplished this. It was clear later that these men, on their respective sides of the Atlantic, had done more than anyone else to turn cryptography from an art into a science, but for now the code makers and code breakers were not talking to each other.
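The principle of that masking step can be sketched schematically. This toy code is illustrative only: the number of quantization levels, the key handling, and the function names are assumptions, not the X System's actual design.

```python
import random

LEVELS = 8  # illustrative number of quantization levels

def mask(samples, key):
    """Add a random key to each quantized sample, modulo the number of levels."""
    return [(s + k) % LEVELS for s, k in zip(samples, key)]

def unmask(masked, key):
    """Subtract the same key at the receiving end to recover the samples."""
    return [(m - k) % LEVELS for m, k in zip(masked, key)]

voice = [3, 1, 4, 1, 5, 2, 0, 5]                   # made-up quantized samples
key = [random.randrange(LEVELS) for _ in voice]    # random key, shared in advance
scrambled = mask(voice, key)                       # what travels over the wire
print(scrambled)
print(unmask(scrambled, key) == voice)             # True
```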
With that subject off the table, Turing showed Shannon a paper he had written seven years earlier, called “On Computable Numbers,” about the powers and limitations of an idealized computing machine. They talked about another topic that turned out to be close to their hearts, the possibility of machines learning to think. Shannon proposed feeding “cultural things,” such as music, to an electronic brain, and they outdid each other in brashness, Turing exclaiming once, “No, I’m not interested in developing a powerful brain. All I’m after is just a mundane brain, something like the president of the American Telephone & Telegraph Company.”♦ It bordered on impudence to talk about thinking machines in 1943, when both the transistor and the electronic computer had yet to be born. The vision Shannon and Turing shared had nothing to do with electronics; it was about logic.
Can machines think? was a question with a relatively brief and slightly odd tradition—odd because machines were so adamantly physical in themselves. Charles Babbage and Ada Lovelace lay near the beginning of this tradition, though they were all but forgotten, and now the trail led to Alan Turing, who did something really outlandish: thought up a machine with ideal powers in the mental realm and showed what it could not do. His machine never existed (except that now it exists everywhere). It was only a thought experiment.
Running alongside the issue of what a machine could do was a parallel issue: what tasks were mechanical (an old word with new significance). Now that machines could play music, capture images, aim antiaircraft guns, connect telephone calls, control assembly lines, and perform mathematical calculations, the word did not seem quite so pejorative. But only the fearful and superstitious imagined that machines could be creative or original or spontaneous; those qualities were opposite to mechanical, which meant automatic, determined, and routine. This concept now came in handy for philosophers. An example of an intellectual object that could be called mechanical was the algorithm: another new term for something that had always existed (a recipe, a set of instructions, a step-by-step procedure) but now demanded formal recognition. Babbage and Lovelace trafficked in algorithms without naming them. The twentieth century gave algorithms a central role—beginning here.
Turing was a fellow and recent graduate of King’s College, Cambridge, when he presented his computable-numbers paper to his professor in 1936. The full title finished with a flourish in fancy German: it was “On Computable Numbers, with an Application to the Entscheidungsproblem.” The “decision problem” was a challenge that had been posed by David Hilbert at the 1928 International Congress of Mathematicians. As perhaps the most influential mathematician of his time, Hilbert, like Russell and Whitehead, believed fervently in the mission of rooting all mathematics in a solid logical foundation—“In der Mathematik gibt es kein Ignorabimus,” he declared. (“In mathematics there is no ‘we will not know.’ ”) Of course mathematics had many unsolved problems, some quite famous, such as Fermat’s Last Theorem and the Goldbach conjecture—statements that seemed true but had not been proved. Had not yet been proved, most people thought. There was an assumption, even a faith, that any mathematical truth would be provable, someday.
The Entscheidungsproblem was to find a rigorous step-by-step procedure by which, given a formal language of deductive reasoning, one could perform a proof automatically. This was Leibniz’s dream revived once again: the expression of all valid reasoning in mechanical rules. Hilbert posed it in the form of a question, but he was an optimist. He thought or hoped that he knew the answer. It was just then, at this watershed moment for mathematics and logic, that Gödel threw his incompleteness theorem into the works. In flavor, at least, Gödel’s result seemed a perfect antidote to Hilbert’s optimism, as it was to Russell’s. But Gödel actually left the Entscheidungsproblem unanswered. Hilbert had distinguished among three questions:
Is mathematics complete?
Is mathematics consistent?
Is mathematics decidable?
Gödel showed that mathematics could not be both complete and consistent but had not definitively answered the third question, at least not for all mathematics. Even though a particular closed system of formal logic must contain statements that could neither be proved nor disproved from within the system, it might conceivably be decided, as it were, by an outside referee—by external logic or rules.♦♦
Alan Turing, just twenty-two years old, unfamiliar with much of the relevant literature, so alone in his work habits that his professor worried about his becoming “a confirmed solitary,”♦ posed an entirely different question (it seemed): Are all numbers computable? This was an unexpected question to begin with, because hardly anyone had considered the idea of an uncomputable number. Most numbers that people work with, or think about, are computable by definition. The rational numbers are computable because they can be expressed as the quotient of two integers, a/b. The algebraic numbers are computable because they are solutions of polynomial equations. Famous numbers like π and e are computable; people compute them all the time. Nonetheless Turing made the seemingly mild statement that numbers might exist that are somehow nameable, definable, and not computable.
What did this mean? He defined a computable number as one whose decimal expression can be calculated by finite means. “The justification,” he said, “lies in the fact that the human memory is necessarily limited.”♦ He also defined calculation as a mechanical procedure, an algorithm. Humans solve problems with intuition, imagination, flashes of insight—arguably nonmechanical calculation, or then again perhaps just computation whose steps are hidden. Turing needed to eliminate the ineffable. He asked, quite literally, what would a machine do? “According to my definition, a number is computable if its decimal can be written down by a machine.”
No actual machine offered a relevant model. “Computers” were, as ever, people. Nearly all the world’s computation was still performed through the act of writing marks on paper. Turing did have one information machine for a starting point: the typewriter. As an eleven-year-old sent to boarding school he had imagined inventing one. “You see,” he wrote to his parents, “the funny little rounds are letters cut out on one side slide along to the round along an ink pad and stamp down and make the letter, thats not nearly all though.”♦ Of course, a typewriter is not automatic; it is more a tool than a machine. It does not flow a stream of language onto the page; rather, the page shifts its position space by space under the hammer, where one character is laid down after another. With this model in mind, Turing imagined another kind of machine, of the utmost purity and simplicity. Being imaginary, it was unencumbered by the real-world details one would need for a blueprint, an engineering specification, or a patent application. Turing, like Babbage, meant his machine to compute numbers, but he had no need to worry about the limitations of iron and brass. Turing did not plan ever to build his machine.
He listed the very few items his machine must possess: tape, symbols, and states. Each of these required definition.
Tape is to the Turing machine what paper is to a typewriter. But where a typewriter uses two dimensions of its paper, the machine uses only one—thus, a tape, a long strip, divided into squares. “In elementary arithmetic the two-dimensional character of the paper is sometimes used,” he wrote. “But such a use is always avoidable, and I think that it will be agreed that the two-dimensional character of paper is no essential of computation.”♦ The tape is to be thought of as infinite: there is always more when needed. But just one square is “in the machine” at any given time. The tape (or the machine) can move left or right, to the next square.
Symbols can be written onto the tape, one per square. How many symbols could be used? This required some thought, especially to make sure the number was finite. Turing observed that words—in European languages, at least—behaved as individual symbols. Chinese, he said, “attempts to have an enumerable infinity of symbols.” Arabic numerals might also be considered infinite, if 17 and 999,999,999,999,999 were treated as single symbols, but he preferred to treat them as compound: “It is always possible to use sequences of symbols in the place of single symbols.” In fact, in keeping with the machine’s minimalist spirit, he favored the absolute minimum of two symbols: binary notation, zeroes and ones. Symbols were not only to be written but also read from the tape—“scanned” was the word Turing used. In reality, of course, no technology could yet scan symbols written on paper back into a machine, but there were equivalents: for example, punched cards, now used in tabulating machines. Turing specified one more limitation: the machine is “aware” (only the anthropomorphic word would do) of one symbol at a time—the one on the square that is in the machine.
States required more explaining. Turing used the word “configurations” and pointed out that these resembled “states of mind.” The machine has a few of these—some finite number. In any given state, the machine takes one or more actions depending on the current symbol. For example, in state a, the machine might move one square to the right if the current symbol is 1, or move one square to the left if the current symbol is 0, or print 1 if the current symbol is blank. In state b, the machine might erase the current symbol. In state c, if the symbol is 0 or 1, the machine might move to the right, and otherwise stop. After each action, the machine finishes in a new state, which might be the same or different. The various states used for a given calculation were stored in a table—how this was to be managed physically did not matter. The state table was, in effect, the machine’s set of instructions.
And this was all.
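A machine this spare is easy to simulate. The sketch below is one possible rendering in Python, not Turing's own notation: the state table maps a pair (state, current symbol) to a symbol to write, a move, and a next state.

```python
def run_machine(table, tape, state, steps):
    """Run a state table on a tape for at most `steps` actions.

    tape is a dict mapping square position -> symbol; unwritten squares are blank.
    table maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is -1 (left), +1 (right), or 0 (stay).
    """
    position = 0
    for _ in range(steps):
        symbol = tape.get(position, ' ')
        if (state, symbol) not in table:
            break                                  # no rule for this situation: stop
        write, move, state = table[(state, symbol)]
        tape[position] = write
        position += move
        if state == 'halt':
            break
    return tape

# In the spirit of Turing's first example: print 0s and 1s on successive squares.
# The machine never halts on its own, so the step count caps the run.
alternating = {
    ('a', ' '): ('0', +1, 'b'),
    ('b', ' '): ('1', +1, 'a'),
}
tape = run_machine(alternating, {}, 'a', steps=10)
print(''.join(tape[i] for i in sorted(tape)))   # 0101010101
```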
Turing was programming his machine, though he did not yet use that word. From the primitive actions—moving, printing, erasing, changing state, and stopping—larger processes were built up, and these were used again and again: “copying down sequences of symbols, comparing sequences, erasing all symbols of a given form, etc.” The machine can see just one symbol at a time, but can in effect use parts of the tape to store information temporarily. As Turing put it, “Some of the symbols written down … are just rough notes ‘to assist the memory.’ ” The tape, unfurling to the horizon and beyond, provides an unbounded record. In this way all arithmetic lies within the machine’s grasp. Turing showed how to add a pair of numbers—that is, he wrote out the necessary table of states. He showed how to make the machine print out (endlessly) the binary representation of π. He spent considerable time working out what the machine could do and how it would accomplish particular tasks. He demonstrated that this short list covers everything a person does in computing a number. No other knowledge or intuition is necessary. Anything computable can be computed by this machine.
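As one small, hypothetical illustration of such composition, the run_machine sketch above can be given a table that adds two numbers written in unary (a block of 1s, a separating 0, another block of 1s) by filling the gap and then erasing one surplus 1.

```python
# A hypothetical state table for unary addition: 1 1 1 0 1 1 becomes 1 1 1 1 1.
addition = {
    ('scan', '1'):     ('1', +1, 'scan'),      # walk right across the first block
    ('scan', '0'):     ('1', +1, 'seek_end'),  # fill the separator with a 1
    ('seek_end', '1'): ('1', +1, 'seek_end'),  # walk right across the second block
    ('seek_end', ' '): (' ', -1, 'trim'),      # past the end; step back one square
    ('trim', '1'):     (' ', 0, 'halt'),       # erase one surplus 1 and stop
}

# Tape holding 3 + 2 in unary.
tape = {0: '1', 1: '1', 2: '1', 3: '0', 4: '1', 5: '1'}
result = run_machine(addition, tape, 'scan', steps=20)
print(''.join(result[i] for i in sorted(result)).strip())   # 11111
```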