The Information
Solomonoff, Kolmogorov, and Chaitin tackled three different problems and came up with the same answer. Solomonoff was interested in inductive inference: given a sequence of observations, how can one make the best predictions about what will come next? Kolmogorov was looking for a mathematical definition of randomness: what does it mean to say that one sequence is more random than another, when they have the same probability of emerging from a series of coin flips? And Chaitin was trying to find a deep path into Gödel incompleteness by way of Turing and Shannon—as he said later, “putting Shannon’s information theory and Turing’s computability theory into a cocktail shaker and shaking vigorously.”♦ They all arrived at minimal program size. And they all ended up talking about complexity.
The following bitstream (or number) is not very complex, because it is rational:
D: 14285714285714285714285714285714285714285714285714…
It may be rephrased concisely as “PRINT 142857 AND REPEAT,” or even more concisely as “1/7.” If it is a message, the compression saves keystrokes. If it is an incoming stream of data, the observer may recognize a pattern, grow more and more confident, and settle on one-seventh as a theory for the data.
In contrast, this sequence contains a late surprise:
E: 10101010101010101010101010101010101010101010101013
The telegraph operator (or theorist, or compression algorithm) must pay attention to the whole message. Nonetheless, the extra information is minimal; the message can still be compressed, wherever pattern exists. We may say it contains a redundant part and an arbitrary part.
It was Shannon who first showed that anything nonrandom in a message allows compression:
F: 101101011110110110101110101110111101001110110100111101110
Heavy on ones, light on zeroes, this might be emitted by the flip of a biased coin. Huffman coding and other such algorithms exploit statistical regularities to compress the data. Photographs are compressible because of their subjects’ natural structure: light pixels and dark pixels come in clusters; statistically, nearby pixels are likely to be similar; distant pixels are not. Video is even more compressible, because the differences between one frame and the next are relatively slight, except when the subject is in fast and turbulent motion. Natural language is compressible because of redundancies and regularities of the kind Shannon analyzed. Only a wholly random sequence remains incompressible: nothing but one surprise after another.
Random sequences are “normal”—a term of art meaning that on average, in the long run, each digit appears exactly as often as the others, one time in ten; and each pair of digits, from 00 to 99, appears one time in a hundred; and each triplet likewise, and so on. No string of any particular length is more likely to appear than any other string of that length. Normality is one of those simple-seeming ideas that, when mathematicians look closely, turn out to be covered with thorns. Even though a truly random sequence must be normal, the reverse is not necessarily the case. A number can be statistically normal yet not random at all. David Champernowne, a young Cambridge friend of Turing’s, invented (or discovered) such a creature in 1933—a construction made of all the integers, chained together in order:
G: 12345678910111213141516171819202122232425262728293…
It is easy to see that each digit, and each combination of digits, occurs equally often in the long run. Yet the sequence could not be less random. It is rigidly structured and completely predictable. If you know where you are, you know what comes next.
Even apart from freaks like Champernowne’s, it turns out that normal numbers are difficult to recognize. In the universe of numbers, normality is the rule; mathematicians know for sure that almost all numbers are normal. The rational numbers are not normal, and there are infinitely many rational numbers, but they are infinitely outnumbered by normal numbers. Yet, having settled the great and general question, mathematicians can almost never prove that any particular number is normal. This in itself is one of the more remarkable oddities of mathematics.
Even Π retains some mysteries:
C: 3.1415926535897932384626433832795028841971693993751…
The world’s computers have spent many cycles analyzing the first trillion or so known decimal digits of this cosmic message, and as far as anyone can tell, they appear normal. No statistical features have been discovered—no biases or correlations, local or remote. It is a quintessentially nonrandom number that seems to behave randomly. Given the nth digit, there is no shortcut for guessing the nth plus one. Once again, the next bit is always a surprise.
How much information, then, is represented by this string of digits? Is it information rich, like a random number? Or information poor, like an ordered sequence?
The telegraph operator could, of course, save many keystrokes—infinitely many, in the long run—by simply sending the message “Π.” But this is a cheat. It presumes knowledge previously shared by the sender and the receiver. The sender has to recognize this special sequence to begin with, and then the receiver has to know what Π is, and how to look up its decimal expansion, or else how to compute it. In effect, they need to share a code book.
This does not mean, however, that Π contains a lot of information. The essential message can be sent in fewer keystrokes. The telegraph operator has several strategies available. For example, he could say, “Take 4, subtract 4/3, add 4/5, subtract 4/7, and so on.” The telegraph operator sends an algorithm, that is. This infinite series of fractions converges slowly upon Π, so the recipient has a lot of work to do, but the message itself is economical: the total information content is the same no matter how many decimal digits are required.
The issue of shared knowledge at the far ends of the line brings complications. Sometimes people like to frame this sort of problem—the problem of information content in messages—in terms of communicating with an alien life-form in a faraway galaxy. What could we tell them? What would we want to say? The laws of mathematics being universal, we tend to think that Π would be one message any intelligent race would recognize. Only, they could hardly be expected to know the Greek letter. Nor would they be likely to recognize the decimal digits “3.1415926535 …” unless they happened to have ten fingers.
The sender of a message can never fully know his recipient’s mental code book. Two lights in a window might mean nothing or might mean “The British come by sea.” Every poem is a message, different for every reader. There is a way to make the fuzziness of this line of thinking go away. Chaitin expressed it this way:
It is preferable to consider communication not with a distant friend but with a digital computer. The friend might have the wit to make inferences about numbers or to construct a series from partial information or from vague instructions. The computer does not have that capacity, and for our purposes that deficiency is an advantage. Instructions given the computer must be complete and explicit, and they must enable it to proceed step by step.♦
In other words: the message is an algorithm. The recipient is a machine; it has no creativity, no uncertainty, and no knowledge, except whatever “knowledge” is inherent in the machine’s structure. By the 1960s, digital computers were already getting their instructions in a form measured in bits, so it was natural to think about how much information was contained in any algorithm.
A different sort of message would be this:
Even to the eye this sequence of notes seems nonrandom. It happens that the message they represent is already making its way through interstellar space, 10 billion miles from its origin, at a tiny fraction of light speed. The message is not encoded in this print-based notation, nor in any digital form, but as microscopic waves in a single long groove winding in a spiral engraved on a disc twelve inches in diameter and one-fiftieth of an inch in thickness. The disc might have been vinyl, but in this case it was copper, plated with gold. This analog means of capturing, preserving, and reproducing sound was invented in 1877 by Thomas Edison, who called it phonography. It remained the most popular audio technology a hundred
years later—though not for much longer—and in 1977 a committee led by the astronomer Carl Sagan created a particular phonograph record and stowed copies in a pair of spacecraft named Voyager 1 and Voyager 2, each the size of a small automobile, launched that summer from Cape Canaveral, Florida.
So it is a message in an interstellar bottle. The message has no meaning, apart from its patterns, which is to say that it is abstract art: the first prelude of Johann Sebastian Bach’s Well-Tempered Clavier, as played on the piano by Glenn Gould. More generally, perhaps the meaning is “There is intelligent life here.” Besides the Bach prelude, the record includes music samples from several different cultures and a selection of earthly sounds: wind, surf, and thunder; spoken greetings in fifty-five languages; the voices of crickets, frogs, and whales; a ship’s horn, the clatter of a horse-drawn cart, and a tapping in Morse code. Along with the phonograph record are a cartridge and needle and a brief pictographic instruction manual. The committee did not bother with a phonograph player or a source of electrical power. Maybe the aliens will find a way to convert those analog metallic grooves into waves in whatever fluid serves as their atmosphere—or into some other suitable input for their alien senses.
THE “GOLDEN RECORD” STOWED ABOARD THE VOYAGER SPACECRAFT (Illustration credit 12.1)
Would they recognize the intricate patterned structure of the Bach prelude (say), as distinct from the less interesting, more random chatter of crickets? Would the sheet music convey a clearer message—the written notes containing, after all, the essence of Bach’s creation? And, more generally, what kind of knowledge would be needed at the far end of the line—what kind of code book—to decipher the message? An appreciation of counterpoint and voice leading? A sense of the tonal context and performance practices of the European Baroque? The sounds—the notes—come in groups; they form shapes, called melodies; they obey the rules of an implicit grammar. Does the music carry its own logic with it, independent of geography and history? On earth, meanwhile, within a few years, even before the Voyagers had sailed past the solar system’s edge, music was seldom recorded in analog form anymore. Better to store the sounds of the Well-Tempered Clavier as bits: the waveforms discretized without loss as per the Shannon sampling theorem, and the information preserved in dozens of plausible media.
In terms of bits, a Bach prelude might not seem to have much information at all. As penned by Bach on two manuscript pages, this one amounts to six hundred notes, characters in a small alphabet. As Glenn Gould played it on a piano in 1964—adding the performer’s layers of nuance and variation to the bare instructions—it lasts a minute and thirty-six seconds. The sound of that performance, recorded onto a CD, microscopic pits burned by a laser onto a slim disc of polycarbonate plastic, comprises 135 million bits. But this bitstream can be compressed considerably with no loss of information. Alternatively, the prelude fits on a small player-piano roll (descendant of Jacquard’s loom, predecessor of punched-card computing); encoded electronically with the MIDI protocol, it uses a few thousands bits. Even the basic six-hundred-character message has tremendous redundancy: unvarying tempo, uniform timbre, just a brief melodic pattern, a word, repeated over and over with slight variations till the final bars. It is famously, deceptively simple. The very repetition creates expectations and breaks them. Hardly anything happens, and everything is a surprise. “Immortal broken chords of radiantly white harmonies,” said Wanda Landowska. It is simple the way a Rembrandt drawing is simple. It does a lot with a little. Is it then rich in information? Certain music could be considered information poor. At one extreme John Cage’s composition titled 4′33″ contains no “notes” at all: just four minutes and thirty-three seconds of near silence, as the piece absorbs the ambient sounds around the still pianist—the listeners’ shifting in their seats, rustling clothes, breathing, sighing.
How much information in the Bach C-major Prelude? As a set of patterns, in time and frequency, it can be analyzed, traced, and understood, but only up to a point. In music, as in poetry, as in any art, perfect understanding is meant to remain elusive. If one could find the bottom it would be a bore.
In a way, then, the use of minimal program size to define complexity seems perfect—a fitting apogee for Shannon information theory. In another way it remains deeply unsatisfying. This is particularly so when turning to the big questions—one might say, the human questions—of art, of biology, of intelligence.
According to this measure, a million zeroes and a million coin tosses lie at opposite ends of the spectrum. The empty string is as simple as can be; the random string is maximally complex. The zeroes convey no information; coin tosses produce the most information possible. Yet these extremes have something in common. They are dull. They have no value. If either one were a message from another galaxy, we would attribute no intelligence to the sender. If they were music, they would be equally worthless.
Everything we care about lies somewhere in the middle, where pattern and randomness interlace.
Chaitin and a colleague, Charles H. Bennett, sometimes discussed these matters at IBM’s research center in Yorktown Heights, New York. Over a period of years, Bennett developed a new measure of value, which he called “logical depth.” Bennett’s idea of depth is connected to complexity but orthogonal to it. It is meant to capture the usefulness of a message, whatever usefulness might mean in any particular domain. “From the earliest days of information theory it has been appreciated that information per se is not a good measure of message value,”♦ he wrote, finally publishing his scheme in 1988.
A typical sequence of coin tosses has high information content but little value; an ephemeris, giving the positions of the moon and planets every day for a hundred years, has no more information than the equations of motion and initial conditions from which it was calculated, but saves its owner the effort of recalculating these positions.
The amount of work it takes to compute something had been mostly disregarded—set aside—in all the theorizing based on Turing machines, which work, after all, so ploddingly. Bennett brought it back. There is no logical depth in the parts of a message that are sheer randomness and unpredictability, nor is there logical depth in obvious redundancy—plain repetition and copying. Rather, he proposed, the value of a message lies in “what might be called its buried redundancy—parts predictable only with difficulty, things the receiver could in principle have figured out without being told, but only at considerable cost in money, time, or computation.” When we value an object’s complexity, or its information content, we are sensing a lengthy hidden computation. This might be true of music or a poem or a scientific theory or a crossword puzzle, which gives its solver pleasure when it is neither too cryptic nor too shallow, but somewhere in between.
Mathematicians and logicians had developed a tendency to think of information processing as free—not like pumping water or carrying stones. In our time, it certainly has gotten cheap. But it embodies work after all, and Bennett suggests that we recognize this work, reckon its expense in understanding complexity. “The more subtle something is, the harder it is to discover,” Bennett says. He applied the idea of logical depth to the problem of self-organization: the question of how complex structures develop in nature. Evolution starts with simple initial conditions; complexity arises, apparently building on itself. Whatever the basic processes involved, physical or biological, something is under way that begins to resemble computation.
* * *
♦ “Our definition of the quantity of information has the advantage that it refers to individual objects and not to objects treated as members of a set of objects with a probability distribution given on it. The probabilistic definition can be convincingly applied to the information contained, for example, in a stream of congratulatory telegrams. But it would not be clear how to apply it, for example, to an estimate of the quantity of information contained in a novel or in the translation of a novel into another language relative to the original.”
♦ 1729 = 13 + 12
3 = 93 + 103
♦ More precisely, it looked like this: “The finite binary sequence S with the first proof that S cannot be described by a Turing machine with n states or less” is a (log2 n+cF)–state description of S.
13 | INFORMATION IS PHYSICAL
(It from Bit)
The more energy, the faster the bits flip. Earth, air, fire, and water in the end are all made of energy, but the different forms they take are determined by information. To do anything requires energy. To specify what is done requires information.
—Seth Lloyd (2006)♦
QUANTUM MECHANICS HAS WEATHERED in its short history more crises, controversies, interpretations (the Copenhagen, the Bohm, the Many Worlds, the Many Minds), factional implosions, and general philosophical breast-beating than any other science. It is happily riddled with mysteries. It blithely disregards human intuition. Albert Einstein died unreconciled to its consequences, and Richard Feynman was not joking when he said no one understands it. Perhaps arguments about the nature of reality are to be expected; quantum physics, so uncannily successful in practice, deals in theory with the foundations of all things, and its own foundations are continually being rebuilt. Even so, the ferment sometimes seems more religious than scientific.