The Information
Brains and electronic computers both use quantities of energy in performing their work of logic—“all of which is wasted and dissipated in heat,” to be carried away by the blood or by ventilating and cooling apparatus. But this is really beside the point, Wiener said. “Information is information, not matter or energy. No materialism which does not admit this can survive at the present day.”
Now came a time of excitement.
“We are again in one of those prodigious periods of scientific progress—in its own way like the pre-Socratic period,” declared the gnomic, white-bearded neurophysiologist Warren McCulloch to a meeting of British philosophers. He told them that listening to Wiener and von Neumann put him in mind of the debates of the ancients. A new physics of communication had been born, he said, and metaphysics would never be the same: “For the first time in the history of science we know how we know and hence are able to state it clearly.”♦ He offered them heresy: that the knower was a computing machine, the brain composed of relays, perhaps ten billion of them, each receiving signals from other relays and sending them onward. The signals are quantized: they either happen or do not happen. So once again the stuff of the world, he said, turns out to be the atoms of Democritus—“indivisibles—leasts—which go batting about in the void.”
It is a world for Heraclitus, always “on the move.” I do not mean merely that every relay is itself being momentarily destroyed and re-created like a flame, but I mean that its business is with information which pours into it over many channels, passes through it, eddies within it and emerges again to the world.
That these ideas were spilling across disciplinary borders was due in large part to McCulloch, a dynamo of eclecticism and cross-fertilization. Soon after the war he began organizing a series of conferences at the Beekman Hotel on Park Avenue in New York City, with money from the Josiah Macy Jr. Foundation, endowed in the nineteenth century by heirs of Nantucket whalers. A host of sciences were coming of age all at once—so-called social sciences, like anthropology and psychology, looking for new mathematical footing; medical offshoots with hybrid names, like neurophysiology; not-quite-sciences like psychoanalysis—and McCulloch invited experts in all these fields, as well as mathematics and electrical engineering. He instituted a Noah’s Ark rule, inviting two of each species so that speakers would always have someone present who could see through their jargon.♦ Among the core group were the already famous anthropologist Margaret Mead and her then-husband Gregory Bateson, the psychologists Lawrence K. Frank and Heinrich Klüver, and that formidable, sometimes rivalrous pair of mathematicians, Wiener and von Neumann.
Mead, recording the proceedings in a shorthand no one else could read, said she broke a tooth in the excitement of the first meeting and did not realize it till afterward. Wiener told them that all these sciences, the social sciences especially, were fundamentally the study of communication, and that their unifying idea was the message.♦ The meetings began with the unwieldy name of Conferences for Circular Causal and Feedback Mechanisms in Biological and Social Systems and then, in deference to Wiener, whose new fame they enjoyed, changed that to Conference on Cybernetics. Throughout the conferences, it became habitual to use the new, awkward, and slightly suspect term information theory. Some of the disciplines were more comfortable than others. It was far from clear where information belonged in their respective worldviews.
The meeting in 1950, on March 22 and 23, began self-consciously. “The subject and the group have provoked a tremendous amount of external interest,” said Ralph Gerard, a neuroscientist from the University of Chicago’s medical school, “almost to the extent of a national fad. They have prompted extensive articles in such well known scientific magazines as Time, News-Week, and Life.”♦ He was referring, among others, to Time’s cover story earlier that winter titled “The Thinking Machine” and featuring Wiener:
Professor Wiener is a stormy petrel (he looks more like a stormy puffin) of mathematics and adjacent territory.… The great new computers, cried Wiener with mingled alarm and triumph, are … harbingers of a whole new science of communication and control, which he promptly named “cybernetics.” The newest machines, Wiener pointed out, already have an extraordinary resemblance to the human brain, both in structure and function. So far, they have no senses or “effectors” (arms and legs), but why shouldn’t they have?
It was true, Gerard said, that his field was being profoundly affected by new ways of thought from communications engineering—helping them think of a nerve impulse not just as a “physical-chemical event” but as a sign or a signal. So it was helpful to take lessons from “calculating machines and communications systems,” but it was dangerous, too.
To say, as the public press says, that therefore these machines are brains, and that our brains are nothing but calculating machines, is presumptuous. One might as well say that the telescope is an eye or that a bulldozer is a muscle.♦
Wiener felt he had to respond. “I have not been able to prevent these reports,” he said, “but I have tried to make the publications exercise restraint. I still do not believe that the use of the word ‘thinking’ in them is entirely to be reprehended.”♦
Gerard’s main purpose was to talk about whether the brain, with its mysterious architecture of neurons, branching dendrite trees, and complex interconnections alive within a chemical soup, could properly be described as analog or digital.♦ Gregory Bateson instantly interrupted: he still found this distinction confusing. It was a basic question. Gerard owed his own understanding to “the expert tutelage that I have received here, primarily from John von Neumann”—who was sitting right there—but Gerard took a stab at it anyway. Analog is a slide rule, where number is represented as distance; digital is an abacus, where you either count a bead or you do not; there’s nothing in between. A rheostat—light dimmer—is analog; a wall switch that snaps on or off, digital. Brain waves and neural chemistry, said Gerard, are analog.
Discussion ensued. Von Neumann had plenty to say. He had lately been developing a “game theory,” which he viewed effectively as a mathematics of incomplete information. And he was taking the lead in designing an architecture for the new electronic computers. He wanted the more analog-minded of the group to think more abstractly—to recognize that digital processes take place in a messy, continuous world but are digital nonetheless. When a neuron snaps between two possible states—“the state of the nerve cell with no message in it and the state of the cell with a message in it”♦—the chemistry of this transition may have intermediate shadings, but for theoretical purposes the shadings may be ignored. In the brain, he suggested, just as in a computer made of vacuum tubes, “these discrete actions are in reality simulated on the background of continuous processes.” McCulloch had just put this neatly in a new paper called “Of Digital Computers Called Brains”: “In this world it seems best to handle even apparent continuities as some numbers of some little steps.”♦ Remaining quiet in the audience was the new man in the group, Claude Shannon.
The next speaker was J. C. R. Licklider, an expert on speech and sound from the new Psycho-Acoustic Laboratory at Harvard, known to everyone as Lick. He was another young scientist with his feet in two different worlds—part psychologist and part electrical engineer. Later that year he moved to MIT, where he established a new psychology department within the department of electrical engineering. He was working on an idea for quantizing speech—taking speech waves and reducing them to the smallest quantities that could be reproduced by a “flip-flop circuit,” a homemade gadget made from twenty-five dollars of vacuum tubes, resistors, and capacitors.♦ It was surprising—even to people used to the crackling and hissing of telephones—how far speech could be reduced and still remain intelligible. Shannon listened closely, not just because he knew about the relevant telephone engineering but because he had dealt with the issues in his secret war work on audio scrambling. Wiener perked up, too, in part because of a special interest in prosthetic hearing aids.
When Licklider described some distortion as neither linear nor logarithmic but “halfway between,” Wiener interrupted.
“What does ‘halfway’ mean? X plus S over N?”
Licklider sighed. “Mathematicians are always doing that, taking me up on inexact statements.”♦ But he had no problem with the math and later offered an estimate for how much information—using Shannon’s new terminology—could be sent down a transmission line, given a certain bandwidth (5,000 cycles) and a certain signal-to-noise ratio (33 decibels), numbers that were realistic for commercial radio. “I think it appears that 100,000 bits of information can be transmitted through such a communication channel”—bits per second, he meant. That was a staggering number; by comparison, he calculated the rate of ordinary human speech this way: 10 phonemes per second, chosen from a vocabulary of 64 phonemes (2^6, “to make it easy”—the logarithm of the number of choices is 6), so a rate of 60 bits per second. “This assumes that the phonemes are all equally probable—”
“Yes!” interrupted Wiener.♦
“—and of course they are not.”
Wiener wondered whether anyone had tried a similar calculation for “compression for the eye,” for television. How much “real information” is necessary for intelligibility? Though he added, by the way: “I often wonder why people try to look at television.”
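Licklider's arithmetic can be retraced in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the use of the Shannon-Hartley formula for the channel is an assumption, since the text does not say how Licklider reached his figure. Under that formula the result need not match his rounded number. The speech calculation follows his stated steps directly: ten phonemes per second, each a choice among 64 equally probable alternatives.

```python
import math


def speech_rate_bits_per_second(phonemes_per_second, vocabulary_size):
    """Rate of speech if every phoneme were equally probable.

    Each phoneme then carries log2(vocabulary_size) bits, so this is an
    upper bound: real phonemes are not equally probable.
    """
    return phonemes_per_second * math.log2(vocabulary_size)


def channel_capacity_bits_per_second(bandwidth_hz, snr_db):
    """Shannon-Hartley capacity, C = B * log2(1 + S/N).

    Assumption: this formula is supplied for illustration; the source
    does not state which calculation Licklider used.
    """
    snr_linear = 10 ** (snr_db / 10)  # convert decibels to a power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)


speech = speech_rate_bits_per_second(10, 64)
capacity = channel_capacity_bits_per_second(5000, 33)

print(f"speech:  {speech:.0f} bits per second")
print(f"channel: {capacity:.0f} bits per second")
```

Either way the comparison holds: the channel can carry hundreds of times more than the 60 bits per second attributed to speech, and that gap is what made the number staggering.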
Margaret Mead had a different issue to raise. She did not want the group to forget that meaning can exist quite apart from phonemes and dictionary definitions. “If you talk about another kind of information,” she said, “if you are trying to communicate the fact that somebody is angry, what order of distortion might be introduced to take the anger out of a message that otherwise will carry exactly the same words?”♦
That evening Shannon took the floor. Never mind meaning, he said. He announced that, even though his topic was the redundancy of written English, he was not going to be interested in meaning at all.
He was talking about information as something transmitted from one point to another: “It might, for example, be a random sequence of digits, or it might be information for a guided missile or a television signal.”♦ What mattered was that he was going to represent the information source as a statistical process, generating messages with varying probabilities. He showed them the sample text strings he had used in The Mathematical Theory of Communication—which few of them had read—and described his “prediction experiment,” in which the subject guesses text letter by letter. He told them that English has a specific entropy, a quantity correlated with redundancy, and that he could use these experiments to compute the number. His listeners were fascinated—Wiener, in particular, thinking of his own “prediction theory.”
“My method has some parallelisms to this,” Wiener interrupted. “Excuse me for interrupting.”
There was a difference in emphasis between Shannon and Wiener. For Wiener, entropy was a measure of disorder; for Shannon, of uncertainty. Fundamentally, as they were realizing, these were the same. The more inherent order exists in a sample of English text—order in the form of statistical patterns, known consciously or unconsciously to speakers of the language—the more predictability there is, and in Shannon’s terms, the less information is conveyed by each subsequent letter. When the subject guesses the next letter with confidence, it is redundant, and the arrival of the letter contributes no new information. Information is surprise.
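The link between predictability and information can be made concrete with a small calculation. The following is a minimal sketch, not Shannon's prediction experiment: it looks only at single-symbol frequencies and ignores the context of preceding letters, which his experiment exploited. The function names and the sample strings are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def surprisal_bits(probability):
    """Information carried by an event of the given probability.

    A certain event (probability 1) carries zero bits; rarer events
    carry more.
    """
    return -math.log2(probability)


def entropy_bits_per_symbol(text):
    """Average surprisal per symbol, from single-symbol frequencies only."""
    counts = Counter(text)
    total = sum(counts.values())
    return sum((n / total) * surprisal_bits(n / total) for n in counts.values())


def redundancy(text, alphabet_size):
    """Fraction by which the entropy falls short of its maximum.

    The maximum, log2(alphabet_size), is reached when every symbol in
    the alphabet is equally likely.
    """
    return 1 - entropy_bits_per_symbol(text) / math.log2(alphabet_size)


# Four symbols used equally often: every symbol is a full surprise.
print(entropy_bits_per_symbol("abcd"))
# One symbol repeated: the next symbol is certain and tells us nothing.
print(entropy_bits_per_symbol("aaaa"))
# A skewed mix sits in between.
print(round(entropy_bits_per_symbol("aaab"), 3))
```

The more lopsided the statistics, the lower the entropy and the higher the redundancy, which is the sense in which a confidently guessed letter contributes no new information.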
The others brimmed with questions about different languages, different prose styles, ideographic writing, and phonemes. One psychologist asked whether newspaper writing would look different, statistically, from the work of James Joyce. Leonard Savage, a statistician who worked with von Neumann, asked how Shannon chose a book for his test: at random?
“I just walked over to the shelf and chose one.”
“I wouldn’t call that random, would you?” said Savage. “There is a danger that the book might be about engineering.”♦ Shannon did not tell them that in point of fact it had been a detective novel.
Someone else wanted to know if Shannon could say whether baby talk would be more or less predictable than the speech of an adult.
“I think more predictable,” he replied, “if you are familiar with the baby.”
English is actually many different languages—as many, perhaps, as there are English speakers—each with different statistics. It also spawns artificial dialects: the language of symbolic logic, with its restricted and precise alphabet, and the language one questioner called “Airplanese,” employed by control towers and pilots. And language is in constant flux. Heinz von Foerster, a young physicist from Vienna and an early acolyte of Wittgenstein, wondered how the degree of redundancy in a language might change as the language evolved, and especially in the transition from oral to written culture.
Von Foerster, like Margaret Mead and others, felt uncomfortable with the notion of information without meaning. “I wanted to call the whole of what they called information theory signal theory,” he said later, “because information was not yet there. There were ‘beep beeps’ but that was all, no information. The moment one transforms that set of signals into other signals our brain can make an understanding of, then information is born—it’s not in the beeps.”♦ But he found himself thinking of the essence of language, its history in the mind and in the culture, in a new way. At first, he pointed out, no one is conscious of letters, or phonemes, as basic units of a language.
I’m thinking of the old Maya texts, the hieroglyphics of the Egyptians or the Sumerian tables of the first period. During the development of writing it takes some considerable time—or an accident—to recognize that a language can be split into smaller units than words, e.g., syllables or letters. I have the feeling that there is a feedback between writing and speaking.♦
The discussion changed his mind about the centrality of information. He added an epigrammatic note to his transcript of the eighth conference: “Information can be considered as order wrenched from disorder.”♦
Hard as Shannon tried to keep his listeners focused on his pure, meaning-free definition of information, this was a group that would not steer clear of semantic entanglements. They quickly grasped Shannon’s essential ideas, and they speculated far afield. “If we could agree to define as information anything which changes probabilities or reduces uncertainties,” remarked Alex Bavelas, a social psychologist, “changes in emotional security could be seen quite easily in this light.” What about gestures or facial expressions, pats on the back or winks across the table? As the psychologists absorbed this artificial way of thinking about signals and the brain, their whole discipline stood on the brink of a radical transformation.
Ralph Gerard, the neuroscientist, was reminded of a story. A stranger is at a party of people who know one another well. One says, “72,” and everyone laughs. Another says, “29,” and the party roars. The stranger asks what is going on.
His neighbor said, “We have many jokes and we have told them so often that now we just use a number.” The guest thought he’d try it, and after a few words said, “63.” The response was feeble. “What’s the matter, isn’t this a joke?”
“Oh, yes, that is one of our very best jokes, but you did not tell it well.”♦
The next year Shannon returned with a robot. It was not a very clever robot, nor lifelike in appearance, but it impressed the cybernetics group. It solved mazes. They called it Shannon’s rat.
He wheeled out a cabinet with a five-by-five grid on its top panel. Partitions could be placed around and between any of the twenty-five squares to make mazes in different configurations. A pin could be placed in any square to serve as the goal, and moving around the maze was a sensing rod driven by a pair of little motors, one for east-west and one for north-south. Under the hood lay an array of electrical relays, about seventy-five of them, interconnected, switching on and off to form the robot’s “memory.” Shannon flipped the switch to power it up.
“When the machine was turned off,” he said, “the relays essentially forgot everything they knew, so that they are now starting afresh, with no knowledge of the maze.” His listeners were rapt. “You see the finger now exploring the maze, hunting for the goal. When it reaches the center of a square, the machine makes a new decision as to the next direction to try.”♦ When the rod hit a partition, the motors reversed and the relays recorded the event. The machine made each “decision” based on its previous “knowledge”—it was impossible to avoid these psychological words—according to a strategy Shannon had designed. It wandered about the space by trial and error, turning down blind alleys and bumping into walls. Finally, as they all watched, the rat found the goal, a bell rang, a lightbulb flashed on, and the motors stopped.
Then Shannon put the rat back at the starting point for a new run. This time it went directly to the goal without making any wrong turns or hitting any partitions. It had “learned.” Placed in other, unexplored parts of the maze, it would revert to trial and error until, eventually, “it builds up a complete pattern of information and is able to reach the goal directly from any point.”♦
To carry out the exploring and goal-seeking strategy, the machine had to store one piece of information for each square it visited: namely, the direction by which it last left the square. There were only four possibilities—north, west, south, east—so, as Shannon carefully explained, two relays were assigned as memory for each square. Two relays meant two bits of information, enough for a choice among four alternatives, because there were four possible states: off-off, off-on, on-off, and on-on.
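The two-relays-per-square memory can be sketched as a two-bit code. The names below are hypothetical, and the particular assignment of directions to relay states is an assumption made for illustration; the text says only that two relays gave four states, one for each direction.

```python
from enum import IntEnum


class Direction(IntEnum):
    # Assumption: this numbering is arbitrary; the source does not say
    # which relay pattern stood for which direction.
    NORTH = 0  # off-off
    WEST = 1   # off-on
    SOUTH = 2  # on-off
    EAST = 3   # on-on


def to_relays(direction):
    """Encode a direction as the on/off states of two relays."""
    return bool(direction & 0b10), bool(direction & 0b01)


def from_relays(relay_a, relay_b):
    """Decode two relay states back into a direction."""
    return Direction((int(relay_a) << 1) | int(relay_b))


GRID_SIDE = 5

# One two-relay cell per square, recording the direction by which the
# sensing rod last left that square. Everything starts at off-off.
memory = {
    (row, col): to_relays(Direction.NORTH)
    for row in range(GRID_SIDE)
    for col in range(GRID_SIDE)
}

# Record that the rod last left square (2, 3) heading east, then read it back.
memory[(2, 3)] = to_relays(Direction.EAST)
recalled = from_relays(*memory[(2, 3)])

relays_used = 2 * len(memory)
print(recalled.name, relays_used)
```

By this count, memory for the twenty-five squares would account for fifty of the roughly seventy-five relays under the hood.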