The Language Instinct: How the Mind Creates Language
The answer is a clear no. English (or any other language people speak) is hopelessly unsuited to serve as our internal medium of computation. Consider some of the problems.
The first is ambiguity. These headlines actually appeared in newspapers:
Child’s Stool Great for Use in Garden
Stud Tires Out
Stiff Opposition Expected to Casketless Funeral Plan
Drunk Gets Nine Months in Violin Case
Iraqi Head Seeks Arms
Queen Mary Having Bottom Scraped
Columnist Gets Urologist in Trouble with His Peers
Each headline contains a word that is ambiguous. But surely the thought underlying the word is not ambiguous; the writers of the headlines surely knew which of the two senses of the words stool, stud, and stiff they themselves had in mind. And if there can be two thoughts corresponding to one word, thoughts can’t be words.
The second problem with English is its lack of logical explicitness. Consider the following example, devised by the computer scientist Drew McDermott:
Ralph is an elephant.
Elephants live in Africa.
Elephants have tusks.
Our inference-making device, with some minor modifications to handle the English grammar of the sentences, would deduce “Ralph lives in Africa” and “Ralph has tusks.” This sounds fine but isn’t. Intelligent you, the reader, knows that the Africa that Ralph lives in is the same Africa that all the other elephants live in, but that Ralph’s tusks are his own. But the symbol-copier-creeper-sensor that is supposed to be a model of you doesn’t know that, because the distinction is nowhere to be found in any of the statements. If you object that this is just common sense, you would be right—but it’s common sense that we’re trying to account for, and English sentences do not embody the information that a processor needs to carry out common sense.
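McDermott's point can be made concrete with a minimal sketch in Python (my own illustration, not any actual program of his; the fact triples and the `infer` helper are invented for the example):

```python
# A naive inference device: copy each generic "elephants X" rule
# onto every individual known to be an elephant.

facts = {("Ralph", "is-a", "elephant")}

# Generic rules, stored exactly as the English states them. Note that
# "Africa" and "tusks" are both plain, undifferentiated constants.
rules = [
    ("elephant", "lives-in", "Africa"),
    ("elephant", "has", "tusks"),
]

def infer(facts, rules):
    """Apply every matching rule to every individual of the right kind."""
    derived = set(facts)
    for individual, _, kind in facts:
        for rule_kind, relation, obj in rules:
            if rule_kind == kind:
                derived.add((individual, relation, obj))
    return derived

result = infer(facts, rules)
# The device derives ("Ralph", "lives-in", "Africa") and
# ("Ralph", "has", "tusks") in exactly the same way. Nothing in the
# representation marks Africa as one place shared by all elephants
# but the tusks as Ralph's own -- the distinction is simply absent.
```

The deduction succeeds, but for the wrong reason: the representation treats the shared Africa and the unshared tusks identically, which is exactly the logical explicitness that English sentences lack.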
A third problem is called “co-reference.” Say you start talking about an individual by referring to him as the tall blond man with one black shoe. The second time you refer to him in the conversation you are likely to call him the man; the third time, just him. But the three expressions do not refer to three people or even to three ways of thinking about a single person; the second and third are just ways of saving breath. Something in the brain must treat them as the same thing; English isn’t doing it.
A fourth, related problem comes from those aspects of language that can only be interpreted in the context of a conversation or text—what linguists call “deixis.” Consider articles like a and the. What is the difference between killed a policeman and killed the policeman? Only that in the second sentence, it is assumed that some specific policeman was mentioned earlier or is salient in the context. Thus in isolation the two phrases are synonymous, but in the following contexts (the first from an actual newspaper article) their meanings are completely different:
A policeman’s 14-year-old son, apparently enraged after being disciplined for a bad grade, opened fire from his house, killing a policeman and wounding three people before he was shot dead.
A policeman’s 14-year-old son, apparently enraged after being disciplined for a bad grade, opened fire from his house, killing the policeman and wounding three people before he was shot dead.
Outside of a particular conversation or text, then, the words a and the are quite meaningless. They have no place in one’s permanent mental database. Other conversation-specific words like here, there, this, that, now, then, I, me, my, her, we, and you pose the same problems, as the following old joke illustrates:
First guy: I didn’t sleep with my wife before we were married, did you?
Second guy: I don’t know. What was her maiden name?
A fifth problem is synonymy. The sentences
Sam sprayed paint onto the wall.
Sam sprayed the wall with paint.
Paint was sprayed onto the wall by Sam.
The wall was sprayed with paint by Sam.
refer to the same event and therefore license many of the same inferences. For example, in all four cases, one may conclude that the wall has paint on it. But they are four distinct arrangements of words. You know that they mean the same thing, but no simple processor, crawling over them as marks, would know that. Something else that is not one of those arrangements of words must be representing the single event that you know is common to all four. For example, the event might be represented as something like
(Sam spray paintᵢ) cause (paintᵢ go to (on wall))
—which, assuming we don’t take the English words seriously, is not too far from one of the leading proposals about what mentalese looks like.
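As a rough sketch of what such a shared representation might look like as a data structure (my own illustration; the field names and the `paint#1` index are invented, not a claim about any actual theory of mentalese):

```python
# One event record standing behind all four paint-spraying sentences.
# The index "#1" on paint links the stuff Sam sprays to the stuff that
# ends up on the wall, as in (Sam spray paint_i) cause (paint_i go to (on wall)).

event = {
    "cause": {
        "act":    ("Sam", "spray", "paint#1"),
        "result": ("paint#1", "go-to", ("on", "wall")),
    }
}

surface_forms = [
    "Sam sprayed paint onto the wall.",
    "Sam sprayed the wall with paint.",
    "Paint was sprayed onto the wall by Sam.",
    "The wall was sprayed with paint by Sam.",
]
# All four word strings would be translated into the single record above,
# so a processor can draw the same inference (the wall has paint on it)
# from any of them without comparing the strings themselves.
```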
These examples (and there are many more) illustrate a single important point. The representations underlying thinking, on the one hand, and the sentences in a language, on the other, are in many ways at cross-purposes. Any particular thought in our head embraces a vast amount of information. But when it comes to communicating a thought to someone else, attention spans are short and mouths are slow. To get information into a listener’s head in a reasonable amount of time, a speaker can encode only a fraction of the message into words and must count on the listener to fill in the rest. But inside a single head, the demands are different. Air time is not a limited resource: different parts of the brain are connected to one another directly with thick cables that can transfer huge amounts of information quickly. Nothing can be left to the imagination, though, because the internal representations are the imagination.
We end up with the following picture. People do not think in English or Chinese or Apache; they think in a language of thought. This language of thought probably looks a bit like all these languages; presumably it has symbols for concepts, and arrangements of symbols that correspond to who did what to whom, as in the paint-spraying representation shown above. But compared with any given language, mentalese must be richer in some ways and simpler in others. It must be richer, for example, in that several concept symbols must correspond to a given English word like stool or stud. There must be extra paraphernalia that differentiate logically distinct kinds of concepts, like Ralph’s tusks versus tusks in general, and that link different symbols that refer to the same thing, like the tall blond man with one black shoe and the man. On the other hand, mentalese must be simpler than spoken languages; conversation-specific words and constructions (like a and the) are absent, and information about pronouncing words, or even ordering them, is unnecessary. Now, it could be that English speakers think in some kind of simplified and annotated quasi-English, with the design I have just described, and that Apache speakers think in a simplified and annotated quasi-Apache. But to get these languages of thought to subserve reasoning properly, they would have to look much more like each other than either one does to its spoken counterpart, and it is likely that they are the same: a universal mentalese.
Knowing a language, then, is knowing how to translate mentalese into strings of words and vice versa. People without a language would still have mentalese, and babies and many nonhuman animals presumably have simpler dialects. Indeed, if babies did not have a mentalese to translate to and from English, it is not clear how learning English could take place, or even what learning English would mean.
So where does all this leave Newspeak? Here are my predictions for the year 2050. First, since mental life goes on independently of particular languages, concepts of freedom and equality will be thinkable even if they are nameless. Second, since there are far more concepts than there are words, and listeners must always charitably fill in what the speaker leaves unsaid, existing words will quickly gain new senses, perhaps even regain their original senses. Third, since children are not content to reproduce any old input from adults but create a complex grammar that can go beyond it, they would creolize Newspeak into a natural language, possibly in a single generation. The twenty-first-century toddler may be Winston Smith’s revenge.
How Language Works
Journalists say that when a dog bites a man, that is not news, but when a man bites a dog, that is news. This is the essence of the language instinct: language conveys news. The streams of words called “sentences” are not just memory prods, reminding you of man and man’s best friend and letting you fill in the rest; they tell you who in fact did what to whom. Thus we get more from most stretches of language than Woody Allen got from War and Peace, which he read in two hours after taking speed-reading lessons: “It was about some Russians.” Language allows us to know how octopuses make love and how to remove cherry stains and why Tad was heartbroken, and whether the Red Sox will win the World Series without a good relief pitcher and how to build an atom bomb in your basement and how Catherine the Great died, among other things.
When scientists see some apparent magic trick in nature, like bats homing in on insects in pitch blackness or salmon returning to breed in their natal stream, they look for the engineering principles behind it. For bats, the trick turned out to be sonar; for salmon, it was locking in to a faint scent trail. What is the trick behind the ability of Homo sapiens to convey that man bites dog?
In fact there is not one trick but two, and they are associated with the names of two European scholars who wrote in the nineteenth century. The first principle, articulated by the Swiss linguist Ferdinand de Saussure, is “the arbitrariness of the sign,” the wholly conventional pairing of a sound with a meaning. The word dog does not look like a dog, walk like a dog, or woof like a dog, but it means “dog” just the same. It does so because every English speaker has undergone an identical act of rote learning in childhood that links the sound to the meaning. For the price of this standardized memorization, the members of a language community receive an enormous benefit: the ability to convey a concept from mind to mind virtually instantaneously. Sometimes the shotgun marriage between sound and meaning can be amusing. As Richard Lederer points out in Crazy English, we drive on a parkway but park in a driveway, there is no ham in hamburger or bread in sweetbreads, and blueberries are blue but cranberries are not cran. But think about the “sane” alternative of depicting a concept so that receivers can apprehend the meaning in the form. The process is so challenging to the ingenuity, so comically unreliable, that we have made it into party games like Pictionary and charades.
The second trick behind the language instinct is captured in a phrase from Wilhelm von Humboldt that presaged Chomsky: language “makes infinite use of finite media.” We know the difference between the forgettable Dog bites man and the newsworthy Man bites dog because of the order in which dog, man, and bites are combined. That is, we use a code to translate between orders of words and combinations of thoughts. That code, or set of rules, is called a generative grammar; as I have mentioned, it should not be confused with the pedagogical and stylistic grammars we encountered in school.
The principle underlying grammar is unusual in the natural world. A grammar is an example of a “discrete combinatorial system.” A finite number of discrete elements (in this case, words) are sampled, combined, and permuted to create larger structures (in this case, sentences) with properties that are quite distinct from those of their elements. For example, the meaning of Man bites dog is different from the meaning of any of the three words inside it, and different from the meaning of the same words combined in the reverse order. In a discrete combinatorial system like language, there can be an unlimited number of completely distinct combinations with an infinite range of properties. Another noteworthy discrete combinatorial system in the natural world is the genetic code in DNA, where four kinds of nucleotides are combined into sixty-four kinds of codons, and the codons can be strung into an unlimited number of different genes. Many biologists have capitalized on the close parallel between the principles of grammatical combination and the principles of genetic combination. In the technical language of genetics, sequences of DNA are said to contain “letters” and “punctuation”; may be “palindromic,” “meaningless,” or “synonymous”; are “transcribed” and “translated”; and are even stored in “libraries.” The immunologist Niels Jerne entitled his Nobel Prize address “The Generative Grammar of the Immune System.”
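The arithmetic of the DNA analogy is easy to check directly (a quick sketch; the enumeration itself is standard combinatorics, nothing specific to any genetics library):

```python
# Four nucleotides combined three at a time yield 4**3 = 64 codons.
from itertools import product

nucleotides = "ACGT"
codons = ["".join(triple) for triple in product(nucleotides, repeat=3)]

assert len(codons) == 64
# As with sentences built from words, a codon's "meaning" (the amino
# acid it codes for) is not a blend of its letters: AAG and GAA contain
# the same letters in different orders, yet code for different amino acids.
```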
Most of the complicated systems we see in the world, in contrast, are blending systems, like geology, paint mixing, cooking, sound, light, and weather. In a blending system the properties of the combination lie between the properties of its elements, and the properties of the elements are lost in the average or mixture. For example, combining red paint and white paint results in pink paint. Thus the range of properties that can be found in a blending system is highly circumscribed, and the only way to differentiate large numbers of combinations is to discriminate tinier and tinier differences. It may not be a coincidence that the two systems in the universe that most impress us with their open-ended complex design—life and mind—are based on discrete combinatorial systems. Many biologists believe that if inheritance were not discrete, evolution as we know it could not have taken place.
The way language works, then, is that each person’s brain contains a lexicon of words and the concepts they stand for (a mental dictionary) and a set of rules that combine the words to convey relationships among concepts (a mental grammar). We will explore the world of words in the next chapter; this one is devoted to the design of grammar.
The fact that grammar is a discrete combinatorial system has two important consequences. The first is the sheer vastness of language. Go into the Library of Congress and pick a sentence at random from any volume, and chances are you would fail to find an exact repetition no matter how long you continued to search. Estimates of the number of sentences that an ordinary person is capable of producing are breathtaking. If a speaker is interrupted at a random point in a sentence, there are on average about ten different words that could be inserted at that point to continue the sentence in a grammatical and meaningful way. (At some points in a sentence, only one word can be inserted, and at others, there is a choice from among thousands; ten is the average.) Let’s assume that a person is capable of producing sentences up to twenty words long. Therefore the number of sentences that a speaker can deal with in principle is at least 10²⁰ (a one with twenty zeros after it, or a hundred million trillion). At a rate of five seconds a sentence, a person would need a childhood of about a hundred trillion years (with no time for eating or sleeping) to memorize them all. In fact, a twenty-word limitation is far too severe. The following comprehensible sentence from George Bernard Shaw, for example, is 110 words long:
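The back-of-the-envelope count is easy to verify (a quick sketch of the arithmetic only; the ten-choices-per-word and twenty-word figures are the ones given above):

```python
# Roughly 10 grammatical continuations at each of up to 20 word
# positions gives 10**20 possible sentences.

choices_per_word = 10
max_length = 20
num_sentences = choices_per_word ** max_length

assert num_sentences == 10**20  # a one followed by twenty zeros

# At five seconds per sentence, memorizing them all would take
# 5 * 10**20 seconds -- over a thousand times the age of the
# universe (roughly 4.3 * 10**17 seconds).
seconds_needed = 5 * num_sentences
assert seconds_needed > 1000 * 4.3e17
```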
Stranger still, though Jacques-Dalcroze, like all these great teachers, is the completest of tyrants, knowing what is right and that he must and will have the lesson just so or else break his heart (not somebody else’s, observe), yet his school is so fascinating that every woman who sees it exclaims: “Oh why was I not taught like this!” and elderly gentlemen excitedly enroll themselves as students and distract classes of infants by their desperate endeavours to beat two in a bar with one hand and three with the other, and start off on earnest walks around the room, taking two steps backward whenever M. Dalcroze calls out “Hop!”
Indeed, if you put aside the fact that the days of our age are threescore and ten, each of us is capable of uttering an infinite number of different sentences. By the same logic that shows that there are an infinite number of integers—if you ever think you have the largest integer, just add 1 to it and you will have another—there must be an infinite number of sentences. The Guinness Book of World Records once claimed to recognize the longest English sentence: a 1,300-word stretch in William Faulkner’s novel Absalom, Absalom!, which begins:
They both bore it as though in deliberate flagellant exaltation…
I am tempted to achieve immortality by submitting the following record-breaker:
Faulkner wrote, “They both bore it as though in deliberate flagellant exaltation…”
But it would be only the proverbial fifteen minutes of fame, for soon I could be bested by:
Pinker wrote that Faulkner wrote, “They both bore it as though in deliberate flagellant exaltation…”
And that record, too, would fall when someone submitted:
Who cares that Pinker wrote that Faulkner wrote, “They both bore it as though in deliberate flagellant exaltation…”?
And so on, ad infinitum. The infinite use of finite media distinguishes the human brain from virtually all the artificial language devices we commonly come across, like pull-string dolls, cars that nag you to close the door, and cheery voice-mail instructions (“Press the pound key for more options”), all of which use a fixed list of prefabricated sentences.
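The record-breaking game can be written as a one-rule generator (a toy sketch of my own; the `embed` helper is invented for the illustration):

```python
# One recursive rule -- S -> "Pinker wrote that " + S -- produces a new,
# longer, grammatical sentence at every depth, so no finite list of
# prefabricated sentences can ever be complete.

def embed(sentence, depth):
    """Apply the embedding rule `depth` times."""
    for _ in range(depth):
        sentence = "Pinker wrote that " + sentence
    return sentence

seed = ('Faulkner wrote, "They both bore it as though '
        'in deliberate flagellant exaltation..."')

s1 = embed(seed, 1)
s2 = embed(seed, 2)
assert s2 == "Pinker wrote that " + s1      # each application strictly extends
assert len({embed(seed, d) for d in range(100)}) == 100  # 100 distinct sentences
```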
The second consequence of the design of grammar is that it is a code that is autonomous from cognition. A grammar specifies how words may combine to express meanings; that specification is independent of the particular meanings we typically convey or expect others to convey to us. Thus we all sense that some strings of words that can be given common-sense interpretations do not conform to the grammatical code of English. Here are some strings that we can easily interpret but that we sense are not properly formed:
Welcome to Chinese Restaurant. Please try your Nice Chinese Food with Chopsticks: the traditional and typical of Chinese glorious history and cultual.