In contrast, a great writer like Shaw can send a reader in a straight line from the first word of a sentence to the full stop, even if it is 110 words away.
A depth-first parser must use some criterion to pick one tree (or a small number) and run with it—ideally the tree most likely to be correct. One possibility is that the entirety of human intelligence is brought to bear on the problem, analyzing the sentence from the top down. According to this view, people would not bother to build any part of a tree if they could guess in advance that the meaning for that branch would not make sense in context. There has been a lot of debate among psycholinguists about whether this would be a sensible way for the human sentence parser to work. To the extent that a listener’s intelligence can actually predict a speaker’s intentions accurately, a top-down design would steer the parser toward correct sentence analyses. But the entirety of human intelligence is a lot of intelligence, and using it all at once may be too slow to allow for real-time parsing as the hurricane of words whizzes by. Jerry Fodor, quoting Hamlet, suggests that if knowledge and context had to guide sentence parsing, “the native hue of resolution would be sicklied o’er with the pale cast of thought.” He has suggested that the human parser is an encapsulated module that can look up information only in the mental grammar and the mental dictionary, not in the mental encyclopedia.
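For the computationally minded, here is a minimal sketch in Python of a depth-first parser that commits to one analysis at a time: it tries its most preferred rule first and backtracks only when it hits a dead end. The toy grammar, lexicon, and preference order are invented for illustration and are not meant as a model of the human parser.

```python
# A minimal sketch of a depth-first parser (toy grammar, toy lexicon,
# invented preference order; not a model of the actual human parser).

GRAMMAR = {                       # expansions listed in order of preference
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"], ["N"]],
    "VP": [["V", "NP", "PP"], ["V", "NP"], ["V"]],
    "PP": [["P", "NP"]],
}
LEXICON = {"the": "Det", "dog": "N", "man": "N", "telescope": "N",
           "saw": "V", "with": "P"}

def parse(symbol, words, i):
    """Expand `symbol` at position i, depth-first; return the FIRST
    (most preferred) tree found and the next position, or None."""
    if symbol in LEXICON.values():                    # preterminal: match a word
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            return (symbol, words[i]), i + 1
        return None
    for rule in GRAMMAR.get(symbol, []):              # most preferred rule first
        children, j = [], i
        for child in rule:
            result = parse(child, words, j)
            if result is None:
                break                                 # dead end: backtrack
            subtree, j = result
            children.append(subtree)
        else:
            return (symbol, children), j              # commit and run with it
    return None

words = "the dog saw the man with the telescope".split()
tree, end = parse("S", words, 0)
print(tree)       # one tree out of the several the grammar allows
```

Run on "the dog saw the man with the telescope", the sketch hands back the first tree its preferences lead it to (the seeing was done with the telescope) and never builds the alternative in which the man has the telescope.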
Ultimately the matter must be settled in the laboratory. The human parser does seem to use at least a bit of knowledge about what tends to happen in the world. In an experiment by the psychologists John Trueswell, Michael Tanenhaus, and Susan Garnsey, people bit on a bar to keep their heads perfectly still and read sentences on a computer screen while their eye movements were recorded. The sentences had potential garden paths in them. For example, read the sentence
The defendant examined by the lawyer turned out to be unreliable.
You may have been momentarily sidetracked at the word by, because up to that point the sentence could have been about the defendant’s examining something rather than his being examined. Indeed, the subjects’ eyes lingered on the word by and were likely to backtrack to reinterpret the beginning of the sentence (compared to unambiguous control sentences). But now read the following sentence:
The evidence examined by the lawyer turned out to be unreliable.
If garden paths can be avoided by common-sense knowledge, this sentence should be much easier. Evidence, unlike defendants, can’t examine anything, so the incorrect tree, in which the evidence would be examining something, is potentially avoidable. People do avoid it: the subjects’ eyes hopped through the sentence with little pausing or backtracking. Of course, the knowledge being applied is quite crude (defendants examine things; evidence doesn’t), and the tree that it calls for was fairly easy to find, compared with the dozens that a computer can find. So no one knows how much of a person’s general smarts can be applied to understanding sentences in real time; it is an active area of laboratory research.
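Here is a minimal sketch of how little knowledge that decision takes, with a made-up plausibility table (defendants and lawyers can examine things; evidence cannot) picking the parser's first reading of examined. The table and the decision rule are assumptions for illustration, not the procedure the experiments themselves isolated.

```python
# A minimal sketch of crude world knowledge choosing between the two
# readings of "examined". The plausibility table is made up for illustration.

CAN_EXAMINE = {"defendant": True, "lawyer": True, "evidence": False}

def reading_of_examined(subject):
    """Pick a first analysis of 'The <subject> examined ...'."""
    if CAN_EXAMINE.get(subject, True):
        # plausible examiner: take "examined" as the main verb
        # (in these sentences, the garden path)
        return f"the {subject} examined something"
    # implausible examiner: take "examined" as a reduced relative clause
    return f"the {subject} that was examined by someone"

print(reading_of_examined("defendant"))   # garden path reading
print(reading_of_examined("evidence"))    # correct reading, chosen at once
```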
Words themselves also provide some guidance. Recall that each verb makes demands of what else can go in the verb phrase (for example, you can’t just devour but have to devour something; you can’t dine something, you can only dine). The most common entry for a verb seems to pressure the mental parser to find the role players it wants. Trueswell and Tanenhaus watched their volunteers’ eyeballs as they read
The student forgot the solution was in the back of the book.
At the point of reaching was, the eyes lingered and then hopped back, because the people misinterpreted the sentence as being about a student forgetting the solution, period. Presumably, inside people’s heads the word forget was saying to the parser: “Find me an object, now!” Another sentence was
The student hoped the solution was in the back of the book.
With this one there was little problem, because the word hope was saying, instead, “Find me a sentence!” and a sentence was there to be found.
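A minimal sketch of that pressure, assuming a hand-made lexicon of preferred complements rather than anything measured in the experiments:

```python
# A minimal sketch of a verb steering the parser toward the complement it
# usually takes. The lexicon of preferences is invented for illustration.

PREFERRED_COMPLEMENT = {
    "forgot": "NP",     # "Find me an object, now!"
    "hoped":  "S",      # "Find me a sentence!"
    "devoured": "NP",   # obligatorily takes an object
    "dined":  None,     # takes no object at all
}

def first_guess(verb, next_words="the solution"):
    """What the parser first does with the material after the verb."""
    frame = PREFERRED_COMPLEMENT.get(verb, "NP")
    if frame == "NP":
        return f"take '{next_words}' as the object of '{verb}'"
    if frame == "S":
        return f"take '{next_words}' as the start of a clause after '{verb}'"
    return f"close the verb phrase; '{verb}' wants nothing after it"

print(first_guess("forgot"))   # object guess; undone when "was" arrives
print(first_guess("hoped"))    # clause guess; no garden path
```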
Words can also help by suggesting to the parser exactly which other words they tend to appear with inside a given kind of phrase. Though word-by-word transition probabilities are not enough to understand a sentence (Chapter 4), they could be helpful; a parser armed with good statistics, when deciding between two possible trees allowed by a grammar, can opt for the tree that was most likely to have been spoken. The human parser seems to be somewhat sensitive to word pair probabilities: many garden paths seem especially seductive because they contain common pairs like cotton clothing, fat people, and prime number. Whether or not the brain benefits from language statistics, computers certainly do. In laboratories at AT&T and IBM, computers have been tabulating millions of words of text from sources like the Wall Street Journal and Associated Press stories. Engineers are hoping that if they equip their parsers with the frequencies with which each word is used, and the frequencies with which sets of words hang around together, the parsers will resolve ambiguities sensibly.
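Here is a minimal sketch, built on a made-up miniature corpus, of how such word-pair counts could be put to work; the laboratory systems train on millions of words of newswire, and this only shows the shape of the idea.

```python
# A minimal sketch, using a made-up miniature corpus, of word-pair (bigram)
# counts pulling a parser toward one analysis.

from collections import Counter

corpus = ("cotton clothing is comfortable . "
          "she bought cotton clothing . "
          "clothing made of cotton breathes well .").split()

bigram_counts = Counter(zip(corpus, corpus[1:]))

def pair_score(w1, w2):
    """How often w1 was immediately followed by w2 in the corpus."""
    return bigram_counts[(w1, w2)]

# In a garden path along the lines of "The cotton clothing is made of grows
# in Mississippi," the frequent pair "cotton clothing" favors grouping the
# two words into one noun phrase, which is exactly the wrong analysis.
print(pair_score("cotton", "clothing"))   # 2: a common, hence seductive, pair
```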
Finally, people find their way through a sentence by favoring trees with certain shapes, a kind of mental topiary. One guideline is momentum: people like to pack new words into the current dangling phrase, instead of closing off the phrase and hopping up to add the words to a dangling phrase one branch up. This “late closure” strategy might explain why we travel the garden path in the sentence
Flip said that Squeaky will do the work yesterday.
The sentence is grammatical and sensible, but it takes a second look (or maybe even a third) to realize it. We are led astray because when we encounter the adverb yesterday, we try to pack it inside the currently open VP do the work, rather than closing off that VP and hanging the adverb upstairs, where it would go in the same phrase as Flip said. (Note, by the way, that our knowledge of what is plausible, like the fact that the meaning of will is incompatible with the meaning of yesterday, did not keep us from taking the garden path. This suggests that the power of general knowledge to guide sentence understanding is limited.) Here is another example, though this time the psycholinguist responsible for it, Annie Senghas, did not contrive it as an example; one day she just blurted out, “The woman sitting next to Steven Pinker’s pants are like mine.” (Annie was pointing out that the woman sitting next to me had pants like hers.)
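A minimal sketch of the late closure preference, with the stack of open phrases written out by hand for the Flip-and-Squeaky sentence:

```python
# A minimal sketch of the "late closure" preference. The stack of open
# phrases is built by hand for the example sentence; a real parser would
# of course construct it itself.

# Open phrases at the moment "yesterday" arrives, outermost first.
open_phrases = [
    "S:  Flip said ...",           # the higher clause, where "yesterday" belongs
    "S': that Squeaky will ...",   # the embedded clause
    "VP: do the work",             # the most recently opened phrase
]

def attach_late_closure(word, stack):
    """Late closure: pack the new word into the most recent open phrase."""
    return f"attach '{word}' inside [{stack[-1]}]"

print(attach_late_closure("yesterday", open_phrases))
# attaches it inside [VP: do the work], which clashes with "will":
# the garden path.
```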
A second guideline is thrift: people try to attach a phrase to a tree using as few branches as possible. This explains why we take the garden path in the sentence
Sherlock Holmes didn’t suspect the very beautiful young countess was a fraud.
It takes only one branch to attach the countess inside the VP, where Sherlock would suspect her, but two branches to attach her to an S that is itself attached to the VP, where he would suspect her of being a fraud:
The mental parser seems to go for the minimal attachment, though later in the sentence it proves to be incorrect.
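A minimal sketch of minimal attachment, with the branch counts for the two Sherlock analyses stipulated by hand rather than computed from a grammar:

```python
# A minimal sketch of the "minimal attachment" preference: between analyses
# the grammar allows, take the one that adds the fewest new nodes. The node
# counts are stipulated by hand for the Sherlock sentence.

candidate_analyses = {
    # "suspect [the countess]" as verb plus object: one new NP under the VP
    "suspect [NP the very beautiful young countess]": 1,
    # "suspect [the countess was a fraud]": an S node plus the NP inside it
    "suspect [S [NP the countess] was a fraud]": 2,
}

def minimal_attachment(candidates):
    """Choose the analysis that adds the fewest nodes to the tree."""
    return min(candidates, key=candidates.get)

print(minimal_attachment(candidate_analyses))
# the cheap one-branch tree wins, and the rest of the sentence then
# forces the parser to abandon it
```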
Since most sentences are ambiguous, and since laws and contracts must be couched in sentences, the principles of parsing can make a big difference in people’s lives. Lawrence Solan discusses many examples in his recent book. Examine these passages, the first from an insurance contract, the second from a statute, the third from instructions to a jury:
Such insurance as is provided by this policy applies to the use of a non-owned vehicle by the named insured and any person responsible for use by the named insured provided such use is with the permission of the owner.
Every person who sells any controlled substance which is specified in subdivision (d) shall be punished…. (d) Any material, compound, mixture, or preparation which contains any quantity of the following substances having a potential for abuse associated with a stimulant effect on the central nervous system: Amphetamine; Methamphetamine…
The jurors must not be swayed by mere sentiment, conjecture, sympathy, passion, prejudice, public opinion or public feeling.
In the first case, a woman was distraught over being abandoned in a restaurant by her date, and drove off in what she thought was the date’s Cadillac, which she then totaled. It turned out to be someone else’s Cadillac, and she had to recover the money from her insurance company. Was she covered? A California appellate court said yes. The policy was ambiguous, they noted, because the requirement with the permission of the owner, which she obviously did not meet, could be construed as applying narrowly to any person responsible for use by the named insured, rather than to the named insured (that is, her) and any person responsible for use by the named insured.
In the second case, a drug dealer was trying to swindle a customer—unfortunately for him, an undercover narcotics agent—by selling him a bag of inert powder that had only a minuscule trace of methamphetamine. The substance had “a potential for abuse,” but the quantity of the substance did not. Did he break the law? The appellate court said he did.
In the third case, the defendant had been convicted of raping and murdering a fifteen-year-old girl, and a jury imposed the death penalty. United States constitutional law forbids any instruction that would deny a defendant the right to have the jury consider any “sympathy factor” raised by the evidence, which in his case consisted of psychological problems and a harsh family background. Did the instructions unconstitutionally deprive the accused of sympathy, or did they deprive him only of the more trivial mere sympathy? The United States Supreme Court ruled 5–4 that he was denied only mere sympathy; that denial is constitutional.
Solan points out that the courts often resolve these cases by relying on “canons of construction” enshrined in the legal literature, which correspond to the principles of parsing I discussed in the preceding section. For example, the Last Antecedent Rule, which the courts used to resolve the first two cases, is simply the “minimal attachment” strategy that we just saw in the Sherlock sentence. The principles of mental parsing, then, literally have life-or-death consequences. But psycholinguists who are now worrying that their next experiment may send someone to the gas chamber can rest easy. Solan notes that judges are not very good linguists; for better or worse, they try to find a way around the most natural interpretation of a sentence if it would stand in the way of the outcome they feel is just.
I have been talking about trees, but a sentence is not just a tree. Since the early 1960s, when Chomsky proposed transformations that convert deep structures to surface structures, psychologists have used laboratory techniques to try to detect some kind of fingerprint of the transformation. After a few false alarms the search was abandoned, and for several decades the psychology textbooks dismissed transformations as having no “psychological reality.” But laboratory techniques have become more sophisticated, and the detection of something like a transformational operation in people’s minds and brains is one of the most interesting recent findings in the psychology of language.
Take the sentence
The policeman saw the boy that the crowd at the party accused (trace) of the crime.
Who was accused of a crime? The boy, of course, even though the words the boy do not occur after accused. According to Chomsky, that is because a phrase referring to the boy really does occur after accused in deep structure; it has been moved backwards to the position of that by a transformation, leaving behind a silent “trace.” A person trying to understand the sentence must undo the effect of the transformation and mentally put a copy of the phrase back in the position of the trace. To do so, the understander must first notice, while at the beginning of the sentence, that there is a moved phrase, the boy, that needs a home. The understander must hold the phrase in short-term memory until he or she discovers a gap: a position where a phrase should be but isn’t. In this sentence there is a gap after accused, because accused demands an object, but there isn’t one. The person can assume that the gap contains a trace and can then retrieve the phrase the boy from short-term memory and link it to the trace. Only then can the person figure out what role the boy played in the event—in this case, being accused.
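Here is a minimal sketch, under heavy simplifying assumptions, of that bookkeeping: hold the moved phrase, scan for a verb that lacks the object it demands, and link the phrase to the gap. The word lists and the test for a missing object are invented for the example.

```python
# A minimal sketch of filler-gap bookkeeping: the word lists and the crude
# "is an object missing?" test are invented for the example.

TRANSITIVE = {"accused", "saw", "examined"}   # assumed to demand an object
DETERMINERS = {"the", "a", "an"}              # crude sign that an object follows

def link_filler(words, filler):
    """Find the first verb missing its object and link the held phrase there."""
    for i, word in enumerate(words):
        next_word = words[i + 1] if i + 1 < len(words) else None
        if word in TRANSITIVE and next_word not in DETERMINERS:
            return f"'{filler}' is the object of '{word}' (trace right after it)"
    return f"no gap yet; keep '{filler}' in short-term memory"

# "... the boy that [the crowd at the party accused ___ of the crime]"
clause = "the crowd at the party accused of the crime".split()
print(link_filler(clause, "the boy"))
```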
Remarkably, every one of these mental processes can be measured. During the span of words between the moved phrase and the trace, people must hold the phrase in memory. The strain should be visible in poorer performance of any mental task carried out concurrently. And in fact, while people are reading that span, they detect extraneous signals (like a blip flashed on the screen) more slowly, and have more trouble keeping a list of extra words in memory. Even their EEGs (electroencephalograms, or records of the brain’s electrical activity) show the effects of the strain.
Then, at the point at which the trace is discovered and the memory store can be emptied, the dumped phrase makes an appearance on the mental stage that can be detected in several ways. If an experimenter flashes a word from the moved phrase (for example, boy) at that point, people recognize it more quickly. They also recognize words related to the moved phrase—say, girl—more quickly. The effect is strong enough to be visible in brain waves: if interpreting the trace results in an implausible interpretation, as in
Which food did the children read (trace) in class?
the EEGs show a boggle reaction at the point of the trace.
Connecting phrases with traces is a hairy computational operation. The parser, while holding the phrase in mind, must constantly be checking for the trace, an invisible and inaudible little nothing. There is no way of predicting how far down in the sentence the trace will appear, and sometimes it can be quite far down:
The girl wondered who John believed that Mary claimed that the baby saw (trace).
And until it is found, the semantic role of the phrase is a wild card, especially now that the who/whom distinction is going the way of the phonograph record.
I wonder who (trace) introduced John to Marsha. [ who = the introducer]
I wonder who Bruce introduced (trace) to Marsha. [who = the one being introduced]
I wonder who Bruce introduced John to (trace). [who = the target of the introduction]
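A minimal sketch of why the role is a wild card until the gap turns up, with the gap sites in the three sentences marked by hand:

```python
# A minimal sketch: the role assigned to "who" is read off the position of
# the gap, marked here by hand as "___". The rules are written just for
# these three sentences.

def role_of_who(words):
    """Read off a role for 'who' from where the gap sits."""
    gap = words.index("___")
    if words[gap - 1] == "to":
        return "the target of the introduction"
    if words[gap - 1] == "introduced":
        return "the one being introduced"
    if gap + 1 < len(words) and words[gap + 1] == "introduced":
        return "the introducer"
    return "unknown until the gap is found"

for s in ("I wonder who ___ introduced John to Marsha",
          "I wonder who Bruce introduced ___ to Marsha",
          "I wonder who Bruce introduced John to ___"):
    print(s, "->", role_of_who(s.split()))
```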
This problem is so tough that good writers, and even the grammar of the language itself, take steps to make it easier. One principle for good style is to minimize the amount of intervening sentence in which a moved phrase must be held in memory (the stretch between the phrase and its trace). This is a task that the English passive construction is good for (notwithstanding the recommendations of computerized “style-checkers” to avoid it across the board). In the following pair of sentences, the passive version is easier, because the memory-taxing region before the trace is shorter:
Reverse the clamp that the stainless steel hex-head bolt extending upward from the seatpost yoke holds (trace) in place.
Reverse the clamp that (trace) is held in place by the stainless steel hex-head bolt extending upward from the seatpost yoke.
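A quick count, with the relativizer and the gap site marked by hand, bears out the claim that the passive version shortens the stretch the reader must bridge:

```python
# A quick count of how many words the phrase "the clamp" must be held
# across in each version. The gap sites ("___") are marked by hand.

active  = "that the stainless steel hex-head bolt extending upward from the seatpost yoke holds ___ in place"
passive = "that ___ is held in place by the stainless steel hex-head bolt extending upward from the seatpost yoke"

def words_held_across(clause):
    """Words between 'that' and the gap '___'."""
    words = clause.split()
    return words.index("___") - words.index("that") - 1

print(words_held_across(active))    # 12 words before the trace is found
print(words_held_across(passive))   # 0: the trace comes immediately
```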
And universally, grammars restrict the amount of tree that a phrase can move across. For example, one can say
That’s the guy that you heard the rumor about (trace).
But the following sentence is quite odd:
That’s the guy that you heard the rumor that Mary likes (trace).
Languages have “bounding” restrictions that turn some phrases, like the complex noun phrase the rumor that Mary likes him, into “islands” from which no words can escape. This is a boon to listeners, because the parser, knowing that the speaker could not have moved something out of such a phrase, can get away with not monitoring it for a trace. But the boon to listeners exacts a cost from speakers; for these sentences they have to resort to a clumsy extra pronoun, as in That’s the guy that you heard the rumor that Mary likes him.
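A minimal sketch of that bargain: a gap-finder that simply skips any stretch the grammar has declared an island. The word lists and the island boundaries are supplied by hand for the example.

```python
# A minimal sketch of a gap-finder that ignores islands. The word lists and
# the island span are supplied by hand for the example.

TRANSITIVE = {"heard", "likes"}      # assumed to demand an object
DETERMINERS = {"the", "a", "an"}

def find_gap(words, island_spans):
    """Index of a verb missing its object, ignoring everything inside islands."""
    for i, word in enumerate(words):
        if any(start <= i < end for start, end in island_spans):
            continue                 # no need to monitor inside an island
        next_word = words[i + 1] if i + 1 < len(words) else None
        if word in TRANSITIVE and next_word not in DETERMINERS:
            return i
    return None

# "that's the guy that [you heard the rumor that Mary likes ___]"
words = "you heard the rumor that Mary likes".split()
print(find_gap(words, island_spans=[]))         # 6: a gap would be found after "likes"
print(find_gap(words, island_spans=[(2, 7)]))   # None: "the rumor that Mary likes" goes unmonitored
```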
Parsing, for all its importance, is only the first step in understanding a sentence. Imagine parsing the following real-life dialogue:
P: The grand jury thing has its, uh, uh, uh—view of this they might, uh. Suppose we have a grand jury proceeding. Would that, would that, what would that do to the Ervin thing? Would it go right ahead anyway?
D: Probably.
P: But then on that score, though, we have—let me just, uh, run by that, that—You do that on a grand jury, we could then have a much better cause in terms of saying, “Look, this is a grand jury, in which, uh, the prosecutor—” How about a special prosecutor? We could use Petersen, or use another one. You see he is probably suspect. Would you call in another prosecutor?
D: I’d like to have Petersen on our side, advising us [laughs] frankly.
P: Frankly. Well, Petersen is honest. Is anybody about to be question him, are they?
D: No, no, but he’ll get a barrage when, uh, these Watergate hearings start.
P: Yes, but he can go up and say that he’s, he’s been told to go further in the Grand Jury and go in to this and that and the other thing. Call everybody in the White House. I want them to come, I want the, uh, uh, to go to the Grand Jury.
D: This may result—This may happen even without our calling for it when, uh, when these, uh—
P: Vesco?
D: No. Well, that’s one possibility. But also when these people go back before the Grand Jury here, they are going to pull all these criminal defendants back in before the Grand Jury and immunize them.
P: And immunize them: Why? Who? Are you going to—On what?
D: Uh, the U.S. Attorney’s Office will.
P: To do what?
D: To talk about anything further they want to talk about.
P: Yeah. What do they gain out of it?
D: Nothing.
P: To hell with them.
D: They, they’re going to stonewall it, uh, as it now stands. Except for Hunt. That’s why, that’s the leverage in his threat.