It’s a flying finches, they are.

  The child seems sleeping.

  Is raining.

  Sally poured the glass with water.

  Who did a book about impress you?

  Skid crash hospital.

  Drum vapor worker cigarette flick boom.

  This sentence no verb.

  This sentence has contains two verbs.

  This sentence has cabbage six words.

  This is not a complete. This either.

  These sentences are “ungrammatical,” not in the sense of split infinitives, dangling participles, and the other hobgoblins of the schoolmarm, but in the sense that every ordinary speaker of the casual vernacular has a gut feeling that something is wrong with them, despite their interpretability. Ungrammaticality is simply a consequence of our having a fixed code for interpreting sentences. For some strings a meaning can be guessed, but we lack confidence that the speaker has used the same code in producing the sentence as we used in interpreting it. For similar reasons, computers, which are less forgiving of ungrammatical input than human listeners, express their displeasure in all-too-familiar dialogues like this one:

  > PRINT (x + 1

*****SYNTAX ERROR*****

  The opposite can happen as well. Sentences can make no sense but can still be recognized as grammatical. The classic example is a sentence from Chomsky, his only entry in Bartlett’s Familiar Quotations:

  Colorless green ideas sleep furiously.

  The sentence was contrived to show that syntax and sense can be independent of each other, but the point was made long before Chomsky; the genre of nonsense verse and prose, popular in the nineteenth century, depends on it. Here is an example from Edward Lear, the acknowledged master of nonsense:

  It’s a fact the whole world knows,

  That Pobbles are happier without their toes.

  Mark Twain once parodied the romantic description of nature written more for its mellifluousness than its content:

  It was a crisp and spicy morning in early October. The lilacs and laburnums, lit with the glory-fires of autumn, hung burning and flashing in the upper air, a fairy bridge provided by kind Nature for the wingless wild things that have their homes in the tree-tops and would visit together; the larch and the pomegranate flung their purple and yellow flames in brilliant broad splashes along the slanting sweep of the woodland; the sensuous fragrance of innumerable deciduous flowers rose upon the swooning atmosphere; far in the empty sky a solitary esophagus slept upon motionless wing; everywhere brooded stillness, serenity, and the peace of God.

  And almost everyone knows the poem in Lewis Carroll’s Through the Looking-Glass that ends:

  And, as in uffish thought he stood,

  The Jabberwock, with eyes of flame,

  Came whiffling through the tulgey wood,

  And burbled as it came!

  One, two! One, two! And through and through

  The vorpal blade went snicker-snack!

  He left it dead, and with its head

  He went galumphing back.

  “And hast thou slain the Jabberwock?

  Come to my arms, my beamish boy!

  O frabjous day! Callooh! Callay!”

  He chortled in his joy.

  ’Twas brillig, and the slithy toves

  Did gyre and gimble in the wabe:

  All mimsy were the borogoves,

  And the mome raths outgrabe.

As Alice said, “Somehow it seems to fill my head with ideas—only I don’t exactly know what they are!” But though common sense and common knowledge are of no help in understanding these passages, English speakers recognize that they are grammatical, and their mental rules allow them to extract precise, though abstract, frameworks of meaning. Alice deduced, “Somebody killed something: that’s clear, at any rate—.” And after reading Chomsky’s entry in Bartlett’s, anyone can answer questions like “What slept? How? Did one thing sleep, or several? What kind of ideas were they?”

  How might the combinatorial grammar underlying human language work? The most straightforward way to combine words in order is explained in Michael Frayn’s novel The Tin Men. The protagonist, Goldwasser, is an engineer working at an institute for automation. He must devise a computer system that generates the standard kinds of stories found in the daily papers, like “Paralyzed Girl Determined to Dance Again.” Here he is hand-testing a program that composes stories about royal occasions:

  He opened the filing cabinet and picked out the first card in the set. Traditionally, it read. Now there was a random choice between cards reading coronations, engagements, funerals, weddings, comings of age, births, deaths, or the churching of women. The day before he had picked funerals, and been directed on to a card reading with simple perfection are occasions for mourning. Today he closed his eyes, drew weddings, and was signposted on to are occasions for rejoicing.

  The wedding of X and Y followed in logical sequence, and brought him a choice between is no exception and is a case in point. Either way there followed indeed. Indeed, whichever occasion one had started off with, whether coronations, deaths, or births, Goldwasser saw with intense mathematical pleasure, one now reached this same elegant bottleneck. He paused on indeed, then drew in quick succession it is a particularly happy occasion, rarely, and can there have been a more popular young couple.

  From the next selection, Goldwasser drew X has won himself/herself a special place in the nation’s affections, which forced him to go on to and the British people have cleverly taken Y to their hearts already.

  Goldwasser was surprised, and a little disturbed, to realise that the word “fitting” had still not come up. But he drew it with the next card—it is especially fitting that.

  This gave him the bride/bridegroom should be, and an open choice between of such a noble and illustrious line, a commoner in these democratic times, from a nation with which this country has long enjoyed a particularly close and cordial relationship, and from a nation with which this country’s relations have not in the past been always happy.

  Feeling that he had done particularly well with “fitting” last time, Goldwasser now deliberately selected it again. It is also fitting that, read the card, to be quickly followed by we should remember, and X and Y are not mere symbols—they are a lively young man and a very lovely young woman.

  Goldwasser shut his eyes to draw the next card. It turned out to read in these days when. He pondered whether to select it is fashionable to scoff at the traditional morality of marriage and family life or it is no longer fashionable to scoff at the traditional morality of marriage and family life. The latter had more of the form’s authentic baroque splendor, he decided.

Let’s call this a word-chain device (the technical name is a “finite-state” or “Markov” model). A word-chain device is a bunch of lists of words (or prefabricated phrases) and a set of directions for going from list to list. A processor builds a sentence by selecting a word from one list, then a word from another list, and so on. (To recognize a sentence spoken by another person, one just checks the words against each list in order.) Word-chain systems are commonly used in satires like Frayn’s, usually as do-it-yourself recipes for composing examples of a kind of verbiage. For example, here is a Social Science Jargon Generator, which the reader may operate by picking a word at random from the first column, then a word from the second, then one from the third, and stringing them together to form an impressive-sounding term like inductive aggregating interdependence. (A short program after the table shows how little machinery the recipe requires.)

| Column 1 | Column 2 | Column 3 |
| --- | --- | --- |
| dialectical | participatory | interdependence |
| defunctionalized | degenerative | diffusion |
| positivistic | aggregating | periodicity |
| predicative | appropriative | synthesis |
| multilateral | simulated | sufficiency |
| quantitative | homogeneous | equivalence |
| divergent | transfigurative | expectancy |
| synchronous | diversifying | plasticity |
| differentiated | cooperative | epigenesis |
| inductive | progressive | constructivism |
| integrated | complementary | deformation |
| distributive | eliminative | solidification |
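Here is a minimal sketch of the generator in Python. The split of the thirty-six words into three columns of twelve is a reconstruction guided by the example inductive aggregating interdependence, not a published specification:

```python
import random

# A minimal sketch of the Social Science Jargon Generator: pick one word
# at random from each column and string the three together. The division
# of the words into columns is an assumed reconstruction.
column1 = ["dialectical", "defunctionalized", "positivistic", "predicative",
           "multilateral", "quantitative", "divergent", "synchronous",
           "differentiated", "inductive", "integrated", "distributive"]
column2 = ["participatory", "degenerative", "aggregating", "appropriative",
           "simulated", "homogeneous", "transfigurative", "diversifying",
           "cooperative", "progressive", "complementary", "eliminative"]
column3 = ["interdependence", "diffusion", "periodicity", "synthesis",
           "sufficiency", "equivalence", "expectancy", "plasticity",
           "epigenesis", "constructivism", "deformation", "solidification"]

def jargon_term():
    """Return an impressive-sounding term, e.g. 'inductive aggregating interdependence'."""
    return " ".join(random.choice(col) for col in (column1, column2, column3))

print(jargon_term())
```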

  Recently I saw a word-chain device that generates breathless book jacket blurbs, and another for Bob Dylan song lyrics.

A word-chain device is the simplest example of a discrete combinatorial system, since it is capable of creating an unlimited number of distinct combinations from a finite set of elements. Parodies notwithstanding, a word-chain device can generate infinite sets of grammatical English sentences. For example, consider an extremely simple scheme: a choice between a and the, then an optional happy with a loop back to itself, then a choice of nouns like dog and girl, then eats, then a choice between ice cream and candy. This scheme assembles many sentences, such as A girl eats ice cream and The happy dog eats candy. It can assemble an infinite number because the loop can take the device from the happy list back to itself any number of times: The happy dog eats ice cream, The happy happy dog eats ice cream, and so on.
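Written out as a small program, the scheme looks like this; a minimal sketch, with state names and word lists reconstructed from the example sentences above:

```python
import random

# A minimal sketch of the simple word-chain scheme described above.
# Each state holds a word list plus the states that may follow it; the
# "adjective" state can loop back to itself, which is what makes the set
# of possible sentences infinite. (State names and word lists are assumed
# reconstructions from the example sentences.)
chain = {
    "article":   (["a", "the"],           ["adjective", "noun"]),
    "adjective": (["happy"],              ["adjective", "noun"]),  # the loop
    "noun":      (["dog", "girl"],        ["verb"]),
    "verb":      (["eats"],               ["object"]),
    "object":    (["ice cream", "candy"], []),  # no successors: sentence ends
}

def generate(state="article"):
    words = []
    while True:
        word_list, successors = chain[state]
        words.append(random.choice(word_list))
        if not successors:
            return " ".join(words)
        state = random.choice(successors)

print(generate())  # e.g. "the happy happy dog eats ice cream"
```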

  When an engineer has to build a system to combine words in particular orders, a word-chain device is the first thing that comes to mind. The recorded voice that gives you a phone number when you dial directory assistance is a good example. A human speaker is recorded uttering the ten digits, each in seven different sing-song patterns (one for the first position in a phone number, one for the second position, and so on). With just these seventy recordings, ten million phone numbers can be assembled; with another thirty recordings for three-digit area codes, ten billion numbers are possible (in practice, many are never used because of restrictions like the absence of 0 and 1 from the beginning of a phone number). In fact there have been serious efforts to model the English language as a very large word chain. To make it as realistic as possible, the transitions from one word list to another can reflect the actual probabilities that those kinds of words follow one another in English (for example, the word that is much more likely to be followed by is than by indicates). Huge databases of these “transition probabilities” have been compiled by having a computer analyze bodies of English text or by asking volunteers to name the words that first come to mind after a given word or series of words. Some psychologists have suggested that human language is based on a huge word chain stored in the brain. The idea is congenial to stimulus-response theories: a stimulus elicits a spoken word as a response, then the speaker perceives his or her own response, which serves as the next stimulus, eliciting one out of several words as the next response, and so on.
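To make the idea of a transition table concrete, here is a minimal sketch of how one might be compiled, with a toy corpus standing in for the huge bodies of English text mentioned above:

```python
from collections import Counter, defaultdict

# A sketch of compiling "transition probabilities": count, for each word
# in a body of text, how often each word follows it, then normalize the
# counts into probabilities. (The toy corpus is purely illustrative.)
corpus = "the girl eats ice cream and the dog eats candy and the girl eats candy".split()

transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def probability(prev, nxt):
    """Estimated probability that `nxt` follows `prev` in the corpus."""
    total = sum(transitions[prev].values())
    return transitions[prev][nxt] / total if total else 0.0

print(probability("the", "girl"))  # 2/3: "the" is followed by "girl" twice, "dog" once
```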

  But the fact that word-chain devices seem ready-made for parodies like Frayn’s raises suspicions. The point of the various parodies is that the genre being satirized is so mindless and cliché-ridden that a simple mechanical method can churn out an unlimited number of examples that can almost pass for the real thing. The humor works because of the discrepancy between the two: we all assume that people, even sociologists and reporters, are not really word-chain devices; they only seem that way.

  The modern study of grammar began when Chomsky showed that word-chain devices are not just a bit suspicious; they are deeply, fundamentally, the wrong way to think about how human language works. They are discrete combinatorial systems, but they are the wrong kind. There are three problems, and each one illuminates some aspect of how language really does work.

First, a sentence of English is a completely different thing from a string of words chained together according to the transition probabilities of English. Remember Chomsky’s sentence Colorless green ideas sleep furiously. He contrived it not only to show that nonsense can be grammatical but also to show that improbable word sequences can be grammatical. In English texts the probability that the word colorless is followed by the word green is surely zero. So is the probability that green is followed by ideas, ideas by sleep, and sleep by furiously. Nonetheless, the string is a well-formed sentence of English. Conversely, when one actually assembles word chains using probability tables, the resulting word strings are very far from being well-formed sentences. For example, say you take estimates of the set of words most likely to come after every four-word sequence, and use those estimates to grow a string word by word, always looking at the four most recent words to determine the next one. The string will be eerily Englishy, but not English, like House to ask for is to earn our living by working towards a goal for his team in old New-York was a wonderful place wasn’t it even pleasant to talk about and laugh hard when he tells lies he should not tell me the reason why you are is evident.
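Here is a minimal sketch of that growing procedure; the toy corpus and seed stand in for the enormous databases a real attempt would use:

```python
import random
from collections import defaultdict

# A sketch of fourth-order word-chain generation: record which words follow
# each four-word sequence in a body of text, then grow a string word by word,
# always looking only at the four most recent words. Nothing beyond that
# window is ever remembered, which is why the output drifts.
# (The toy corpus and seed phrase are illustrative assumptions.)
corpus = ("the girl eats ice cream and the boy eats ice cream and "
          "the girl eats candy and the boy eats hot dogs").split()

successors = defaultdict(list)
for i in range(len(corpus) - 4):
    successors[tuple(corpus[i:i + 4])].append(corpus[i + 4])

def grow(seed, length=12):
    words = list(seed)
    while len(words) < length:
        candidates = successors.get(tuple(words[-4:]))
        if not candidates:
            break  # dead end: this four-word window never occurred in the corpus
        words.append(random.choice(candidates))
    return " ".join(words)

print(grow(("the", "girl", "eats", "ice")))
```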

  The discrepancy between English sentences and Englishy word chains has two lessons. When people learn a language, they are learning how to put words in order, but not by recording which word follows which other word. They do it by recording which word category—noun, verb, and so on—follows which other category. That is, we can recognize colorless green ideas because it has the same order of adjectives and nouns that we learned from more familiar sequences like strapless black dresses. The second lesson is that the nouns and verbs and adjectives are not just hitched end to end in one long chain; there is some overarching blueprint or plan for the sentence that puts each word in a specific slot.

If a word-chain device is designed with sufficient cleverness, it can deal with these problems. But Chomsky had a definitive refutation of the very idea that a human language is a word chain. He proved that certain sets of English sentences could not, even in principle, be produced by a word-chain device, no matter how big or how faithful to probability tables the device is. Consider sentences like the following:

  Either the girl eats ice cream, or the girl eats candy.

  If the girl eats ice cream, then the boy eats hot dogs.

At first glance it seems easy to accommodate these sentences in a word chain: let the device choose between either and if at the start, run through the words in between, and then choose between or and then.

But the device does not work. Either must be followed later in a sentence by or; no one says Either the girl eats ice cream, then the girl eats candy. Similarly, if requires then; no one says If the girl eats ice cream, or the girl likes candy. But to satisfy the desire of a word early in a sentence for some other word late in the sentence, the device has to remember the early word while it is churning out all the words in between. And that is the problem: a word-chain device is an amnesiac, remembering only which word list it has just chosen from, nothing earlier. By the time it reaches the or/then list, it has no means of remembering whether it said if or either way back at the beginning. From our vantage point, peering down at the entire road map, we can remember which choice the device made at the first fork in the road, but the device itself, creeping antlike from list to list, has no way of remembering.

Now, you might think it would be a simple matter to redesign the device so that it does not have to remember early choices at late points in the sentence. For example, one could join up either and or and all the possible word sequences in between into one giant sequence, and if and then and all the sequences in between as a second giant sequence, before returning to a third copy of the sequence, yielding a chain so long it would have to be printed sideways. There is something immediately disturbing about this solution: there are three identical subnetworks. Clearly, whatever people can say between an either and an or, they can say between an if and a then, and also after the or or the then. But this ability should come naturally out of the design of whatever the device is in people’s heads that allows them to speak. It shouldn’t depend on the designer’s carefully writing down three identical sets of instructions (or, more plausibly, on the child’s having to learn the structure of the English sentence three different times, once between if and then, once between either and or, and once after a then or an or).

  But Chomsky showed that the problem is even deeper. Each of these sentences can be embedded in any of the others, including itself:

If either the girl eats ice cream or the girl eats candy, then the boy eats hot dogs.

  Either if the girl eats ice cream then the boy eats ice cream, or if the girl eats ice cream then the boy eats candy.

  For the first sentence, the device has to remember if and either so that it can continue later with or and then, in that order. For the second sentence, it has to remember either and if so that it can complete the sentence with then and or. And so on. Since there’s no limit in principle to the number of if’s and either’s that can begin a sentence, each requiring its own order of then’s and or’s to complete it, it does no good to spell out each memory sequence as its own chain of lists; you’d need an infinite number of chains, which won’t fit inside a finite brain.
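One way to make the moral concrete: the memory the word-chain device lacks can be supplied by a single stack, which handles any depth of nesting where chains would have to multiply without limit. A minimal sketch, assuming the toy rule that every either owes a later or and every if owes a later then:

```python
# A sketch of the memory a word-chain device lacks: a stack. Each "either"
# or "if" pushes the completion word it will later require; each "or" or
# "then" must match the most recent unmet requirement. One stack suffices
# for arbitrarily deep nesting. (The matching rule is a toy assumption;
# real English uses these words in other ways too.)
OPENERS = {"either": "or", "if": "then"}

def dependencies_ok(sentence):
    stack = []
    for word in sentence.lower().replace(",", "").split():
        if word in OPENERS:
            stack.append(OPENERS[word])      # remember what we now owe
        elif word in ("or", "then"):
            if not stack or stack.pop() != word:
                return False                 # wrong completion, or none owed
    return not stack                         # every opener must be paid off

print(dependencies_ok("If either the girl eats ice cream or the girl eats candy, "
                      "then the boy eats hot dogs"))                        # True
print(dependencies_ok("Either the girl eats ice cream, then the girl eats candy"))  # False
```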

  This argument may strike you as scholastic. No real person ever begins a sentence with Either either if either if if, so who cares whether a putative model of that person can complete it with then…then…or…then…or…or? But Chomsky was just adopting the esthetic of the mathematician, using the interaction between either-or and if-then as the simplest possible example of a property of language—its use of “long-distance dependencies” between an early word and a later one—to prove mathematically that word-chain devices cannot handle these dependencies.

  The dependencies, in fact, abound in languages, and mere mortals use them all the time, over long distances, often handling several at once—just what a word-chain device cannot do. For example, there is an old grammarian’s saw about how a sentence can end in five prepositions. Daddy trudges upstairs to Junior’s bedroom to read him a bedtime story. Junior spots the book, scowls, and asks, “Daddy, what did you bring that book that I don’t want to be read to out of up for?” By the point at which he utters read, Junior has committed himself to holding four dependencies in mind: to be read demands to, that book that requires out of, bring requires up, and what requires for. An even better, real-life example comes from a letter to TV Guide: