Seeing Voices
There was no linguistic attention, no scientific attention, given to Sign until the late 1950s when William Stokoe, a young medievalist and linguist, found his way to Gallaudet College. Stokoe thought he had come to teach Chaucer to the deaf; but he very soon perceived that he had been thrown, by good fortune or chance, into one of the world’s most extraordinary linguistic environments. Sign language, at this time, was not seen as a proper language, but as a sort of pantomime or gestural code, or perhaps a sort of broken English on the hands. It was Stokoe’s genius to see, and prove, that it was nothing of the sort; that it satisfied every linguistic criterion of a genuine language, in its lexicon and syntax, its capacity to generate an infinite number of propositions. In 1960 Stokoe published Sign Language Structure, and in 1965 (with his deaf colleagues Dorothy Casterline and Carl Croneberg) A Dictionary of American Sign Language. Stokoe was convinced that signs were not pictures, but complex abstract symbols with a complex inner structure. He was the first, then, to look for a structure, to analyze signs, to dissect them, to search for constituent parts. Very early he proposed that each sign had at least three independent parts—location, handshape, and movement (analogous to the phonemes of speech)—and that each part had a limited number of combinations.31 In Sign Language Structure he delineated nineteen different handshapes, twelve locations, twenty-four types of movements, and invented a notation for these—American Sign Language had never been written before.32 His Dictionary was equally original, for the signs were arranged not thematically (e.g., signs for food, signs for animals, etc.) but systematically, according to their parts, and organization, and principles of the language. It showed the lexical structure of the language—the linguistic interrelatedness of a basic three thousand sign “words.”
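As a rough illustration of this kind of decomposition (not Stokoe's own notation or symbol inventory), a sign can be modeled as one value drawn from each of three small sets. The counts below are the ones given above for Sign Language Structure. The placeholder labels such as `handshape_0` are invented for this sketch, and treating the three parameters as freely combinable is a simplifying assumption.

```python
from itertools import product

# Counts as reported above; the labels themselves are hypothetical placeholders.
HANDSHAPES = [f"handshape_{i}" for i in range(19)]
LOCATIONS = [f"location_{i}" for i in range(12)]
MOVEMENTS = [f"movement_{i}" for i in range(24)]


def possible_signs():
    """Yield every (location, handshape, movement) triple, assuming free combination."""
    return product(LOCATIONS, HANDSHAPES, MOVEMENTS)


total = sum(1 for _ in possible_signs())
print(total)  # 12 * 19 * 24 = 5472
```

Even under this crude assumption, a few dozen elements yield several thousand distinct triples, the same order of magnitude as the three thousand signs of the Dictionary.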
It required a quiet and immense self-confidence, even obstinacy, to pursue these studies, for almost everyone, hearing and deaf alike, at first regarded Stokoe’s notions as absurd or heretical; his books, when they came out, as worthless or nonsensical.33 This is often the way with works of genius. But within a very few years, because of Stokoe’s works, the entire climate of opinion had been changed, and a revolution—a double revolution—was under way: a scientific revolution, paying attention to sign language, and its cognitive and neural substrates, as no one had ever thought to do before; and a cultural and political revolution.
The Dictionary of American Sign Language listed three thousand root signs—which might seem to be an extremely limited vocabulary (compared, for instance, with the 600,000 words or so in the Oxford English Dictionary). And yet, manifestly, Sign is highly expressive, can express essentially anything that a spoken language can.34 Clearly other, additional principles are at work. The great investigator of these other principles—of all that can turn a lexicon into a language—has been Ursula Bellugi and her collaborators at the Salk Institute.
A lexicon embodies all sorts of concepts, but these remain isolated (at the level of “Me Tarzan, you Jane”) in the absence of a grammar. There has to be a formal system of rules by which coherent utterances (sentences, propositions) can be generated. (This is not an entirely obvious or intuitive concept, for utterance itself seems so immediate, so seamless, so personal that one might not at first feel it contained, or required, a formal system of rules: this, surely, is one reason why it was native signers, above all, who felt their own language as “undecomposable,” and regarded the efforts of Stokoe, and later Bellugi, with incredulity.)
The idea of such a formal system, a “generative grammar,” is itself not new. Humboldt spoke of every language as making “infinite use of finite means.” But it is only in the last thirty years that we have been given, by Noam Chomsky, an explicit account of “how these finite means are put to infinite use in particular languages”—and an exploration of “the deeper properties that define ‘human language’ in general.” These deeper properties Chomsky calls the “deep structure” of grammar; he sees them as an innate, species-specific characteristic in man, one that is latent in the nervous system until kindled by actual language use. Chomsky visualizes this “deep grammar” as consisting of a vast system of rules (“many hundreds of rules of different types”), containing a certain fixed general structure, which at times he sees as analogous to the visual cortex, which has all sorts of innate devices for ordering visual perception.35 We are, as yet, almost totally ignorant of the neural substrate for such a grammar—but that there is one, and its approximate location, is indicated by the fact that there are aphasias, including Sign aphasias, in which grammatical competence, and this only, is specifically impaired.36
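The notion of “infinite use of finite means” can be sketched with a toy rewriting system. This is not Chomsky's formalism of deep and surface structures. It is only a minimal illustration, with invented rules and words, of how a handful of rules, one of them recursive, can generate an unbounded set of sentences. The `max_depth` cutoff is an assumption added here so that the example always terminates.

```python
import random

# A toy grammar; the rules and vocabulary are invented for illustration.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "that", "VP"]],  # second rule is recursive via VP
    "VP": [["V", "NP"], ["V"]],
    "N": [["child"], ["signer"]],
    "V": [["sees"], ["signs"]],
}


def generate(symbol="S", rng=random, depth=0, max_depth=6):
    """Expand a symbol into a list of words by applying rules recursively."""
    if symbol not in GRAMMAR:
        return [symbol]  # a terminal word
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        rule = min(options, key=len)  # shortest rule, so that expansion stops
    else:
        rule = rng.choice(options)
    words = []
    for part in rule:
        words.extend(generate(part, rng, depth + 1, max_depth))
    return words


print(" ".join(generate(rng=random.Random(0))))
```

Raising `max_depth` allows ever longer sentences from the same five rules, which is the point of the sketch: the means stay finite while the set of possible outputs does not.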
A person who knows a specific language, in Chomsky’s formulation, is one who has control of “a grammar that generates … the infinite set of potential deep structures, maps them onto associated surface structures, and determines the semantic and phonetic interpretations of these abstract objects.”37 How does he get (or get control of) such a grammar? How can such a device be acquired by a two-year-old? A child who is certainly not taught grammar explicitly, and who is subject not to exemplary utterances—pieces of grammar—but to the most spontaneous, offhand (and seemingly uninformative) talk of his parents? (Of course, the language of the parents is not “uninformative,” but full of implicit grammar and innumerable, unconscious linguistic hints and adjustments to which the child unconsciously responds. But there is no conscious or explicit transmission of grammar.) It is this which so strikes Chomsky—how the child is able to arrive at so much from so little.38
We cannot avoid being struck by the enormous disparity between knowledge and experience, in the case of language, between the generative grammar that expresses the linguistic competence of the native speaker and the meager and degenerate data [to which he is exposed] on the basis of which he has constructed this grammar for himself.
The child, then, is not taught grammar; nor does he learn it; he constructs it from the “meager and degenerate data” at his disposal. And this would not be possible were the grammar, or its possibility, not already within him, in some latent form that is waiting to be actualized. There must be, as Chomsky puts it, “an innate structure that is rich enough to account for the disparity between experience and knowledge.”
This innate structure, this latent structure, is not fully developed at birth, nor is it too obvious at the age of eighteen months. But then, suddenly, and in the most dramatic way, the developing child becomes open to language, becomes able to construct a grammar from the utterances of his parents. He shows a spectacular ability, a genius for language, between the ages of twenty-one and thirty-six months (this period is the same in all neurologically normal human beings, deaf as well as hearing; it is somewhat delayed, along with other developmental landmarks, in the retarded), and then a diminishing capacity, which ends at childhood’s end (roughly at the age of twelve or thirteen).39 This is, in Lenneberg’s term, the “critical period” for acquiring a first language—the only period when the brain, from scratch, can actualize a complete grammar. The parents play an essential, but only facilitating, role here: language itself develops “from within” at the critical time, and all they do (in Humboldt’s words) is “provide the thread along which it will develop of its own accord.” The process is more like maturation than learning—the innate structure (which Chomsky sometimes calls a Language Acquisition Device, or LAD) grows organically, differentiates, matures, like an embryo.
Bellugi, speaking of her early work with Roger Brown, singles out the sense of this as constituting, for her, the central wonder of language; she refers to a joint paper describing the process of “induction of the latent structure” of sentences by the child, and its final sentence: “The very intricate simultaneous differentiation and integration that constitutes the evolution of the noun-phrase is more reminiscent of the biological development of an embryo than it is of the acquisition of a conditioned reflex.” The second wonder of her life as a linguist, she says, was to see that this marvelous organic structure, the intricate embryo of grammar, could exist in a purely visual form, and that it did so in Sign.
Bellugi has, above all, studied the morphological processes of ASL: the ways in which a sign is changed to express different meanings through grammar and syntax. It was evident that the bare lexicon of the Dictionary of American Sign Language was only a first step, for a language is not just a lexicon or code. (Indian sign language, so-called, is a mere code, i.e., a collection or vocabulary of signs, the signs themselves having no internal structure and scarcely capable of being modified grammatically.) A genuine language is continually modulated by grammatical and syntactic devices of all sorts. There is an extraordinary richness of such devices in ASL, which serve to amplify the basic vocabulary hugely.
Thus there are numerous forms of LOOK-AT (“look-at-me,” “look-at-her,” “look-at-each-of-them,” etc.), all of which are formed in distinctive ways: for example, the sign LOOK-AT is made with one hand moving away from the signer; but when inflected to mean “look at each other” is made with both hands moving towards each other simultaneously. A remarkable number of inflections are available to denote durational aspects (fig. 1); thus LOOK-AT (a) may be inflected to mean “stare” (b), “look at incessantly” (c), “gaze” (d), “watch” (e), “look for a long time” (f), or “look again and again” (g)—and many other permutations, including combinations of the above. Then there are large numbers of derivational forms, the sign LOOK being varied in specific ways to mean “reminisce,” “sightsee,” “look forward to,” “prophesy,” “predict,” “anticipate,” “look around aimlessly,” “browse,” etc.
The face may also serve special, linguistic functions in Sign: thus (as Corina, Liddell, and others have shown) specific facial expressions, or, rather “behaviors,” may serve to mark syntactic constructions such as topics, relative clauses, and questions, or function as adverbs or quantifiers.40 Other parts of the body may also be involved. Any or all of this—this vast range of actual or potential inflections, spatial and kinetic—can converge upon the root signs, fuse with them, and modify them, compacting an enormous amount of information into the resulting signs.
Figure 1. The root sign LOOK-AT may be modified in many ways. These are some of the inflections for the temporal aspects of LOOK-AT; there are many others, for distinctions of degree, manner, number, etc. (Reprinted by permission [with change in notation] from The Signs of Language, E. S. Klima & U. Bellugi. Harvard University Press, 1979.)
It is the compression of these sign units, and the fact that all their modifications are spatial, that makes Sign, at the obvious and visible level, completely unlike any spoken language, and which, in part, prevented it from being seen as a language at all. But it is precisely this, along with its unique spatial syntax and grammar, which marks Sign as a true language—albeit a completely novel one, out of the evolutionary mainstream of all spoken languages, a unique evolutionary alternative. (And, in a way, a completely surprising one, considering we have become specialized for speech in the last half million or two million years. The potentials for language are in us all—this is easy to understand. But that the potentials for a visual language mode should also be so great—this is astonishing, and would hardly be anticipated if visual language did not actually occur. But, equally, it might be said that making signs and gestures, albeit without complex linguistic structure, goes back to our remote, prehuman past—and that speech is really the evolutionary newcomer; a highly successful newcomer which could replace the hands, freeing them for other, non-communicational purposes. Perhaps, indeed, there have been two parallel evolutionary streams for spoken and signed forms of language: this is suggested by the work of certain anthropologists, who have shown the coexistence of spoken and signed languages in some primitive tribes.41 Thus the deaf, and their language, show us not only the plasticity but the latent potentials of the nervous system.)
The single most remarkable feature of Sign—that which distinguishes it from all other languages and mental activities—is its unique linguistic use of space.42 The complexity of this linguistic space is quite overwhelming for the “normal” eye, which cannot see, let alone understand, the sheer intricacy of its spatial patterns.
We see then, in Sign, at every level—lexical, grammatical, syntactic—a linguistic use of space: a use that is amazingly complex, for much of what occurs linearly, sequentially, temporally in speech, becomes simultaneous, concurrent, multileveled in Sign. The “surface” of Sign may appear simple to the eye, like that of gesture or mime, but one soon finds that this is an illusion, and what looks so simple is extraordinarily complex and consists of innumerable spatial patterns nested, three-dimensionally, in each other.43
The marvel of this spatial grammar, of the linguistic use of space, engrossed Sign researchers in the 1970s, and it is only in the present decade that equal attention has been paid to time. Although it was recognized earlier that there was sequential organization within signs, this was regarded as phonologically unimportant, basically because it could not be “read.” It has required the insights of a new generation of linguists—linguists who are themselves often deaf, or native users of Sign, who can analyze its refinements from their own experience of it, from “within”—to bring out the importance of such sequences within (and between) signs. The Supalla brothers, Ted and Sam, among others, have been pioneers here. Thus, in a groundbreaking 1978 paper, Ted Supalla and Elissa Newport demonstrated that very finely detailed differences in movement could distinguish some nouns from related verbs: it had been thought earlier (for example, by Stokoe) that there was a single sign for “sit” and “chair”—but Supalla and Newport showed the signs for these were subtly but crucially separate.
The most systematic research on the use of time in Sign has been done by Scott Liddell and Robert Johnson and their colleagues at Gallaudet. Liddell and Johnson see signing not as a succession of instantaneous “frozen” configurations in space, but as continually and richly modulated in time, with a dynamism of “movements” and “holds” analogous to that of music or speech. They have demonstrated many types of sequentiality in ASL signing—sequences of handshapes, locations, non-manual signs, local movements, movements-and-holds—as well as internal (phonological) segmentation within signs. The simultaneous model of structure is not able to represent such sequences, and may indeed prevent their being seen. Thus it has been necessary to replace the older static notions and descriptions with new, and often very elaborate, dynamic notations, which have some resemblances to the notations for dance and music.44
No one has watched these new developments with more interest than Stokoe himself, and he has focused specifically on the powers of “language in four dimensions”:45
Speech has only one dimension—its extension in time; writing has two dimensions; models have three; but only signed languages have at their disposal four dimensions—the three spatial dimensions accessible to a signer’s body, as well as the dimension of time. And Sign fully exploits the syntactic possibilities in its four-dimensional channel of expression.
The effect of this, Stokoe feels—and here he is supported by the intuitions of Sign artists, playwrights, and actors—is that signed language is not merely proselike and narrative in structure, but essentially “cinematic” too:
In a signed language … narrative is no longer linear and prosaic. Instead, the essence of sign language is to cut from a normal view to a close-up to a distant shot to a close-up again, and so on, even including flashback and flash-forward scenes, exactly as a movie editor works.… Not only is signing itself arranged more like edited film than like written narration, but also each signer is placed very much as a camera: the field of vision and angle of view are directed but variable. Not only the signer signing but also the signer watching is aware at all times of the signer’s visual orientation to what is being signed about.
Thus, in this third decade of research, Sign is seen as fully comparable to speech (in terms of its phonology, its temporal aspects, its streams and sequences), but with unique, additional powers of a spatial and cinematic sort: at once a most complex and yet transparent expression and transformation of thought.46
The cracking of this enormously complex, four-dimensional structure may need the most formidable hardware, as well as an insight approaching genius.47 And yet it can also be cracked, effortlessly, unconsciously, by a three-year-old signer.48
What goes on in the mind and brain of a three-year-old signer, or any signer, that makes him such a genius at Sign, makes him able to use space, to “linguisticize” space, in this astonishing way? What sort of hardware does he have in his head? One would not think, from the “normal” experience of speech and speaking, or from the neurologist’s understanding of speech and speaking, that such spatial virtuosity could occur. It may indeed not be possible for the “normal” brain—i.e., the brain of someone who has not been exposed early to Sign.49 What then is the neurological basis of Sign?
Having spent the 1970s exploring the structure of sign languages, Ursula Bellugi and her colleagues are now examining their neural substrates. This involves, among other methods, the classical method of neurology, which is to analyze the effects produced by various lesions of the brain: here, the effects on sign language and on spatial processing generally, as these may be observed in deaf signers with strokes or other lesions.
Figure 2. Computer-generated images showing three different grammatical inflections of the sign LOOK. The beauty of a spatial grammar, with its complex three-dimensional trajectories, is well brought out by this technique. (Reprinted by permission from Ursula Bellugi. The Salk Institute for Biological Studies, La Jolla, California.)