The Ghost in the Machine


The elementary speech sounds are called phonemes; they correspond roughly to the written alphabet; in English there are forty-five of them. If listening to speech consisted in the chaining of separately perceived phonemes by the listener, he would literally not understand a word of what is said to him. Let me explain this paradox. If we were to translate the process of listening to speech from acoustical into optical terms, this would mean flashing onto a screen before the subject's eye printed letters one by one, at the rate of twenty letters per second. The result would be something like a nervous breakdown. The ear of the listener has to take in about twenty phonemes per second. If he tried to analyse each phoneme as a separate 'bit' -- or atom, or segment of language -- all he would perceive would be a steady buzz. I owe this illustration to Alvin Liberman of the Haskins Laboratories -- a pioneer in the field of speech-perception, and a participant in the Think-Tank seminar mentioned in the preface. He also commented wryly that if we go on labouring the point with the methods of the S-R theorist, 'we risk arriving at the conviction that human speech is an impossibility'.

The solution of the paradox becomes apparent when we revert from spoken to written language. When we read, we do not perceive the shape of one letter at a time (as in the screen-experiment just mentioned), but the patterns of one or several words at a time; the individual letters are perceived integrated into larger units. Similarly, when listening, we do not perceive separate phonemes in a serial order; perception combines them into higher units of approximately syllabic size. The speech sounds unite into patterns as musical sounds unite into melodies. But unlike the three-dimensional patterns perceived by the eye, speech and music form patterns in the single dimension of time -- which seems mysterious and baffling. We shall see, however, that the recognition of patterns in time is no more -- and no less -- baffling than the recognition of patterns in space, because the brain constantly transforms temporal sequences into spatial patterns and vice versa (page 81). If you look at a gramophone record through a magnifying glass, you only see a single, wavy spiral curve, which, however, contains in coded form the infinitely complex patterns produced by an orchestra of fifty instruments performing a symphony.

The airwaves which it sets in motion form, like the curve on the groove, a sequence with a single variable function -- the variation of pressure on the eardrum. But a single variable in time is sufficient to convey the most complex messages -- the Ninth Symphony or the Ancient Mariner -- provided there is a human brain to decode it, to retrieve the patterns hidden in the linear sequences of pressure waves. This is done by a series of operations, the nature of which is as yet little understood, but which can be represented as a multi-levelled hierarchy of processes. It has three main sub-divisions: the phonological, syntactic and semantic.

'What Did You Say?'

We may regard as the first step in decoding the spoken message -- the first step up the hierarchic tree -- the integration by the listener of phonemes into morphemes. Phonemes are just sounds; morphemes are the simplest meaningful units of language (short words, prefixes, suffixes, etc.); they form the next higher level of the hierarchy. Phonemes do not qualify as elementary units of language, first because they come in much too fast to be individually discriminated and recognised, but also for a second important reason: they are ambiguous. One and the same consonant sounds different, depending on the vowel which follows it, and vice versa, different consonants sometimes sound the same in front of the same vowel. Whether you hear 'big' or 'pig', 'map' or 'nap', depends, as the Haskins Laboratory experiments show [4], largely on the context. Thus the S-R chain theory breaks down even on the lowest level of speech, because the phonemic stimuli vary with the context, and can only be identified in the context. But as we move upward to higher levels of the hierarchy we again meet the same phenomenon: the 'response' to a syllable (its interpretation) depends on the word in which it occurs; and individual words occupy the same subordinate position relative to the sentence as phonemes relative to words. Their interpretation depends on the context, and must be referred to the next higher level in the hierarchy.

The late K. S. Lashley -- a Behaviourist turned renegade -- has given an amusing illustration of this:

Words stand in relation to the sentence as letters do to the word; the words themselves have no intrinsic temporal 'valence'. The word 'right', for example, is noun, adjective, adverb, and verb, and has four spellings and at least ten meanings. In such a sentence as 'the mill-wright on my right thinks it right that some conventional rite should symbolise the right of every man to write as he pleases', word arrangement is obviously not due to any direct associations of the word 'right' itself with other words, but to meanings which are determined by some broader relations. . . . Any theory of grammatical form which ascribes it to direct associative linkage of the words of the sentence overlooks the essential structure of speech. [5]

This is of course an extreme example of contrived ambiguity, but it makes its point with a vengeance against the S-R theorist who contends that speech sounds are 'like other bits of behaviour', and that language calls for no principles of explanation other than those employed in the operant conditioning of lower animals.

The ideal situation from the S-R theorist's point of view is a typist -- let's call her Miss Resp -- taking dictation from her boss, Mr. Stims. Here, one would think, we have a perfect example for a linear chain of sound stimuli controlling a string of key-pressing responses (Miss Resp being reinforced by Stims with the prospect of a salary). Since complex behaviour is supposedly the result of the chaining of simple S-R links, we must assume that each sound emitted by Stims will cause Miss Resp to type the corresponding letter (provided he dictates at the same speed at which she types, which is assumed). But we know of course that something quite different happens. Miss Resp waits expectantly, doing nothing, until at least half the sentence is completed, then, like a sprinter at the starter's shot, races ahead until she has caught up with Stims; then waits expectantly with an admiring expression on her face. The phenomenon is known to experimental psychologists as 'lagging behind'; it also occurs in Morse telegraphy and has been studied in great detail.* Miss Resp was lagging behind because she was mentally engaged in climbing the tree of language: first up, from sound level to word level to phrase level, then down again. The downward climb in the case of a skilled typist leads from 'phrase habit' through 'word habit' to 'letter habit'. The letter habits (hitting the correct key) are part of the word habits (a pre-set patterned sequence of movements triggered off as a single unit), which are part of the phrase habit (familiar turns of phrase which activate 'sweeps' of movements as integrated wholes). Although the performance is to a large extent as 'automatic' or 'mechanical' as any Behaviourist could wish for, it is nevertheless impossible to represent it as a linear chain of conditioned responses, because it is a multidimensional operation constantly oscillating between various levels, from the phonological to the semantic. 
No typist can be conditioned to take dictation in a language she does not know. It is this very complex knowledge, and not the chaining of simple S-R connections, which makes Miss Resp's fingers dance on the keyboard to Mr. Stims's reinforcing voice. And, oh wonder, she can even type a letter without dictation, for instance to her fiancé in Birmingham. In this case her behaviour is presumably controlled by S-R links which, like gravity, are capable of action-at-a-distance.

* For a more detailed treatment see The Act of Creation, Chapter, 'Motor Skills', pp. 544.

The Postman and the Dog

So far I have touched on only a few of the difficulties of explaining how we convert pressure variations on the eardrum into ideas. Even more formidable is the problem how we convert ideas into air-pressure waves. Take a simple example: the farmer's little boy of about three, leaning out of the window, sees the dog snapping at the postman, and the postman retaliating with a vicious kick. All this happens in a flash, so fast that his vocal chords have not even had the time to get innervated; yet he knows quite clearly what happened and feels the urgent need to communicate this as yet unverbalised event, image, idea, thought, or what-have-you, to his mum. So he bursts into the kitchen and shouts breathlessly: 'The postman kicked the dog.' Now the first remarkable fact about this is that he does not say, 'The dog kicked the postman', though he might say, 'Doggy was kicked by the postman'; and again, he will not say, 'Was the dog kicked by the postman?', and least of all, 'Dog the by was the kicked postman'.

This was an example of a very simple sentence consisting of four words only ('the' being used twice). Yet a change of the order of two words gave a totally different meaning; a more radical reshuffling, with two new words added, left the meaning unaltered; and most of the ninety-five possible permutations of the original words give no meaning at all. The problem is how a child ever learns the several thousand abstract rules and corollaries necessary to generate and comprehend meaningful sentences -- rules which his parents would be unable to name and define; which you and I are equally unable to define; and which nevertheless unfalteringly guide our speech. The few rules of grammar which the child learns at school long after it has learned to speak correctly -- and which it promptly forgets, are descriptive statements about language, not recipes to generate language. These recipes, or formulae, the child somehow discovers by intuitive processes -- probably not unlike the unconscious inferences which go into scientific discovery -- by the time it has reached the age of four. By that time 'he will have mastered very nearly the entire complex and abstract structure of the English language. In slightly more than two years, therefore [starting at about the age of two] children acquire full knowledge of the grammatical system of their native tongue. This stunning intellectual achievement is routinely performed by every pre-school child (McNeill [6])'. As another renegade Behaviourist, Professor James Jenkins, remarked at our Stanford seminar: 'The fact that we can freely produce sentences we had never heard before is amazing. The fact that we can understand them when produced is nothing short of miraculous. . . . A child never has a look at the machinery that produces English sentences. He could never have a look at that machinery. Nor is he being told about it since most speakers are completely unaware of it.'
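The count of possible word orders mentioned at the start of this passage can be checked mechanically. The sketch below is only an illustration, assuming the sentence is treated as five word tokens with 'the' occurring twice; the variable names are hypothetical. It enumerates 120 orderings if the two occurrences of 'the' are kept distinct and 60 if they are interchangeable, which suggests that the figure given in the text is a rough one. Either way the argument is unaffected: nearly all of these orderings are word salad.

```python
from itertools import permutations

# The five word tokens of the example sentence; 'the' occurs twice.
words = ["the", "postman", "kicked", "the", "dog"]

# Every ordering, treating the two occurrences of 'the' as distinct tokens.
all_orderings = list(permutations(words))

# Orderings that read differently once the two 'the's are interchangeable.
distinct_orderings = set(all_orderings)

# Swapping just two words reverses the meaning, yet it is one ordering among many.
swapped = ("the", "dog", "kicked", "the", "postman")

print(len(all_orderings))             # 5! orderings of five tokens
print(len(distinct_orderings))        # 5!/2! orderings that read differently
print(swapped in distinct_orderings)  # the meaning-reversing order is among them
```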

The facts must indeed appear miraculous so long as we persist in confusing the string of words which is speech, with the silent machinery which generates speech. The difficulty is that the machinery is invisible, its working mostly unconscious, beyond the reach of inspection and introspection. But at least psycholinguistics has shown that the only conceivable model to represent the generation of a sentence does not work 'from left to right', but hierarchically, branching from the top downward.

The diagram below is a slightly modified version of Noam Chomsky's so-called 'phrase-structure generating grammar'.* This is about the simplest schema for generating a sentence.

* Chomsky did not claim that it shows how a sentence is actually produced, but observational analysis of how small children learn to speak (by Roger Brown [7], McNeill [8] and others) has confirmed that the model represents the basic principles involved.

At the apex of the inverted tree is /I/ -- it might be an Idea, a visual Image, the Intention of saying something -- which is not yet verbally articulated. Let us call this the /I/ stage.* Then the two main branches of the tree shoot out: the doer and his doing, which at the /I/ stage were still experienced as an indivisible unit, are split up into different speech categories: noun-phrase and verb-phrase.** This separation must be a tremendous feat of abstraction for the child -- how can you separate the cat from the grin, or the kick from the postman? -- yet it is a universal property of all known languages; and it is precisely with this feat of 'abstract thinking' that the child starts its adventures in language at a very early age -- in languages as different as Japanese and English. [9]

* Chomsky calls the apex S, standing for the whole sentence, which makes the model appear as a sentence-analysing, rather than a sentence-generating, model. ** The NP-VP division is more expressive and easier to handle than the related categories of subject and predicate.

The verb-phrase in its turn splits immediately into the doing and its object. Lastly, the noun, and the article which previously was somehow implied in the noun, are spelt out separately. Deciding at which point of the rapid, predominantly unconscious working of the machinery the actual words pop up and fall into their places on the moving conveyor belt of speech -- along the bottom line of the diagram -- is a delicate problem for the introspectionist. We all are familiar with the frustrating experience -- shared by semi-illiterates and professional writers alike -- of knowing what we want to say, but not knowing how to express it, searching for the right words that will exactly fit the empty spaces on the conveyor belt. The opposite phenomenon occurs when the message to be conveyed is very simple and can be put into a ready-made turn of phrase like 'How do you do?' or 'Don't mention it'. The living tree of language is weighed down heavily by these clichés, which hang from its branches like clusters of bananas that can be picked a whole bunch at a time. They are the Behaviourist's delight. In a famous speech, from which I have just quoted, Lashley said: 'A Behaviourist colleague once remarked to me that he had reached a stage where he could rise before an audience, turn his mouth loose, and go to sleep. He believed in the chain theory of language.' This, Lashley concluded ironically, 'clearly demonstrates the superiority of Behaviourist over introspective psychology'.

But classical introspectionism did not fare much better. Lashley went on to quote Titchener (the grand old man of introspective psychology at the turn of the century) who, describing the role of imagery (which might be visual or verbal), had written: 'When there is any difficulty in exposition, a point to be argued pro and con, I hear my own words just ahead of me.' [10] This may be a boon to the timid lecturer, but from the theoretical point of view it is not much help -- because the question how words arise in consciousness is merely pushed one step back, and thus becomes the question how word-images arise in consciousness.

Both answers -- the Behaviourist's and the introspectionist's -- avoid the basic issue of how thought is parcelled out into language, how the shapeless rocks of ideas are cunningly split into crystalline fragments of distinctive form, and put on the moving belt to be carried from left to right along the single dimension of time. The reverse operation is performed by the listener, who takes the string as his baseline to reconstruct the tree, converting sounds into patterns, words into phrases, and so on. When one listens to a speaker, the string of syllables itself hardly ever reaches consciousness; the words of the previous sentence, too, are rapidly effaced and only their meaning remains; the actual sentences suffer the same fate, and by the next day the twigs and branches of the tree have wilted away so that only the trunk survives -- a shadowy generalised schema. We can represent both processes diagrammatically, indicating how 'imagination bodies forth the forms of things unknown', and how the pen 'turns them to shapes, and gives to airy nothings a local habitation and a name'; and we can also go through the operation in reverse gear to show how the traces left by the pen lose their shape and revert to airy nothings. But while these diagrams yield reliable formulae and rules, they provide only a superficial kind of understanding of how a child attains mastery of language, and how adults convert thoughts into airwaves, and back.

A complete understanding of these phenomena will probably always elude our grasp because the operations which generate language include processes which cannot be expressed by language: the attempt to analyse speech leaves us speechless. To quote Wittgenstein: 'the thing which expresses itself in language, we cannot represent by language'.* This paradox is one of the many aspects of the mind-body problem, to which we shall return; for the moment let me merely point out that, in contrast to the rigid concept of the chain which drags the organism along its predetermined path, the dynamic concept of the growing tree implies an open-ended hierarchy. The meaning of 'openness' in this context will become evident as we go along.

* Was sich in der Sprache ausdrückt, können wir nicht durch sie ausdrücken.

'What Do You Mean by That?'

Let me return for a moment to the ambiguity of language, which will provide a first example of 'open-endedness'.

There are different kinds of ambiguities on different levels of the hierarchy. On the lowest level, as we saw, is the purely acoustic ambiguity of phonemes, revealed by their sound-spectrograms (sounds transformed into visible patterns as on the sound-track of a film). They show that the transitions between /bay/, /day/ and /gay/ are continuous, like the colours of a rainbow, and that whether we hear /day/ or /gay/ depends mainly on the context.

On the next level we find, in addition to sound ambiguity, the subtler indeterminacies of the meaning of words, of which several types are shown in Lashley's mill-wright example. They can be put to deliberate use in the pun, in the play of words, in assonance and rhyme.
