Page 64 of The Act of Creation


  Klangbild and Wortschatz

  A similar frontier is found in audition, where perception of sound patterns turns into interpretation of language. Here one face of the entities which pass the frontier is a Klangbild, the other belongs to the Wortschatz -- a distinction between 'sound-picture' and 'word-treasure' (i.e. vocabulary) -- which Wernicke made in 1874. [32]

  There is considerable doubt whether the discrete elements or 'segments' of speech are phonemes, syllables, or even larger units. Let us assume for argument's sake that the segments are characteristic vowel-consonant combinations -- digrams or trigrams* -- and call these the perceptual units (the 'sound-pictures' of speech). In whatever way you define your unit, when it comes to the transition from perceiving the sound-picture to interpreting its speech-value, a considerable degree of ambiguity creeps in. Thus the speech-value of a vowel (the o-ness of an o) is independent of the frequency of its fundamental, and depends on the characteristic frequency-ranges of its two formants (its dominant partials). But these formant-ranges overlap; and accordingly 'a sound with a particular spectrum will be recognized as /î/ on one occasion and /ê/ on another'. [33] Most consonants, on the other hand, vary their pitch according to the vowel with which they are associated, and are characterized not only by pitch but also by the change, and rate of change, of pitch. [34] Thus the identification of language units depends to a considerable extent on their meaning-context; experimental subjects confuse m and n in nonsense syllables more frequently than when listening to meaningful speech; and the ambiguity of the input can only be resolved with reference to the preceding and following inputs in the psychological present. In discussing the practical feasibility of robots for translating speech into typescript, Fry and Denes concluded: 'It is unlikely that the mechanical speech-recognizer will be successful without the use of some form of linguistic information.' [35] It is the same as with ambiguous visual stimuli, whether they are riddles of the face-hidden-in-the-tree kind, or Fraunhofer lines in the spectroscope. 'Es hört doch jeder nur was er versteht' ('Everyone hears only what he understands'), Goethe noted in his Maximen.

  If scanning is an aid to vision, articulation is an aid to hearing. When we try to remember a tune, we hum it. The decisive factor in the emergence of human speech was not the development of the ear, but of the vocal organs and of the speech area in the motor cortex. The multiple feedbacks of auditory-vocal co-ordination exceed even those of oculo-motor co-operation. The child learns words by articulating them; adults learning a foreign language follow a similar procedure. Reading is more often accompanied by sub-vocal articulation than by images in the ear (except if you know intimately the author of what you are reading). The analysis of speech-sounds by matching them against innervation-patterns of the vocal tracts is a much simpler procedure than the acoustic analysis of the ambiguous sound spectra. However complex and variable the wave form of a vowel which reaches the ear, its identity as a language unit depends on its two formants, which in turn depend only on the resonance effects produced by the alterations of shape of two vocal cavities, mouth and pharynx. Paget [36] proposed that 'in recognizing speech sounds the human ear is . . . listening . . . to indications, due to resonance, of the position and gestures of the organs of articulation'. More recently a team of American experimentalists in the Haskins Laboratories have come to the same conclusion, that 'speech is perceived by reference to articulation -- that is, that the articulatory movements and their sensory [proprioceptive] effects mediate between the acoustic stimulus and the event we call perception'. [37] Lastly, Lawrence (1959) has described a method of speech-analysis which specifies such details as the frequencies of resonance of the vocal tract and the vibration frequencies of the vocal cords -- a method of analysis 'which preserves all perceptually valuable features, but is vastly simpler than the acoustic wave form. From an information theory point of view it is a tremendous reduction in the bit rate -- it is a reduction of the order of thirty to one. It may well be that speech is held in the short-term memory in a form like this.' [38]

  Once again we find confirmed that perception is 'something the organism does, not something which happens to the organism' [39]; that responses enter at every level of the hierarchy into the processing of stimuli; and that motor activities intervene to analyse the input long before it has achieved its full status as a 'stimulus' -- before it has, for instance, become a meaningful word capable of stimulating the central process which is to mediate the 'response'. As Drever, Jr., has so nicely put it: 'Associationist learning theory, where it has tried to hold to a strict S --> R pattern, appears to be lapsing into an esoteric scholasticism. Where it has abandoned S --> R in favour of S --> X --> R, there are complaints that it is struggling to say things which must be said, but doing so in a language which is no longer appropriate.' [40]

  Perceptual and Conceptual Abstraction

  One last example of a frontier where perceptual organization can do no more for you, and symbolic thought has to take over.

  When a number of objects is projected by lantern slide on a screen just long enough to be fully seen but not long enough to be counted, few people can correctly tell how many objects they have seen if the number exceeds seven; and many reach their limit of 'number perception' at five. [41] Surprisingly enough, the remarkable experiments of Otto Koehler revealed that pigeons, jackdaws, paroquets, and budgerigars can do as well, and that specially gifted jackdaws have a 'number sense' with the upper limit eight -- just as the most gifted humans.

  The experimental procedure consists, briefly, in training birds to open that box among several other boxes whose lid shows the same number of spots as the number of objects shown to the bird on a cue card. The sizes and spatial arrangements of the spots on the lid and of the objects on the card are not related in any way; and the rigorous experimental conditions and controls seem to have established beyond doubt that birds have a 'prelinguistic number sense'; that they 'are able to abstract the concept of numerical identity from groups of up to seven objects of totally different and unfamiliar appearance'. [42] Among mammals, squirrels have been shown to have the same ability. [43] The evidence suggests 'that men and animals may have a prelinguistic counting ability of about the same degree, but that man's superiority in dealing with numbers lies in his ability to use, as symbols for numbers, words and figures which have not the same, or indeed any, numerical attributes.' [44] The symbolic coding of 'number Gestalten' seems indeed a decisive step towards the formation of cardinal numbers; I shall return to the subject later (Chapter XV).

  Sound-pictures, printed letters of the alphabet, number-configurations, are all complex perceptual wholes, and at the same time elementary parts of symbolic thought: one might call them (to change the metaphor) 'amphibian' entities. They signal the transition, in mental evolution, from the 'aquatic' world of perception which keeps the organism submerged in a fluid environment of sounds, shapes, and smells, to the dry land of conceptualized thinking. The highest forms of purely perceptual abstraction on the pre-verbal level are like bubbles of air which aquatic creatures extract from the water; conceptualized thought is dry and inexhaustible, like the atmosphere.

  This is not meant of course to belittle the formidable powers of perceptual abstraction found in some animals. The innate (or imprinted) releasive mechanisms, for instance, may be regarded as phylogenetically acquired skills, which enable the animal to combine the colour, shape, and movement of the stimulus-pattern into a single 'constancy'. The rat learns to make a 'mental map' of the maze in its head (Chapter XII); and it has always been a mystery to me how my dog recognizes another dog on the opposite sidewalk at sight without using his sense of smell -- for the typical reactions of staring, straining at the leash, whining, occur at the very instant of catching sight. The other dog may be a miniature Peke, a dolled-up Poodle, or a Great Dane; how does my dog identify that apparition as a kinsman -- how did he abstract the universal 'Dog'? Perhaps at a distance he merely reacts to four legs and one or two other Gestalt characteristics common to all canines, which account for their 'dogginess' -- though we would be at a loss to define them.

  Generalization, Discrimination, and Association

  We have discussed various forms of 'filtering' codes, both innate and acquired, which de-particularize or strip the input for purposes of recognition and storage according to the criteria of relevance in a given hierarchy. The incoming pattern is thus subjected to 'generalization' and discrimination at the same time; the two are complementary aspects of the same process. (The word 'generalization' is often used in two different senses: (a) extracting invariant features from a variety of experiences, (b) the 'spreading' of responses. I am using it in the first sense.)

  Native equipment and early learning provide the basic foundations on which the different hierarchies are built, designed to filter out more and more sharply defined features. The coarse-meshed 'perceptual sieves' of the tyro acquire fine-meshed sub-analysers: perceptual learning progresses 'from the seeing of gross differences to the seeing of fine differences' (pp. 490 f.). All connoisseurship -- from the chicken-sexer's to the handwriting expert's, from the wine-taster's to the art historian's -- depends on the hierarchic build-up of analysing, matching, scanning codes which extract subtle similarities and make precise discriminations.

  This leads to the hoary problem of the nature of 'similarity'. The simplest answer would be to eliminate it altogether from the vocabulary of psychology and to substitute 'equipotentiality' for it. Two percepts are equipotential if both can pass a given filter in a given hierarchy -- if they satisfy its criteria according to the rules of the game that is played at the time; in other words, if 'for one intent and purpose' (but not 'for all intents and purposes') they are the same thing. Sultan discovered the 'similarity' between a branch on the castor-oil bush and a stick because they were equipotential for his purpose. A paperclip is 'similar' to a hair-pin when I have to mend a blown fuse. The answer to the old classroom question whether a red circle is more similar to a green circle than to a red triangle, depends on whether I am teaching geometry or colour-theory. In the first case, the two circles are for my purpose 'the same thing'; in the second, the two colours are 'the same thing.'*
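  The definition of equipotentiality given above can be put into a minimal sketch, offered only as an illustration and not as a model of any actual perceptual mechanism. It assumes, hypothetically, that a percept can be written as a small table of features and that a 'filter' is simply a rule which attends to one feature and ignores the rest; the names Percept, equipotential, geometry and colour_theory are invented for the purpose.

```python
from typing import Callable, Dict

# Hypothetical representation: a percept is a table of named features.
Percept = Dict[str, str]
# A filter (the "game played at the time") picks out what matters for one purpose.
Filter = Callable[[Percept], str]


def equipotential(a: Percept, b: Percept, criterion: Filter) -> bool:
    """Two percepts are equipotential if the given filter cannot tell them apart."""
    return criterion(a) == criterion(b)


# Two illustrative filters: one attends only to shape, the other only to colour.
def geometry(p: Percept) -> str:
    return p["shape"]


def colour_theory(p: Percept) -> str:
    return p["colour"]


red_circle = {"shape": "circle", "colour": "red"}
green_circle = {"shape": "circle", "colour": "green"}
red_triangle = {"shape": "triangle", "colour": "red"}

# When teaching geometry, the two circles are "the same thing".
print(equipotential(red_circle, green_circle, geometry))
# When teaching colour-theory, the two red figures are "the same thing".
print(equipotential(red_circle, red_triangle, colour_theory))
```

  In this sketch the question 'which is more similar?' has no answer until a filter is supplied: the same pair of percepts is the same thing for one intent and purpose and different for another.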

  The width of the span within which two stimuli are perceived as 'the same thing' depends on the precision of the analyser -- the gauge of the sieve through which they must pass. To talk of the 'spreading' of response ('generalization' in sense (b)) is confusing; the equipotentiality of circle and ellipse to the naïve Pavlov dog is not due to any spreading of reactions from circle to ellipse, but to the absence of discrimination between two figures which for the 'intents and purposes' of the untrained animal are the same -- as for my intents and purposes one sheep is the same as another. Similar considerations apply to 'transfer' (Chapter XV).

  'Association by similarity' of perceptions would accordingly mean that an input-pattern A at some stage of its ascent in the nervous system initiates the recall of some past experience B which is equipotential to A with respect to the scanning process at that particular stage, but not in other respects. We might say that A and B have one 'partial' in common which causes B to 'resonate'.

  Association by sound and visual form plays, as we have seen, an important part in the dream and in subconscious processes which enter into creativity. But in the ordinary routines of life, association of sensory percepts uncontaminated by conceptual thinking seems to be rare, and whether it occurs at all is anybody's introspective guess. In the Rorschach test visual association depends on projective dynamics imbued with meaning. Verbal suggestions influence the visual matrix and distort even the eidetic image; the ambiguities of the sound-picture can only be dispelled by reference to vocabulary; children and aphasic patients often confuse p and q, or write s and e as mirror images, because the cognitive glue which holds the true perceptual units together (verticals, loops, etc.) has not yet hardened or has already decayed -- like the aggregate of visual and cognitive elements which constitutes the image of the elephant. Thus hearing is inextricably bound up with interpreting, seeing with knowing, perceiving with naming. By these methods the organism is enabled to build a model of the external world into its own nervous system, without having to store lantern-slides and gramophone records of complex perceptual forms -- animal, vegetable and mineral -- which would not work anyway. All that the model needs in the way of perceptual 'traces' is a modest inventory of elementary root-forms -- much like the cubist painter's austere repertory which Cézanne recommended: 'Everything in nature is modelled on the sphere, the cone and the cylinder. One must teach oneself to base one's painting on these simple figures -- then one can accomplish anything one likes.' [45]

  What we call our visual or auditory memory probably consists of a limited number of such 'root traces' or 'perceptual elements' (in Hebb's sense). These alone may have 'real form' as perceptual wholes, and at the same time enter as parts into the complex, aggregate 'pseudo-images', held together by meaningful association. If it seems to us that such complex aggregates can be 'taken in' and 'recognized at a glance' without scanning and exploration, this is perhaps because we commonly underestimate the span of the psychological present. In his review of the literature on the psychological present, Woodrow found that its maximum span is estimated to lie between 2.3 and 12 seconds. [46] No wonder there is considerable disagreement about the size of the 'discrete units of speech' in perception -- whether the unit is the phoneme, syllable, word, or a whole sentence! A few seconds are ample time for those partly or wholly unconscious operations which make our perceptions into inferential constructions. If the psychological present /P/ be regarded as an elementary quantum of conscious experience, then the processes which go on within /P/ can ex hypothesi not be on a conscious level; they must remain an unanalysable and uncompressible blur.

  Recognition and Recall

  While new matrices are formed by learning, others may decay through disuse like old waterways overgrown by weeds. Apart from generation and decay, the traces left by past events in the nervous system also undergo dynamic changes -- simplification, condensation, distortion on the one hand; elaboration and enrichment through the addition of extraneous material on the other. The 'schemata' of memory, as Bartlett called them, are 'living, constantly developing, affected by every bit of incoming sensational experience of a given kind'. [47] In other words, the past is constantly being re-made by the present.* To quote Bartlett again:

  Remembering is not re-excitation of innumerable, fixed, lifeless and fragmentary traces. It is an imaginative reconstruction, or construction, built out of the relation of our attitude towards a whole active mass of organized past reactions or experience, and to a little outstanding detail which commonly appears in image or in language form. It is thus hardly ever really exact, even in the most rudimentary cases of rote-recapitulation, and it is not at all important that it should be so. [48]

  True recall by imagery would be possible only if the de-particularized memory could be re-particularized, the irreversible process reversed. One may be able to 'hear' -- while shaving, for instance -- the faint, pale ghost of a voice from the past singing a simple song. To make this possible, at least three different systems of 'coloured filters', concerned with melody, timbre, and wording, must each have preserved one aspect of the original experience. One may also recall, more or less distinctly, characteristic combinations of form and motion: the stride of a person, the roll of a boat, the waddle of a tortoise. But the average person's abilities of perceptual imaging are limited to this kind of production. Hence the paradox of what one might call 'negative recognition': I visit a friend whom I have not seen for some time, look round and say: 'Something is changed in this room' -- without being able to say what has been changed. I can only assume that my memory of the room was determined by several complementary matrices -- sketchy, part verbal, part visual schemata, such as 'Regency furniture', 'L-shaped plan', 'subdued colour scheme', etc. -- plus one or two 'vivid details': a picture, a flower-vase. A good many changes could be made in the room which I would not notice so long as they satisfy these criteria as 'equipotential variations'; only changes which offend against one of the codes will make me register that 'something is wrong'. My inability to name that 'something' indicates that the code was functioning below the level of conscious awareness (Cf. Book One, XIX).*

  The adjectives used to describe a face -- 'soft', 'bony', 'pinched', 'humorous', etc. -- refer to part visual, part verbal schemata, some of which may be as simplified as the surprisingly few linear elements which suffice to indicate emotional expression by the posture and slant of mouth and eyes. The caricaturist can evoke a face by a few strokes which schematize a total impression (Hitler's moustache and lock), or he can pick out a detail which acts as a 'sign-releaser' (Churchill's cigar). It is often easier to remember a face known only from illustrations -- Napoleon or Mona Lisa -- than faces of living persons; perhaps because half of the compressing and coding of the visual information has already been done by the artist. Equally revealing is the police method of reconstructing the likeness of a criminal by the Identi-Kit method. This is based on 'a slide-file of five hundred and fifty facial characteristics containing, among other things, a hundred and two sets of eyes ranging from pop to squinting, thirty-three sets of lips from thin to sensuous, fifty-two chins, from weak to jutting, and even twenty-five sets of wrinkles. Witnesses pick the individual features that most closely resemble their idea of the criminal's look. From their selections a composite picture of all the features is then assembled.' [49]