The Act of Creation
At the auditory projection area we must assume the dismantling process to end and the reassembling to start. When we listen to the symphony we do not hear an ensemble of the pure tones into which it has been broken up in the cochlea, but an ensemble of individual instruments and voices: that is, of organized sub-wholes. The individual timbre of an instrument is determined by its overtones -- the series of partials which accompany the fundamental, and the energy-distribution of them. By superimposition of the sine curves of the partials, we obtain the periodic curve characteristic for each instrument. When we identify the sound of a violin or flute by picking out and bracketing together its partials -- which were 'drowned' among thousands of other partials in the air-pressure wave -- we have achieved 'timbre constancy', comparable to visual figure constancy. This, of course, is based on past experience and involves an act of recognition by the 'trained ear' of instruments previously heard in isolation.
'Coloured Filters'
Since all but the most elementary perceptions interact with past experience, it seems a rather unsound procedure to discuss perception divorced from the problem of memory. The question, then, is how the 'trace' was originally acquired which enables me to recognize an instrument or voice on subsequent occasions. Let us assume that I am hearing an exotic instrument for the first time, and that I am interested at the moment only in its timbre, not in the melody played on it (which, in the case of a Japanese koto or samisen, would be above my head anyway). As I am listening, the mathematical relations between the partials remain constant and enduring, whereas their pitch and loudness are changing all the time. This stable and enduring relation-pattern (the fixed ratios between the part-frequencies) will be treated by my nervous system, which is processing the input, as relevant, whereas the changes in the relata (the absolute frequencies) are discarded as irrelevant. When this filtering-out process is completed, the input will have been finally stripped of all irrelevant detail, according to the demands of parsimony, and reduced to its invariant pattern -- to 'information' purified of 'noise'. If an input has undergone these transformations and was permitted to progress this far without being blocked somewhere on its way (as, for instance, the voices of irrelevant strangers at a cocktail party are) then it will tend to leave a lasting 'trace' -- which will enable the nervous system to recognize in future the same voice or instrument.
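The distinction between the invariant relation-pattern and the changing relata can be illustrated by a minimal sketch. It rests on simplifying assumptions that are not part of the argument above: the partials are taken to be exact whole-number multiples of the fundamental, the energy-distribution among them is ignored, and the names `partials` and `ratio_pattern` are hypothetical, chosen only for this illustration.

```python
def partials(fundamental_hz, multiples=(1, 2, 3, 4)):
    """Absolute frequencies of the partials (the relata) for one sounded note."""
    return [fundamental_hz * m for m in multiples]


def ratio_pattern(freqs):
    """Ratios of each partial to the lowest one (the relation-pattern)."""
    return [f / freqs[0] for f in freqs]


# Two different notes played on the same idealized instrument.
low_note = partials(220.0)
high_note = partials(330.0)

# The absolute frequencies differ, but the ratios between them do not.
relata_differ = low_note != high_note
pattern_is_shared = ratio_pattern(low_note) == ratio_pattern(high_note)
```

In this toy case the ratio pattern is what survives a change of pitch, which is the sense in which the absolute frequencies can be discarded as irrelevant to timbre.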
We have witnessed, as it were, the formation of the code of a new perceptual skill. The organism feeds on negative entropy; in communication theory, 'entropy' becomes 'noise'. The sensorium abstracts information from the chaotic environment as the mitochondria extract, by a series of dismantling and reassembling processes, a specific form of energy from food. The abstracting and recording of information involves, as we have just seen, the sacrifice of details which are filtered out as irrelevant in a given context. But what is considered as irrelevant in one context, may be relevant in another; and vice versa. We can recognize an instrument regardless of the tune played on it; but we can also recognize a tune regardless of the instrument on which it is played. The tune is abstracted and recorded in a memory-trace de-particularized of timbre; timbre is recorded de-particularized of tune. Thus the filtering-out of redundancies as the input is relayed from periphery to centre does not proceed along a single channel, but along several channels, each with its series of filters of different colour, as it were. The different colours represent the criteria of relevance in different perceptual hierarchies. Each hierarchy analyses the input according to its own criteria of relevance; but the loss of detail incurred in the process of memory-formation along a single channel is partly counteracted by the fact that information rejected as irrelevant by its coloured filters may be admitted as relevant by another channel belonging to a different hierarchy. We shall see that this principle of multi-dimensional analysis is of basic importance in the phenomena of recognition and recall.
A Digression on Engrams
The neurophysiological problems of memory are beyond the scope of this book, but the following remarks may help to forestall possible misunderstandings. Perceptual codes of the type which enables us to recognize an instrument are devices which analyse complex acoustic inputs by some unknown process of 'matching' or 'resonance'. The quotes indicate that these words are used as metaphors only; the process must of course be incomparably more complicated than acoustic or electric resonance. What matters is that a memory-trace cannot be visualized as a mechanical record like a gramophone groove, 'stamped' into the brain and activated by specific pathways. Such an arrangement would be as useless for purposes of auditory analysis as it would be useless for the visual recognition of shapes to have an archive of photographic engrams. Instead of this, we must hypothesize some kind of 'attunement' of a cluster or clusters of neurons, with a hierarchic organization and containing sub-wholes which are equipotential in their response to one specific pattern of excitation and to that pattern only. Pringle [7] assumed that memory-traces function like 'coupled resonators'; Hyden's [8] theory of RNA changes which determine selective responsiveness to frequency-modulation sequences of excitation seems more plausible. Whatever the mechanism is, it must combine the principle of fixed but partly equipotential spatial connections, with selective responses to specific excitation patterns, to account for the hierarchic organization of perceptual, conceptual, and motor skills. Weiss' excitation clang was an approach in that direction. Hebb's phase sequences in neuron assemblies were another. The Pitts-McCulloch model of a scanning analyser to account for figure constancies should show that basically similar principles can be applied in vision as in audition to the problem of analysing and matching the input.
The matrix of a complex skill -- such as the maze-running skill of Lashley's rats -- may be no more 'localized' than the programme of a political party is localized by the addresses of all members who adhere to it. If some members or groups of members are eliminated, other groups may take over. Simpler and more primitive matrices, however, are perhaps rather like the professional guilds of craftsmen concentrated in one area of a medieval town; if that area is destroyed, the skill is lost.
Tracing a Melody
We have seen that recognition of a voice or instrument is based on an invariant relation (the fixed ratios between partials or formants) which has been extracted from the variable relata. Once the instrument is perceived as a recognizable whole, the relation becomes a relatum -- e.g. 'a violin' -- regardless of whether a verbal symbol is attached to it or not. This relatum then enters into relations with the sound of other instruments, which are analysed on higher levels according to the criteria of more complex rules of the game -- harmonic, melodic, and contrapuntal form -- in which several perceptual hierarchies participate.
A tune is defined by rhythm and pitch. Rhythm derives from the hierarchic organization of beat-cum-accent into measure, measure into phrase. To qualify as a tune, the pitch-variation sequence must conform to certain codes of modality, key, harmony. These codes must also be represented in the listener's perceptual organization, otherwise there would be no musical experience, only the sensation of a medley of sounds -- as when a European listens for the first time to Chinese opera. The melody itself has the structural coherence of a closed figure as distinct from an open, linear chain. It is either 'taken in' as a whole in the specious present, or learned by the integration of sub-wholes, that is of entire phrases -- but never by chaining note to note in the manner of learning nonsense-syllables (though even these tend to form patterns). A chain of notes could not be transposed from one key or instrument to another; nor recognized after transposition.
A tune is a temporal pattern of notes in a given scale. The notes are the relata; by humming it in a different key, or playing it on a different instrument, the relata are changed but the relation remains invariant in all transformations. On a higher level, the tune as a whole again becomes a relatum which enters into relations with other tonal patterns; or with itself in symmetrical reversal; or contrapuntally with other themes.
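The invariance of the relation under a change of key can be shown in a few lines. This is only a sketch under stated assumptions: pitches are written as semitone numbers (the MIDI convention, in which 60 stands for middle C), rhythm is left out altogether, and the names `interval_pattern` and `transpose` are hypothetical.

```python
def interval_pattern(notes):
    """Steps between successive notes: the relation, not the relata."""
    return [b - a for a, b in zip(notes, notes[1:])]


def transpose(notes, semitones):
    """Shift every note by the same amount, as when humming in another key."""
    return [n + semitones for n in notes]


# A four-note phrase, and the same phrase moved up by seven semitones.
tune_in_c = [60, 62, 64, 60]
tune_in_g = transpose(tune_in_c, 7)

# Every single note has changed, yet the pattern of steps is identical.
same_relata = tune_in_c == tune_in_g
same_relation = interval_pattern(tune_in_c) == interval_pattern(tune_in_g)
```

A record that stored only the individual notes would fail to match the transposed phrase; one that stored the pattern of steps would match it at once.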
Most people are capable of learning and recognizing simple melodies, and equally capable of recognizing the sound of various instruments -- but few mortals share the privilege of 'absolute pitch', of being able to identify single notes. In other words, retention of a pattern of stimuli is the rule, retention of an isolated stimulus the exception. If the pattern is relatively simple, it is 'taken in at a glance', as a whole: as a rule, listening to the first two transients is sufficient to identify an instrument. [9] But the more complex the pattern, the more difficult it becomes to 'take the whole in at a glance', and it can be retained only by dint of a certain amount of rote learning.
Conditioning and Insight in Perception
Once more, however, the items memorized are not discrete bits, but organized sub-wholes; and they are not summated in an open chain but interrelated in a closed figure. Thus the first movement of a sonata will fall into three sub-wholes: statement of themes, development, recapitulation; and the first of these is usually subdivided into the exposition of two themes in the order A-B-A; while in the rondo we usually have A B A C A.
Similar considerations apply, for instance, to the learning of a poem. Rhythm, rhyme, grammar, and meaning provide patterns or 'grids' superimposed on each other -- matrices governed by already established codes; and the memorizing that remains to be done is not so much a 'stamping in' but a 'filling of gaps'. This is shown by the typical way of 'getting stuck' in reciting a poem; e.g.:
' . . . Cannon to left of them / Cannon in front of them / (-- --) and thundered'. A word has fallen out like a piece from a jigsaw puzzle -- but it merely leaves a gap; it does not break the 'chain'. The old-fashioned method of teaching history by reigns and battles is an obvious example of stamping in. Even so, the data often show some rudimentary organization into rhythmic or visual pattern, acquired spontaneously or by some memorizing trick such as rhyming jingles. Calculating prodigies memorize long series of numbers, not by chaining but by ordering them into familiar sub-groups. Nonsense syllables are more easily retained by twisting them into a semblance of words, and weaving these into a story. [10] The position of thirty men on a chessboard is more easily retained than that of five chessmen lying in a heap on the floor.
How do we recognize complex patterns? Take a professional musician who has turned on his radio in the middle of a programme: 'It's a string quartet. . . . Something by Beethoven. . . . It's a quartet of the middle period. . . . It's the second Rasoumovsky. . . . It is probably played by the Amadeus Quartet. . . .' The input has been matched in rapid succession against the very complex coded constancies in several interlocking hierarchies -- timbre, melody, rhythm, accent, phrasing, volume, density, etc. -- until the last drop of 'information' has been extracted from it. Each independent hierarchy of 'coloured filters' activated by the input adds an additional dimension to understanding.
Perception cannot be divorced from past experience. What I have said so far already foreshadows a continuous scale of gradations between opposite methods of perceptual learning. At one end, in classical conditioning, we shall find stamping-in, under artificial conditions, of excitation-patterns which outside the laboratory would be treated as biologically irrelevant and would accordingly leave no trace. Outside the laboratory, edible things do not emit signals by metronome-clicks, or by displaying the figure of an ellipse on a cardboard. The dog's perceptual organization is not 'attuned' to this kind of input-signal; it lies outside all recognized rules of the game; and there will be no inherent tendency in the naïve dog to abstract information from the rate of metronome-clicks. However (see below, Chapter XII), even the artificial stamping-in of a trace in this type of experiment is not purely mechanical, and not comparable to the action of the recording needle on the gramophone disc.
In the intermediary ranges of the scale we find blends of varying proportions between 'bit learning' and 'whole' or pattern-learning; and lastly, at the opposite end, the input is analysed in all its relevant aspects by the various 'competent' perceptual hierarchies, until it is saturated, as it were, with meaning. This, I shall suggest, is what we mean by 'insight-learning'. Insight thus becomes a matter of degrees -- and not, as the Gestalt school seemed to hold, an all-or-nothing process.
The key-word in the previous paragraph was 'competent'. The amount of 'stamping in' needed, and the type of learning which will occur, depends on the animal's (native and acquired) perceptual organization -- in other words, on its 'ripeness' for that particular kind of task. If this sounds like a truism, one still wonders how conditioning -- 'classical' or 'operant' -- could ever have been regarded as the paradigm of all learning.
Abstract and Picture-strip
Memory of a sort is found on every organic level, from protozoa upwards. The human nervous system we assume to be equipped with a hierarchy of memory-systems operating on various levels: from short-lived, unstable modifications in the receptor organs, to stable and enduring central 'engrams' and the codes of complex skills. Since perception and memory-formation proceed in a continuous series, and since perception filters the input, we are led to the apparently paradoxical conclusion that the most enduring memory-traces must be those which have been most thoroughly de-particularized -- that is to say, impoverished. This seems indeed to be the case at least in so far as one important category of hierarchies is concerned: the abstractive category.
When one is watching a play at the theatre the successive sounds emitted by the actors must be retained by short-term memory until they can be bracketed together into words or larger syntactic sub-wholes. The psychological present embraces various-sized chunks of the immediate past (by means of a 'mnemic afterglow', of reverberating circuits, or what-have-you). By the time the actor utters his next line, the perceptual relata -- the speech-units -- of the previous line have already been forgotten, and only the wording is still retained. A few lines further the exact wording of the first phrase is also wiped off the memory slate, and only its content is still stored on some higher level of the hierarchy. The next day one still has a fairly detailed recollection of the actual sequence of scenes in the play; a few months later only an outline of the plot as a whole remains in the 'store'. Parsimony in memory-formation demands that only a mere skeleton of the complex original experience should be retained on the highest level of a given hierarchy; and vice versa, that the trace which an input leaves shall be the more enduring, the higher the level to which it has attained by successive stages of de-particularization and re-coding.
The example I described was of an abstractive hierarchy governed exclusively by logical analysis. A computer built on these lines, after being fed a number of West End plays, would probably filter down all that it found worth retaining, to a formula such as: isosceles marriage triangle with pet-dog at centre of gravity, or: whodunit with five independent variables (suspects).
Let us assume that at each stage of this serial abstractive process, the input activates some particular scanning- or filtering-device which is 'attuned' to that particular input. The receiving end of that device corresponds to its 'matrix' aspect: it is potentially responsive to a great many inputs which have one specific feature or pattern in common, and are thus equipotential in that respect. When an input is 'recognized' by the matrix as conforming to that pattern, it will emit a code-signal to the higher echelons. But while the matrix is 'attuned' to a great number of variations in the input pattern, the code merely signals the invariant aspect of it, e.g. 'a triangle', 'an octave', 'a fly', 'a denial'. The size and position of the triangle, the particulars of the fly, the wording of the denial are lost in the coding, and cannot be retrieved by reversing the process within that particular hierarchy (though it may have been preserved by another).
Thus the analysing-devices behave like analogue-to-digital computers, and in other respects, too, the order of events is the exact reverse -- as one would expect -- of the processes we have observed in motor-hierarchies. When an animal engages in some skilled action, the coordinating centre activates a matrix of equi-final motor patterns; which particular sub-skill will be called into activity depends on circumstances. Thus the 'roughed-in' action-programme becomes more and more particularized in the course of its descent to the periphery -- while contrariwise, the peripheral input is more and more de-particularized or 'skeletonized' in its ascent towards the centre. The first is a process of progressively spelling out implicit orders; the second an equally stepwise process of abstracting the meaning implied in the mosaic of sensations. Both processes are irreversible: the exact words of the actors in the play cannot be retrieved.
It can also happen, however, that one has quite forgotten what that play, seen years ago, was about -- except for one particular detail, an inflection of voice, an imploring gesture of the heroine which, torn from its context, remains engraved on one's memory. There exists, indeed, a method of retention which seems to be the direct opposite of memory-formation in abstractive hierarchies. It is characterized by the preservation of vivid details, which, from a purely logical point of view, are often quite irrelevant; and yet these quasi-cinematographic details or 'close-ups', which seem to contradict the demands of parsimony, are both enduring, strikingly sharp, and add texture and flavour to memory.
Bartlett, in a classic experiment, made his subjects read an Indian legend and then reproduce it on repeated occasions at intervals of increasing length -- ranging from fifteen minutes after the first reading to several months or years. The story was about thirty lines long; it concerned a young Indian who got involved in the 'War of the Ghosts', and was wounded in the process. The last paragraph read: