What is a neural network? Connectionists use the term to refer not to real neural circuitry in the brain but to a kind of computer program based on the metaphor of neurons and neural circuits. In the most common approach, a “neuron” carries information by being more or less active. The activity level indicates the presence or absence (or intensity or degree of confidence) of a simple feature of the world. The feature may be a color, a line with a certain slant, a letter of the alphabet, or a property of an animal such as having four legs.
A network of neurons can represent different concepts, depending on which ones are active. If neurons for “yellow,” “flies,” and “sings” are active, the network is thinking about a canary; if neurons for “silver,” “flies,” and “roars” are active, it is thinking about an airplane. An artificial neural network computes in the following manner. Neurons are linked to other neurons by connections that work something like synapses. Each neuron counts up the inputs from other neurons and changes its activity level in response. The network learns by allowing the input to change the strengths of the connections. The strength of a connection determines the likelihood that the input neuron will excite or inhibit the output neuron.
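To make the mechanics concrete, here is a minimal sketch (in Python, with feature names and numbers invented for the illustration, and not drawn from any particular model) of a unit that sums its weighted inputs, converts the sum into an activity level, and learns by having its connection strengths adjusted:

```python
# Not any particular published model; names and numbers are invented.
import math

def activity(inputs, weights):
    """Weighted sum of input activities, squashed into the range 0..1."""
    total = sum(a * w for a, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-total))

def strengthen(inputs, weights, output, rate=0.1):
    """Hebbian-style learning: connections whose input unit was active
    together with the output unit get stronger."""
    return [w + rate * a * output for a, w in zip(inputs, weights)]

# Illustrative input units -- "yellow", "flies", "sings" -- feeding a unit
# that stands for "canary".
inputs = [1.0, 1.0, 1.0]      # all three features are present
weights = [0.2, 0.1, 0.3]     # arbitrary starting connection strengths

out = activity(inputs, weights)
weights = strengthen(inputs, weights, out)
print(out, weights)
```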
Depending on what the neurons stand for, how they are innately wired, and how the connections change with training, a connectionist network can learn to compute various things. If everything is connected to everything else, a network can soak up the correlations among features in a set of objects. For example, after exposure to descriptions of many birds it can predict that feathered singing things tend to fly or that feathered flying things tend to sing or that singing flying things tend to have feathers. If a network has an input layer connected to an output layer, it can learn associations between ideas, such as that small soft flying things are animals but large metallic flying things are vehicles. If its output layer feeds back to earlier layers, it can crank out ordered sequences, such as the sounds making up a word.
The appeal of neural networks is that they automatically generalize their training to similar new items. If a network has been trained that tigers eat Frosted Flakes, it will tend to generalize that lions eat Frosted Flakes, because “eating Frosted Flakes” has been associated not with “tigers” but with simpler features like “roars” and “has whiskers,” which make up part of the representation of lions, too. The school of connectionism, like the school of associationism championed by Locke, Hume, and Mill, asserts that these generalizations are the crux of intelligence. If so, highly trained but otherwise generic neural networks can explain intelligence.
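A toy example, with the features and training details invented for the purpose, shows why the generalization falls out: once "eats Frosted Flakes" is associated with the features rather than with the tiger itself, any animal sharing those features inherits the prediction:

```python
# Features and numbers invented for the illustration.
features = ["roars", "has whiskers", "striped", "maned"]
tiger = [1, 1, 1, 0]
lion  = [1, 1, 0, 1]

# Train one output unit on the tiger alone, using a simple error-correcting
# (delta-rule) adjustment of the connection strengths.
weights = [0.0] * len(features)
rate = 0.5
for _ in range(20):
    out = sum(w * a for w, a in zip(weights, tiger))
    error = 1.0 - out                 # target activity: 1 ("yes, eats them")
    weights = [w + rate * error * a for w, a in zip(weights, tiger)]

def eats_frosted_flakes(animal):
    return sum(w * a for w, a in zip(weights, animal))

print(round(eats_frosted_flakes(tiger), 2))   # about 1.0 -- trained directly
print(round(eats_frosted_flakes(lion), 2))    # about 0.67 -- generalized via shared features
```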
Computer modelers often test their models on simplified toy problems to prove that they can work in principle. The question then becomes whether the models can “scale up” to more realistic problems, or whether, as skeptics say, the modeler “is climbing trees to get to the moon.” Here we have the problem with connectionism. Simple connectionist networks can manage impressive displays of memory and generalization in circumscribed problems like reading a list of words or learning stereotypes of animals. But they are simply too underpowered to duplicate more realistic feats of human intelligence like understanding a sentence or reasoning about living things.
Humans don’t just loosely associate things that resemble each other, or things that tend to occur together. They have combinatorial minds that entertain propositions about what is true of what, and about who did what to whom, when and where and why. And that requires a computational architecture that is more sophisticated than the uniform tangle of neurons used in generic connectionist networks. It requires an architecture equipped with logical apparatus like rules, variables, propositions, goal states, and different kinds of data structures, organized into larger systems. Many cognitive scientists have made this point, including Gary Marcus, Marvin Minsky, Seymour Papert, Jerry Fodor, Zenon Pylyshyn, John Anderson, Tom Bever, and Robert Hadley, and it is acknowledged as well by neural network modelers who are not in the connectionist school, such as John Hummel, Lokendra Shastri, and Paul Smolensky.13 I have written at length on the limits of connectionism, both in scholarly papers and in popular books; here is a summary of my own case.14
In a section called “Connectoplasm” in How the Mind Works, I laid out some simple logical relationships that underlie our understanding of a complete thought (such as the meaning of a sentence) but that are difficult to represent in generic networks.15 One is the distinction between a kind and an individual: between ducks in general and this duck in particular. Both have the same features (swims, quacks, has feathers, and so on), and both are thus represented by the same set of active units in a standard connectionist model. But people know the difference.
A second talent is compositionality: the ability to entertain a new, complex thought that is not just the sum of the simple thoughts composing it but depends on their relationships. The thought that cats chase mice, for example, cannot be captured by activating a unit each for “cats,” “mice,” and “chase,” because that pattern could just as easily stand for mice chasing cats.
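The point can be made concrete in a few lines (the unit and role labels are invented for the illustration): a thought represented only as a set of active units cannot tell the two propositions apart, whereas a representation that binds each concept to a role can:

```python
# With only a set of active units, "cats chase mice" and "mice chase cats"
# are the same thought:
bag1 = {"cats", "chase", "mice"}
bag2 = {"mice", "chase", "cats"}
print(bag1 == bag2)                     # True -- the distinction is lost

# A compositional representation binds each concept to a role (role names
# invented for the illustration), so the two propositions stay distinct:
thought1 = {"agent": "cats", "action": "chase", "patient": "mice"}
thought2 = {"agent": "mice", "action": "chase", "patient": "cats"}
print(thought1 == thought2)             # False -- who-did-what-to-whom is kept
```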
A third logical talent is quantification (or the binding of variables): the difference between fooling some of the people all of the time and fooling all of the people some of the time. Without the computational equivalent of x’s, y’s, parentheses, and statements like “For all x,” a model cannot tell the difference.
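A small sketch, with the people and times invented for the purpose, makes the two quantified claims explicit and shows that they can come apart:

```python
# Invented toy data: fooled[person][time] records whether that person was
# fooled at that time.
people = ["Ann", "Bob", "Cy"]
times = ["t1", "t2", "t3"]
fooled = {
    "Ann": {"t1": True,  "t2": True,  "t3": False},
    "Bob": {"t1": False, "t2": True,  "t3": False},
    "Cy":  {"t1": True,  "t2": False, "t3": False},
}

# "Some of the people all of the time": there exists an x such that for all
# times t, x is fooled at t.
some_all = any(all(fooled[x][t] for t in times) for x in people)

# "All of the people some of the time": for every x there exists a time t
# at which x is fooled.
all_some = all(any(fooled[x][t] for t in times) for x in people)

print(some_all, all_some)   # False True -- everyone was fooled sometime, no one always
```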
A fourth is recursion: the ability to embed one thought inside another, so that we can entertain not only the thought that Elvis lives, but the thought that the National Enquirer reported that Elvis lives, that some people believe the National Enquirer report that Elvis lives, that it is amazing that some people believe the National Enquirer report that Elvis lives, and so on. Connectionist networks would superimpose these propositions and thereby confuse their various subjects and predicates.
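One way to see what the networks lack is a recursive data structure in which a proposition can itself be an argument of another proposition (a sketch with invented names):

```python
from dataclasses import dataclass

@dataclass
class Prop:
    predicate: str
    args: tuple       # an argument may itself be a Prop -- that is the recursion

elvis_lives = Prop("lives", ("Elvis",))
report      = Prop("reported", ("the National Enquirer", elvis_lives))
belief      = Prop("believe", ("some people", report))
amazement   = Prop("is amazing", (belief,))
print(amazement)      # each proposition nests intact inside the next
```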
A final elusive talent is our ability to engage in categorical, as opposed to fuzzy, reasoning: to understand that Bob Dylan is a grandfather, even though he is not very grandfatherly, or that shrews are not rodents, though they look just like mice. With nothing but a soup of neurons to stand for an object’s properties, and no provision for rules, variables, and definitions, the networks fall back on stereotypes and are bamboozled by atypical examples.
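The contrast can be sketched in a few lines (the stereotype traits and the entry for Dylan are invented for the example): a definition delivers a crisp verdict even when resemblance to the stereotype is nil:

```python
# Categorical reasoning uses a definition; fuzzy reasoning uses resemblance
# to a stereotype.  All data invented for the illustration.
STEREOTYPE = {"white hair", "retired", "bakes cookies", "tells old stories"}

def is_grandfather(person):
    """Definition: male, and has a child who has a child."""
    return person["male"] and any(child["children"] for child in person["children"])

def grandfatherly(person):
    """Fuzzy score: overlap with the grandfather stereotype."""
    return len(person["traits"] & STEREOTYPE) / len(STEREOTYPE)

dylan = {
    "male": True,
    "children": [{"children": ["grandchild"]}],
    "traits": {"touring musician", "writes songs"},
}

print(is_grandfather(dylan))   # True  -- satisfies the definition
print(grandfatherly(dylan))    # 0.0   -- fits the stereotype not at all
```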
In Words and Rules I aimed a microscope on a single phenomenon of language that has served as a test case for the ability of generic associative networks to account for the essence of language: assembling words, or pieces of words, into new combinations. People don’t just memorize snatches of language but create new ones. A simple example is the English past tense. Given a neologism like to spam or to snarf, people don’t have to run to the dictionary to look up their past-tense forms; they instinctively know that they are spammed and snarfed. The talent for assembling new combinations appears as early as age two, when children overapply the past-tense suffix to irregular verbs, as in We holded the baby rabbits and Horton heared a Who.16
The obvious way to explain this talent is to appeal to two kinds of computational operations in the mind. Irregular forms like held and heard are stored in and retrieved from memory, just like any other word. Regular forms like walk-walked can be generated by a mental version of the grammatical rule “Add –ed to the verb.” The rule can apply whenever memory fails. It may be used when a word is unfamiliar and no past-tense form has been stored in memory, as in to spam, and it may be used by children when they cannot recall an irregular form like heard and need some way of marking its tense. Combining a suffix with a verb is a small example of an important human talent: combining words and phrases to create new sentences and thereby express new thoughts. It is one of the new ideas of the cognitive revolution introduced in Chapter 3, and one of the logical challenges for connectionism I listed in the preceding discussion.
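A minimal sketch of this two-part account (the stored verbs and spelling details are illustrative, not a serious model of English morphology) might look like this:

```python
# Memory route: irregular past tenses are retrieved from a stored list.
# Rule route: when lookup fails, "add -ed" applies to whatever the verb is.
# (Consonant doubling, as in spam -> spammed, is omitted for brevity.)
IRREGULARS = {"hold": "held", "hear": "heard", "ring": "rang", "go": "went"}

def past_tense(verb):
    if verb in IRREGULARS:           # memory route: retrieve a stored form
        return IRREGULARS[verb]
    suffix = "d" if verb.endswith("e") else "ed"
    return verb + suffix             # rule route: add -ed to the verb

print(past_tense("hold"))    # held    -- retrieved from memory
print(past_tense("snarf"))   # snarfed -- the rule handles a novel verb
print(past_tense("frilg"))   # frilged -- and any verb at all

# A child who cannot yet retrieve "heard" falls back on the rule:
del IRREGULARS["hear"]
print(past_tense("hear"))    # heared  -- the familiar overregularization error
```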
Connectionists have used the past tense as a proving ground to see if they could duplicate this textbook example of human creativity without using a rule and without dividing the labor between a system for memory and a system for grammatical combination. A series of computer models have tried to generate past-tense forms using simple pattern associator networks. The networks typically connect the sounds in verbs with the sounds in the past-tense form: -am with -ammed, -ing with -ung, and so on. The models can then generate new forms by analogy, just like the generalization from tigers to lions: trained on crammed, a model can guess spammed; trained on folded, it tends to say holded.
But human speakers do far more than associate sounds with sounds, and the models thus fail to do them justice. The failures come from the absence of machinery to handle logical relationships. Most of the models are baffled by new words that sound different from familiar words and hence cannot be generalized by analogy. Given the novel verb to frilg, for example, they come up not with frilged, as people do, but with an odd mishmash like freezled. That is because they lack the device of a variable, like x in algebra or “verb” in grammar, which can apply to any member of a category, regardless of how familiar its properties are. (This is the gadget that allows people to engage in categorical rather than fuzzy reasoning.) The networks can only associate bits of sound with bits of sound, so when confronted with a new verb that does not sound like anything they were trained on, they assemble a pastiche of the most similar sounds they can find in their network.
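A toy version of such a pattern associator, with the training pairs and the similarity measure invented for the illustration, reproduces both the successes and the failure: it generalizes crammed to spammed and folded to holded, but hands back a pastiche for a novel-sounding verb like frilg:

```python
# Toy sketch, not any published model: a new verb is conjugated by copying
# the ending change of the most similar-sounding trained verb.
TRAINED = {"cram": "crammed", "fold": "folded"}

def shared_ending(a, b):
    """How many final letters (standing in for sounds) the two verbs share."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def by_analogy(verb):
    model = max(TRAINED, key=lambda v: shared_ending(v, verb))
    overlap = shared_ending(model, verb)
    # graft the trained verb's ending change onto the new verb
    return verb[:len(verb) - overlap] + TRAINED[model][len(model) - overlap:]

print(by_analogy("spam"))    # spammed  -- analogy to cram/crammed
print(by_analogy("hold"))    # holded   -- analogy to fold/folded
print(by_analogy("frilg"))   # frilgmed -- a pastiche of stored sounds, not frilged
```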
The models also cannot properly distinguish among verbs that have the same sounds but different past-tense forms, such as ring the bell-rang the bell and ring the city-ringed the city. That is because the standard models represent only sound and are blind to the grammatical differences among verbs that call for different conjugations. The key difference here is between simple roots like ring in the sense of “resonate” (past tense rang) and complex verbs derived from nouns like ring in the sense of “form a ring around” (past tense ringed). To register that difference, a language-using system has to be equipped with compositional data structures (such as “a verb made from the noun ring”) and not just a beanbag of units.
Yet another problem is that connectionist networks track the statistics of the input closely: how many verbs of each sound pattern they have encountered. That leaves them unable to account for the epiphany in which young children discover the -ed rule and start making errors like holded and heared. Connectionist modelers can induce these errors only by bombarding the network with regular verbs (so as to burn in the -ed) in a way that is unlike anything real children experience. Finally, a mass of evidence from cognitive neuroscience shows that grammatical combination (including regular verbs) and lexical lookup (including irregular verbs) are handled by different systems in the brain rather than by a single associative network.
It’s not that neural networks are incapable of handling the meanings of sentences or the task of grammatical conjugation. (They had better not be, since the very idea that thinking is a form of neural computation requires that some kind of neural network duplicate whatever the mind can do.) The problem lies in the credo that one can do everything with a generic model as long as it is sufficiently trained. Many modelers have beefed up, retrofitted, or combined networks into more complicated and powerful systems. They have dedicated hunks of neural hardware to abstract symbols like “verb phrase” and “proposition” and have implemented additional mechanisms (such as synchronized firing patterns) to bind them together in the equivalent of compositional, recursive symbol structures. They have installed banks of neurons for words, or for English suffixes, or for key grammatical distinctions. They have built hybrid systems, with one network that retrieves irregular forms from memory and another that combines a verb with a suffix.17
A system assembled out of beefed-up subnetworks could escape all the criticisms. But then we would no longer be talking about a generic neural network! We would be talking about a complex system innately tailored to compute a task that people are good at. In the children’s story called “Stone Soup,” a hobo borrows the use of a woman’s kitchen ostensibly to make soup from a stone. But he gradually asks for more and more ingredients to balance the flavor until he has prepared a rich and hearty stew at her expense. Connectionist modelers who claim to build intelligence out of generic neural networks without requiring anything innate are engaged in a similar business. The design choices that make a neural network system smart—what each of the neurons represents, how they are wired together, what kinds of networks are assembled into a bigger system, in which way—embody the innate organization of the part of the mind being modeled. They are typically hand-picked by the modeler, like an inventor rummaging through a box of transistors and diodes, but in a real brain they would have evolved by natural selection (indeed, in some networks, the architecture of the model does evolve by a simulation of natural selection).18 The only alternative is that some previous episode of learning left the networks in a state ready for the current learning, but of course the buck has to stop at some innate specification of the first networks that kick off the learning process.
So the rumor that neural networks can replace mental structure with statistical learning is not true. Simple, generic networks are not up to the demands of ordinary human thinking and speaking; complex, specialized networks are a stone soup in which much of the interesting work has been done in setting up the innate wiring of the network. Once this is recognized, neural network modeling becomes an indispensable complement to the theory of a complex human nature rather than a replacement for it.19 It bridges the gap between the elementary steps of cognition and the physiological activity of the brain and thus serves as an important link in the long chain of explanation between biology and culture.
FOR MOST OF its history, neuroscience was faced with an embarrassment: the brain looked as if it were innately specified in every detail. When it comes to the body, we can see many of the effects of a person’s life experience: it may be tanned or pale, callused or soft, scrawny or plump or chiseled. But no such marks could be found in the brain. Now, something has to be wrong with this picture. People learn, and learn massively: they learn their language, their culture, their know-how, their database of facts. Also, the hundred trillion connections in the brain cannot possibly be specified individually by a 750-megabyte genome. The brain somehow must change in response to its input; the only question is how.
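A back-of-envelope calculation, using the figures in the text, makes the mismatch plain:

```python
# Back-of-envelope check of the claim above, using the figures in the text:
genome_bits = 750e6 * 8            # a 750-megabyte genome expressed in bits
connections = 100e12               # a hundred trillion connections
print(genome_bits / connections)   # ~6e-05 -- a tiny fraction of a bit per connection
```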
We are finally beginning to understand how. The study of neural plasticity is hot. Almost every week sees a discovery about how the brain gets wired in the womb and tuned outside it. After all those decades in which no one could find anything that changed in the brain, it is not surprising that the discovery of plasticity has given the nature-nurture pendulum a push. Some people describe plasticity as a harbinger of an expansion of human potential in which the powers of the brain will be harnessed to revolutionize childrearing, education, therapy, and aging. And several manifestos have proclaimed that plasticity proves that the brain cannot have any significant innate organization.20 In Rethinking Innateness, Jeffrey Elman and a team of West Pole connectionists write that predispositions to think about different things in different ways (language, people, objects, and so on) may be implemented in the brain only as “attention-grabbers” that ensure that the organism will receive “massive experience of certain inputs prior to subsequent learning.”21 In a “constructivist manifesto,” the theoretical neuroscientists Stephen Quartz and Terrence Sejnowski write that “although the cortex is not a tabula rasa… it is largely equipotential at early stages,” and therefore that innatist theories “appear implausible.”22
Neural development and plasticity unquestionably make up one of the great frontiers of human knowledge. How a linear string of DNA can direct the assembly of an intricate three-dimensional organ that lets us think, feel, and learn is a problem to stagger the imagination, to keep neuroscientists engaged for decades, and to belie any suggestion that we are approaching “the end of science.”
And the discoveries themselves are fascinating and provocative. The cerebral cortex (outer gray matter) of the brain has long been known to be divided into areas with different functions. Some represent particular body parts; others represent the visual field or the world of sound; still others concentrate on aspects of language or thinking. We now know that with learning and practice some of their boundaries can move around. (This does not mean that the brain tissue literally grows or shrinks, only that if the cortex is probed with electrodes or monitored with a scanner, the boundary where one ability leaves off and the next one begins can shift.) Violinists, for example, have an expanded region of cortex representing the fingers of the left hand.23 If a person or a monkey is trained on a simple task like recognizing shapes or attending to a location in space, neuroscientists can watch as parts of the cortex, or even individual neurons, take on the job.24
The reallocation of brain tissue to new tasks is especially dramatic when people lose the use of a sense or body part. Congenitally blind people use their visual cortex to read Braille.25 Congenitally deaf people use part of their auditory cortex to process sign language.26 Amputees use the part of the cortex formerly serving the missing limb to represent other parts of their bodies.27 Young children can grow up relatively normal after traumas to the brain that would turn adults into basket cases—even removal of the entire left hemisphere, which in adults underlies language and logical reasoning.28 All this suggests that the allocation of brain tissue to perceptual and cognitive processes is not done permanently and on the basis of the exact location of the tissue in the skull, but depends on how the brain itself processes information.