In the simple schema above, the designer of the recursive algorithm needs to determine the following at the outset:
• The key to a recursive algorithm is determining, in PICK BEST NEXT STEP, when to abandon the recursive expansion. This is easy when the program has achieved clear success (e.g., checkmate in chess, or the requisite solution in a math or combinatorial problem) or clear failure. It is more difficult when a clear win or loss has not yet been achieved. Abandoning a line of inquiry before a well-defined outcome is necessary because otherwise the program might run for billions of years (or at least until the warranty on your computer runs out).
• The other primary requirement for the recursive algorithm is a straightforward codification of the problem. In a game like chess, that’s easy. But in other situations, a clear definition of the problem is not always so easy to come by (a minimal sketch of both decisions follows this list).
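To make these two decisions concrete, here is a minimal sketch in Python of the recursive schema. The position interface (is_terminal, evaluate, generate_moves, apply) is a hypothetical placeholder, not any particular chess program; a fixed depth limit stands in for the “when to abandon” rule, and the move generator embodies the codification of the problem.

```python
# A minimal sketch of the recursive search schema. The position
# interface (is_terminal, evaluate, generate_moves, apply) is a
# hypothetical placeholder; evaluate() is assumed to score the
# position from the perspective of the player about to move.

def pick_best_next_step(position, depth, max_depth):
    """Return (score, move) for the best move from this position."""
    # Abandon the recursive expansion on a well-defined outcome
    # (win/loss/draw) or when the crude depth cutoff is reached.
    if position.is_terminal() or depth == max_depth:
        return position.evaluate(), None

    best_score, best_move = float("-inf"), None
    for move in position.generate_moves():   # codification of the problem
        score, _ = pick_best_next_step(position.apply(move),
                                       depth + 1, max_depth)
        if -score > best_score:               # opponent's gain is our loss
            best_score, best_move = -score, move
    return best_score, best_move
```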
Happy Recursive Searching!
Human players are very much of the complicated-minded school. That seems to be the human condition. As a result, even the best chess players are unable to consider more than a hundred moves, compared to a few billion for Deep Blue. But each human move is deeply considered. However, in 1997, Garry Kasparov, the world’s foremost exponent of the complicated-minded school, was defeated by a simple-minded computer.
Personally, I am of a third school of thought. It’s not much of a school, really. To my knowledge, no one has tried this idea. It involves combining the recursive and neural net paradigms, and I describe it in the discussion on neural nets that follows.
Neural Nets
In the early and mid-1960s, AI researchers became enamored with the Perceptron, a machine constructed from mathematical models of human neurons. Early Perceptrons were modestly successful in such pattern-recognition tasks as identifying printed letters and speech sounds. It appeared that all that was needed to make the Perceptron more intelligent was to add more neurons and more wires.
Then came Marvin Minsky and Seymour Papert’s 1969 book, Perceptrons, which proved a set of theorems apparently demonstrating that a Perceptron could never solve the simple problem of determining whether or not a line drawing is “connected” (in a connected drawing all parts are connected to one another by lines). The book had a dramatic effect, and virtually all work on Perceptrons came to a halt.2
In the late 1970s and 1980s, the paradigm of building computer simulations of human neurons, by then called neural nets, began to regain its popularity. One observer wrote in 1988:
Once upon a time two daughter sciences were born to the new science of cybernetics. One sister was natural, with features inherited from the study of the brain, from the way nature does things. The other was artificial, related from the beginning to the use of computers. Each of the sister sciences tried to build models of intelligence, but from very different materials. The natural sister built models (called neural networks) out of mathematically purified neurones. The artificial sister built her models out of computer programs.
In their first bloom of youth the two were equally successful and equally pursued by suitors from other fields of knowledge. They got on very well together. Their relationship changed in the early sixties when a new monarch appeared, one with the largest coffers ever seen in the kingdom of the sciences: Lord DARPA, the Defense Department’s Advanced Research Projects Agency. The artificial sister grew jealous and was determined to keep for herself the access to Lord DARPA’s research funds. The natural sister would have to be slain.
The bloody work was attempted by two staunch followers of the artificial sister, Marvin Minsky and Seymour Papert, cast in the role of the huntsman sent to slay Snow White and bring back her heart as proof of the deed. Their weapon was not the dagger but the mightier pen, from which came a book—Perceptrons—purporting to prove that neural nets could never fill their promise of building models of mind: only computer programs could do this. Victory seemed assured for the artificial sister. And indeed, for the next decade all the rewards of the kingdom came to her progeny, of which the family of expert systems did best in fame and fortune.
But Snow White was not dead. What Minsky and Papert had shown the world as proof was not the heart of the princess, it was the heart of a pig.
The author of the above statement was Seymour Papert.3 His sardonic allusion to bloody hearts reflects a widespread misunderstanding of the implications of the pivotal theorem in his and Minsky’s 1969 book. The theorem demonstrated limitations in the capabilities of a single layer of simulated neurons. If, on the other hand, we place neural nets at multiple levels—having the output of one neural net feed into the next—the range of their competence greatly expands. Moreover, if we combine neural nets with other paradigms, we can make yet greater progress. The heart that Minsky and Papert extracted belonged primarily to the single-layer neural net.
Papert’s irony also reflects his and Minsky’s own considerable contributions to the neural net field. In fact, Minsky started his career with seminal contributions to the concept at Harvard in the 1950s.4
But enough of politics. What are the main issues in designing a neural net?
One key issue is the net’s topology: the organization of the interneuronal connections. A net organized with multiple levels can make more complex discriminations but is harder to train.
Training the net is the most critical issue. This requires an extensive library of examples of the patterns the net will be expected to recognize, along with the correct identification of each pattern. Each pattern is presented to the net. Typically, those connections that contributed to a correct identification are strengthened (by increasing their associated weight), and those that contributed to an incorrect identification are weakened. This method of strengthening and weakening the connection weights is called back-propagation and is one of several methods used. There is controversy as to how this learning is accomplished in the human brain’s neural nets, as there does not appear to be any mechanism by which back-propagation can occur. One method that does appear to be implemented in the human brain is that the mere firing of a neuron increases the neurotransmitter strengths of the synapses it is connected to. Also, neurobiologists have recently discovered that primates, and in all likelihood humans, grow new brain cells throughout life, including adulthood, contradicting an earlier dogma that this was not possible.
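As a concrete, if drastically simplified, illustration of the strengthen-and-weaken idea, here is a sketch in Python of the classic perceptron-style update for a single threshold neuron. It is not full back-propagation (which propagates error through multiple layers); the learning rate, threshold, and training data are arbitrary placeholders.

```python
import numpy as np

# A minimal sketch of the strengthen/weaken training idea for one
# threshold neuron. Connections that would have produced the correct
# identification are strengthened; those that misfired are weakened.

def train_neuron(examples, n_inputs, rate=0.1, epochs=100, threshold=0.5):
    weights = np.zeros(n_inputs)
    for _ in range(epochs):
        for pattern, correct in examples:    # correct is the 0/1 label
            fired = 1 if np.dot(weights, pattern) > threshold else 0
            # (correct - fired) is +1, 0, or -1: strengthen, leave
            # alone, or weaken each contributing connection.
            weights += rate * (correct - fired) * np.asarray(pattern, float)
    return weights
```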
Little and Big Hills
A key issue in adaptive algorithms—neural nets and evolutionary algorithms—is often referred to as local versus global optimality: in other words, climbing the closest hill versus finding and climbing the biggest hill. As a neural net learns (by adjusting its connection strengths), or as an evolutionary algorithm evolves (by adjusting the “genetic” code of the simulated organisms), the fit of the solution improves until a “locally optimal” solution is found. If we compare this to climbing a hill, these methods are very good at finding the top of a nearby hill, which is the best possible solution within a local region of the space of possible solutions. But sometimes these methods become trapped at the top of a small hill and fail to see a higher mountain in a different area. In the neural net context, once the net has converged on a locally optimal solution, adjusting any of the connection strengths makes the fit worse. But just as a climber might need to descend a small elevation to ultimately reach a higher point on a different hill, the neural net (or evolutionary algorithm) might need to make the solution temporarily worse to ultimately find a better one.
One approach to avoiding such a “false” optimal solution (little hill) is to force the adaptive method to do the analysis multiple times starting with very different initial conditions—in other words, force it to climb lots of hills, not just one. But even with this approach, the system designer still needs to make sure that the adaptive method hasn’t missed an even higher mountain in a yet more distant land.
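Here is a sketch of that multiple-starting-points tactic in Python, assuming a hypothetical one-dimensional fitness function to be maximized:

```python
import random

# A minimal sketch of "climb lots of hills, not just one": restart a
# simple hill climber from many random initial conditions and keep the
# best peak found. Even so, a still-higher mountain may go unvisited.

def hill_climb(x, fitness, step=0.1, iters=1000):
    for _ in range(iters):
        candidate = x + random.uniform(-step, step)
        if fitness(candidate) > fitness(x):   # accept only uphill moves
            x = candidate
    return x                                  # a locally optimal solution

def restarted_climb(fitness, restarts=20, lo=-10.0, hi=10.0):
    starts = (random.uniform(lo, hi) for _ in range(restarts))
    return max((hill_climb(s, fitness) for s in starts), key=fitness)
```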
The Laboratory of Chess
We can gain some insight into the comparison of human thinking and conventional computer approaches by again examining the human and machine approaches to chess. I do this not to belabor the issue of chess playing, but rather because it illustrates a clear contrast. Raj Reddy, Carnegie Mellon University’s AI guru, cites studies of chess as playing the same role in artificial intelligence that studies of E. coli play in biology: an ideal laboratory for studying fundamental questions.5 Computers use their extreme speed to analyze the vast combinations created by the combinatorial explosion of moves and countermoves. While chess programs may use a few other tricks (such as storing the openings of all master chess games in this century and precomputing endgames), they essentially rely on their combination of speed and precision. In comparison, humans, even chess masters, are extremely slow and imprecise. So we precompute our chess moves ahead of time. That’s why it takes so long to become a chess master, or the master of any pursuit. Garry Kasparov has spent much of his few decades on the planet studying—and experiencing—chess moves. Researchers have estimated that masters of a nontrivial subject have memorized some fifty thousand such “chunks” of insight.
When Kasparov plays, he, too, generates a tree of moves and countermoves in his head, but limitations in human mental speed and short-term memory limit his mental tree (for each actually played move) to no more than a few hundred board positions, if that. This compares to billions of board positions for his electronic antagonist. So the human chess master is forced to drastically prune his mental tree, eliminating fruitless branches by using his intense pattern-recognition faculties. He matches each board position—actual and imagined—to this database of tens of thousands of previously analyzed situations.
After Kasparov’s 1997 defeat, we read a lot about how Deep Blue was just doing massive number crunching, not really “thinking” the way its human rival was. One could say that the opposite is the case, that Deep Blue was indeed thinking through the implications of each move and countermove, and that it was Kasparov who did not have time to really think very much during the tournament. Mostly he was just drawing upon his mental database of situations he had thought about long ago. (Of course, this depends on one’s notion of thinking, as I discussed in chapter 3.) But if the human approach to chess—neural-network-based pattern recognition used to identify situations from a library of previously analyzed situations—is to be regarded as true thinking, then why not program our machines to work the same way?
The Third Way
And that’s the idea I alluded to earlier as the third school of thought in evaluating the terminal leaves in a recursive search. Recall that the simple-minded school uses an approach such as adding up piece values to evaluate a particular board position. The complicated-minded school advocates a more elaborate and time-consuming logical analysis. I advocate a third way: combine two simple paradigms—recursive search and neural nets—by using the neural net to evaluate the board positions at each terminal leaf. Training a neural net is time-consuming and requires a great deal of computing, but performing a single recognition task on a neural net that has already learned its lessons is very quick, comparable to a simple-minded evaluation. Although fast, the neural net draws upon the very extensive amount of time it previously spent learning the material. Since we have every master chess game in this century online, we can use this massive amount of data to train the neural net. This training is done once and offline (that is, not during an actual game). The trained neural net would then be used to evaluate the board positions at each terminal leaf. Such a system would combine the millionfold advantage in speed that computers have with the more humanlike ability to recognize patterns against a lifetime of experience.
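In code, the combination amounts to a single change to the earlier recursive sketch: the leaf evaluation. Here is a sketch, with net standing for a hypothetical neural net already trained offline on the library of master games:

```python
# A sketch of the combined paradigm: the same recursive search as
# before, but with each terminal leaf scored by one fast recognition
# pass through a previously trained neural net (net.evaluate is a
# hypothetical interface; the training happened once, offline).

def pick_best_next_step(position, depth, max_depth, net):
    if position.is_terminal() or depth == max_depth:
        # The neural net replaces the simple-minded piece-value sum.
        return net.evaluate(position), None
    best_score, best_move = float("-inf"), None
    for move in position.generate_moves():
        score, _ = pick_best_next_step(position.apply(move),
                                       depth + 1, max_depth, net)
        if -score > best_score:
            best_score, best_move = -score, move
    return best_score, best_move
```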
I proposed this approach to Murray Campbell, head of the Deep Blue team, and he found it intriguing and appealing. He admitted he was getting tired anyway of tuning the leaf-evaluation algorithm by hand. We talked about setting up an advisory team to implement this idea, but then IBM canceled the whole chess project. I do believe that one of the keys to emulating the diversity of human intelligence is to combine fundamental paradigms in an optimal way. We’ll talk about how to fold in the paradigm of evolutionary algorithms below.
MATHLESS “PSEUDO CODE” FOR THE NEURAL NET ALGORITHM
Here is the basic schema for a neural net algorithm. Many variations are possible, and the designer of the system needs to provide certain critical parameters and methods, detailed below.
The Neural Net Algorithm
Creating a neural net solution to a problem involves the following steps:
• Define the input.
• Define the topology of the neural net (i.e., the layers of neurons and the connections between the neurons).
• Train the neural net on examples of the problem.
• Run the trained neural net to solve new examples of the problem.
• Take your neural net company public.
These steps (except for the last one) are detailed below:
The Problem Input
The problem input to the neural net consists of a series of numbers. This input can be:
• in a visual pattern-recognition system: a two-dimensional array of numbers representing the pixels of an image (a small example follows this list); or
• in an auditory (e.g., speech) recognition system: a two-dimensional array of numbers representing a sound, in which the first dimension represents parameters of the sound (e.g., frequency components) and the second dimension represents different points in time; or
• in an arbitrary pattern recognition system: an n-dimensional array of numbers representing the input pattern.
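For concreteness, here is the visual case in Python (the pixel values are arbitrary):

```python
import numpy as np

# The visual input case: a two-dimensional array of numbers, one per
# pixel, flattened into the series of numbers the first layer of
# neurons will connect to. The 3x3 "image" here is an arbitrary example.

image = np.array([[0.0, 0.9, 0.1],
                  [0.8, 1.0, 0.7],
                  [0.0, 0.9, 0.2]])
problem_input = image.flatten()   # shape (9,): the net's input numbers
```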
Defining the Topology
To set up the neural net:
The architecture of each neuron consists of:
• Multiple inputs, each of which is “connected” either to the output of another neuron or to one of the input numbers.
• Generally, a single output, which is connected either to the input of another neuron (which is usually in a higher layer) or to the final output.
Set up the first layer of neurons:
• Create N_0 neurons in the first layer. For each of these neurons, “connect” each of the multiple inputs of the neuron to “points” (i.e., numbers) in the problem input. These connections can be determined randomly or using an evolutionary algorithm (see below).
• Assign an initial “synaptic strength” to each connection created. These weights can start out all the same, can be assigned randomly, or can be determined in another way (see below).
Set up the additional layers of neurons:
Set up a total of M layers of neurons. For each layer, set up the neurons in that layer. For layer_i:
• Create N_i neurons in layer_i. For each of these neurons, “connect” each of the multiple inputs of the neuron to the outputs of the neurons in layer_(i-1) (see variations below).
• Assign an initial “synaptic strength” to each connection created. These weights can start out all the same, can be assigned randomly, or can be determined in another way (see below).
• The outputs of the neurons in layer_M are the outputs of the neural net (see variations below).
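A minimal sketch of this setup in Python, assuming the common special case in which every neuron in layer_i connects to every output of layer_(i-1), with randomly assigned initial strengths:

```python
import numpy as np

# A minimal sketch of the topology setup: M layers, where layer i has
# layer_sizes[i] neurons, each connected to every output of the layer
# below (layer 0 connects directly to the problem input). Initial
# synaptic strengths are assigned randomly here; they could equally
# well start out all the same.

def set_up_net(n_inputs, layer_sizes, seed=0):
    rng = np.random.default_rng(seed)
    weights = []       # weights[i][j][k]: strength of the connection
    prev = n_inputs    # from output k of the layer below into neuron j
    for n in layer_sizes:
        weights.append(rng.uniform(-1.0, 1.0, size=(n, prev)))
        prev = n
    return weights
```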
The Recognition Trials
How each neuron works:
Once the neuron is set up, it does the following for each recognition trial.
• Each weighted input to the neuron is computed by multiplying the output of the other neuron (or initial input) that the input to this neuron is connected to by the synaptic strength of that connection.
• All of these weighted inputs to the neuron are summed.
• If this sum is greater than the firing threshold of this neuron, then this neuron is considered to “fire” and its output is 1. Otherwise, its output is 0 (see variations below).
Do the following for each recognition trial:
For each layer, from layer_0 to layer_M:
And for each neuron in each layer:
• Sum its weighted inputs (each weighted input = the output of the other neuron [or initial input] that the input to this neuron is connected to, multiplied by the synaptic strength of that connection).
• If this sum of weighted inputs is greater than the firing threshold for this neuron, set the output of this neuron to 1; otherwise, set it to 0.
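Here is a sketch of one recognition trial, using the weights structure from the setup sketch above and a single shared firing threshold (an assumption; thresholds could vary per neuron):

```python
import numpy as np

# A minimal sketch of one recognition trial: layer by layer, each
# neuron sums its weighted inputs and outputs 1 if the sum exceeds
# its firing threshold, 0 otherwise.

def recognition_trial(problem_input, weights, threshold=0.5):
    outputs = np.asarray(problem_input, dtype=float)
    for layer in weights:                         # layer_0 .. layer_M
        sums = layer @ outputs                    # sums of weighted inputs
        outputs = (sums > threshold).astype(float)  # fire: 1, else 0
    return outputs                                # outputs of layer_M
```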
To Train the Neural Net
• Run repeated recognition trials on sample problems.
• After each trial, adjust the synaptic strengths of all the interneuronal connections to improve the performance of the neural net on this trial (see the discussion below on how to do this).
• Continue this training until the accuracy rate of the neural net is no longer improving (i.e., reaches an asymptote).
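A sketch of this training loop, building on the setup and recognition sketches above. Because threshold neurons are not differentiable, this sketch applies the strengthen/weaken rule only to the final layer’s connections; true multi-layer back-propagation requires smoother neuron outputs.

```python
import numpy as np

# A minimal sketch of the training loop. Each trial runs the net on a
# sample pattern, then strengthens or weakens the final layer's
# connections toward the correct identification. Training stops when
# accuracy stops improving (here: when a full pass has no errors).

def train_net(examples, weights, rate=0.1, max_epochs=100, threshold=0.5):
    for _ in range(max_epochs):
        errors = 0
        for pattern, correct in examples:   # correct: desired 0/1 outputs
            hidden = np.asarray(pattern, dtype=float)
            for layer in weights[:-1]:      # run all but the final layer
                hidden = (layer @ hidden > threshold).astype(float)
            fired = (weights[-1] @ hidden > threshold).astype(float)
            errors += int(np.any(fired != np.asarray(correct)))
            # Strengthen/weaken only the final layer's connections.
            weights[-1] += rate * np.outer(np.asarray(correct) - fired,
                                           hidden)
        if errors == 0:
            break
    return weights
```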
Key Design Decisions
In the simple schema above, the designer of this neural net algorithm needs to determine at the outset:
• What the input numbers represent.
• The number of layers of neurons.
• The number of neurons in each layer (each layer does not necessarily need to have the same number of neurons).
• The number of inputs to each neuron, in each layer. The number of inputs (i.e., interneuronal connections) can also vary from neuron to neuron, and from layer to layer.
• The actual “wiring” (i.e., the connections). For each neuron, in each layer, this consists of a list of other neurons, the outputs of which constitute the inputs to this neuron. This represents a key design area. There are a number of possible ways to do this: