The Age of Spiritual Machines: When Computers Exceed Human Intelligence
—Rolf Landauer
As I mentioned in chapter 6, “Building New Brains,” understanding intelligence is a bit like peeling an onion—penetrating each layer reveals yet another onion. At the end of the process, we have a lot of onion peels, but no onion. In other words, intelligence—particularly human intelligence—operates at many levels. We can penetrate and understand each level, but the whole process requires all the levels working together in just the right way.
Presented here are some further perspectives on the three paradigms I discussed in chapter 4, “A New Form of Intelligence on Earth.” Each of these methods can provide “intelligent” solutions to carefully defined problems. But to create systems that can respond flexibly in the complex environments that intelligent entities often find themselves, these approaches need to be combined in appropriate ways. This is particularly true when interacting with phenomena that.incorporate multiple levels of understanding. For example, if we build a single grand neural network and attempt to train it to understand all the complexities of speech and language, the results will be limited at best. More encouraging results are obtained if we break down the problem in a way that corresponds to the multiple levels of meaning that we find in this uniquely human form of communication.
The human brain is organized the same way: as an intricate assemblage of specialized regions. And as we learn the brain’s parallel algorithms, we will have the means to vastly extend them. As just one example, the brain region responsible for logical and recursive thinking—the cerebral cortex—has a mere 8 million neurons.1 We are already building neural nets thousands of times larger and that operate millions of times faster. The key issue in designing intelligent machines (until they take over that chore from us) will be designing clever architectures to combine the relatively simple methods that comprise the building blocks of intelligence.
The Recursive Formula
Here’s a really simple formula to create intelligent solutions to difficult problems. Listen carefully or you might miss it.
The recursive formula is:
For my next step, take my best next step. If I’m done, I’m done.
It may seem too simple, and I’ll admit there’s not much content at first glance. But its power is surprising.
Let’s consider the classical example of a problem addressed by the recursive formula: the game of chess. Chess is considered an intelligent game, at least it was until recently. Most observers are still of the view that it requires intelligence to play a good game. So how does our recursive formula fare in this arena?
Chess is a game played one move at a time. The goal is to make “good” moves. So let’s define a program that makes good moves. By applying the recursive formula to chess, we rephrase it as follows:
PICK MY BEST MOVE: Pick my best move. If I’ve won, I’m done.
Hang in there; this will make sense in a moment. I need to factor in one more aspect of chess, which is that I am not in this alone. I have an opponent. She makes moves, too. Let’s give her the benefit of the doubt and assume that she also makes good moves. If this proves to be wrong, it will be an opportunity, not a problem. So now we have:
PICK MY BEST MOVE: Pick my best move, assuming my opponent will do the same. If I’ve won, I’m done.
At this point, we need to consider the nature of recursion. A recursive rule is one that is defined in terms of itself. A recursive rule is circular, but to be useful we don’t want to go around in circles forever. We need an escape hatch.
To illustrate recursion, let’s consider an example: the simple “factorial” function. To compute factorial of n, we multiply n by factorial of (n—1). That’s the circular part—we have defined this function in terms of itself. We also need to specify that factorial of 1 = 1. That’s our escape hatch.
As an example, let’s compute factorial of 2. According to our definition,
factorial of 2 = 2 times (factorial of 1).
We know directly what (factorial of 1) is, so there’s our escape from infinite recursion. Plugging in (factorial of 1) = 1, we now can write,
factorial of 2 = 2 times 1 = 2.
Returning to chess, we can see that the PICK MY BEST MOVE function is recursive, since we have defined the best move in terms of itself. The deceptively innocuous “If I’ve won, I’m done” part of the strategy is our escape hatch.
Let’s factor in what we know about chess. This is where we carefully consider the definition of the problem. We realize that to pick the best move, we need to start by listing the possible moves. This is not very complicated. The legal moves at any point in the game are defined by the rules. While more complicated than some other games, the rules of chess are straightforward and easily programmed. So we list the moves and pick the best one.
But which is best? If the move results in a win, that will do nicely. So again we merely consult the rules and pick one of the moves that yields an immediate checkmate. Perhaps we are not so lucky and none of the possible moves provides an immediate win. We still need to consider whether or not the move enables me to win or lose. At this point we need to consider the subtle addition we made to our rule, “assuming my opponent will do the same.” After all, my winning or losing is affected by what my opponent might do. I need to put myself in her shoes and pick her best move. How can I do that? This is where the power of recursion comes in. We have a program that does exactly this, called PICK MY BEST MOVE. So we call it to determine my opponent’s best move.
Our program is now structured as follows. PICK MY BEST MOVE generates a list of all possible moves allowed by the rules. It examines each possible move in turn. For each move, it generates a hypothetical board representing what the placement of the pieces would be if that move were selected. Again, this just requires applying the definition of the problem as embodied in the rules of chess. PICK MY BEST MOVE now puts itself in my opponent’s place and calls itself to determine her best move. It then starts to generate all of her possible moves from that board position.
The program thus keeps calling itself, continuing to expand possible moves and countermoves in an ever expanding tree of possibilities. This process is often called a minimax search, because we are alternatively attempting to minimize my opponent’s ability to win and to maximize my own.
Where does this all end? The program just keeps calling itself until every branch of the tree of possible moves and countermoves results in an end of game. Each end of game provides the answer: Win, tie, or draw. At the furthest point of expansion of moves and countermoves, the program encounters moves that finish the game. If a move results in a win, we pick that move. If there are no win moves, then we settle for a tie. If there are no win or tie moves, I continue playing anyway in the hope that my opponent is not perfect like I am.
These final moves are the final branches—called leaves—in our tree of move sequences. Now, instead of continuing to call PICK MY BEST MOVE, the program begins to return from its calls to itself. As it begins to return from all of the nested PICK BEST MOVE calls, it has determined the best move at each point (including the best move for my opponent), and so it can finally select the correct move for the current actual board situation.
So how well a game does this simple program play? The answer is perfect chess. I can’t lose, unless possibly if my opponent goes first and is also perfect. Perfect chess is very good indeed, much better than any mere human. The most complicated part of the PICK MY BEST MOVE function—the only aspect that is not extremely simple—is generating the allowable moves at each point. And this is just a matter of codifying the rules. Essentially, we have determined the answer by carefully defining the problem.
But we’re not done. While playing perfect chess might be considered impressive, it is not good enough. We need to consider how responsive a player PICK MY BEST MOVE will be. If we assume that there are, on average, about 8 possible moves for each board situation, and that a typical game lasts about 30 moves, we need to consider 830 possible move sequences to fully ex
pand the tree of all move-countermove possibilities. If we assume that we can analyze 1 billion board positions per second (a good deal faster than any chess computer today), it would take 1018 seconds, or about 40 billion years, to select each move.
Unfortunately, that’s not regulation play. This approach to recursion is, a bit like evolution—both do great work but are incredibly slow. That’s really not surprising if you think about it. Evolution represents another very simple paradigm, and indeed is another of our simple formulas.
However, before we throw out the recursive formula, let’s attempt to modify it to take into account our human patience and, for the time being, our mortality.
Clearly we need to put limits on how deep we allow the recursion to take place. How large we allow the move-countermove tree to grow needs to depend on how much computation we have available. In this way, we can use the recursive formula on any computer, from a wristwatch computer to a supercomputer.
Limiting the size of this tree means of course that we cannot expand each branch until the end of the game. We need to arbitrarily stop the expansion and have a method of evaluating the “terminal leaves” of an unfinished tree. When we considered fully expanding each move sequence to the end of the game, evaluation was simple: Winning is better than tying, and losing is no good at all. Evaluating a board position in the middle of the game is slightly more complicated. Rather, it is more controversial because here we encounter multiple schools of thought.
The cat in Alice’s Adventures in Wonderland who tells Alice that it doesn’t much matter which way she goes must have been an expert in recursive algorithms. Any halfway reasonable approach works rather well. If, for example, we just add up the piece values (that is, 10 for the queen, 5 for the rook, and so on), we will obtain rather respectable results. Programming the recursive minimax formula using the piece value method of evaluating terminal leaves, as run on your average personal computer circa 1998, will defeat all but a few thousand humans on the planet.
This is what I call the “simple minded” school. This school of thought says: Use a simple method of evaluating the terminal leaves and put whatever computational power we have available into expanding the moves and countermoves as deeply as possible. Another approach is the “complicated minded” school, which says that we need to use sophisticated procedures to evaluate the “quality” of the board at each terminal leaf position.
IBM’s Deep Blue, the computer that crossed this historic threshold, uses a leaf evaluation method that is a good deal more refined than just adding up piece values. However, in a discussion I had with Murray Campbell, head of the Deep Blue team, just weeks prior to its 1997 historic victory, Campbell agreed that Deep Blue’s evaluation method was more simple minded than complicated minded.
MATH LESS “PSEUDO CODE” FOR THE RECURSIVE ALGORITHM
Here is the basic schema for the recursive algorithm. Many variations are possible, and the designer of the system needs to provide certain critical parameters and methods, detailed below.
The Recursive Algorithm
Define a function (program), “PICK BEST NEXT STEP.” The function returns a value of “SUCCESS” (we’ve solved the problem) or “FAILURE” (we didn’t solve it). If it returns with a value of SUCCESS, then the function also returns the sequence of selected steps that solved the problem. PICK BEST NEXT STEP does the foliowing:
PICK BEST NEXT STEP:
• Determine if the program can escape from continued recursion at this point. This bullet and the next two bullets deal with this escape decision. First, determine if the problem has now been solved. Since this call to PICK BEST NEXT STEP probably came from the program calling itself, we may now have a satisfactory solution. Examples are:
(i) In the context of a game (e.g., chess), the last move allows us to win (e.g., checkmate).
(ii) In the context of solving a mathematical theorem, the last step proves the theorem.
(iii) In the context of an artistic program (eg., cybernetic poet or composer), the last step matches the goals for the next word or note.
If the problem has been satisfactorily solved, the program returns with a value of SUCCESS. In this case, PICK BEST NEXT STEP also returns the sequence of steps that caused the success.
• If If the problem has not been solved, determine if a solution is now hopeless. Examples are:
(i) In the context of a game (e.g., chess), this move causes us to lose (e.g., checkmate for the other side).
(ii) In the context of solving a mathematical theorem, this step violates the theorem.
(iii) In the context of an artistic program (e.g., cybernetic poet or composer), this step violates the goals for the next word or note.
If the solution at this point has been deemed hopeless, the program returns with a value of FAILURE.
• If the problem has been neither solved nor deemed hopeless at this point of recursive expansion, determine whether or not the expansion should be abandoned anyway. This is a key aspect of the design and takes into consideration the limited amount of computer time we have to spend. Examples are:
(i) In-the context of a game (e.g., chess), this move puts our side sufficiently “ahead” or “behind.” Making this determination may not be straightforward and is the primary design decision. However, simple approaches (e.g., adding up piece values) can still provide good results. lf the program determines that our side is sufficiently ahead, then PICK BEST NEXT STEP returns in a similar manner to a deter mination that our side has won (i.e, with a value of SUCCESS). If the program determines that our side is sufficiently behind, then Pick BEST NEXT STEP returns in a similar manner to a determination that our side has lost (i.e., with a value of FAILURE).
(ii)-In the context of solving a mathematical theorem, this step involves determining if the sequence of steps in the proof are unlikely to yield a proof .lf so ,then this path should be abandoned, and PICK BEST NEXT STEP returns in a similar manner to a determination that this step violates the theorem (i.e.,with a value of FILURE).There is no “spft” equivalent of success. We can’t return with a Value of SUCCESS until we have actually solved the problem. That’s the nature of math.
(iii) In the context of an artistic program (e.g., cybernetic poet or composer), this step involves determining if the sequence of steps (e.g., words in a poem, notes in a song) is unlikely to satisfy the goals for the next step. If so, then this path should be abandoned, and PICK BEST NEXT STEP returns in a similar manner to a determination that this step violates the goals for the next step (i.e., with a value of FAILURE).
• If PICK BEST NEXT STEP has not returned (because the program has neither determined success nor failure nor made a determination that this path should be abandoned at this point), then we have not escaped from continued recursive expansion. In this case, we now generate a list of all possible next steps at this point. This is where the precise statement of the problem comes in:
(i) In the context of a game (e.g., chess), this involves generating all possible moves for “our” side for the current state of the board. This involves a straightforward codification of the rule of the game.
(ii) In the context of finding a proof for a mathematical theorem, this involves listing the possible axioms or previously proved theorems that can be applied at this point in the solution.
(iii) In the context of a cybernetic art program, this involves listing the possible words/notes/line segments that could be used at this point.
For each such possible next step :
(i) Create the hypothetical situation that would exist if th¡s step were implemented. In a game, this means the hypothetical state of the board. In a mathematical proof, this means adding this step (e.g., axiom) to the proof. In an art program, this means adding this word/note/line segrnent.
(ii) Now call PICK BEST NEXT STEP to examine this hypothetical situation. This is, of course, where the recursion comes in because the program is now calling itself.
(iii) If the above call to PICK BEST
NEXT STEP returns with a value of SUCCESS, then return from the call to PICK BEST NEXT STEP (that we are now in), also with a value of SUCCESS. Otherwise consider the next possible step.
If all the possible next steps have been considered without finding a step that resulted in a return from the call to PICK BEST NEXT STEP with a value of SUCCESS, then return from this call to PICK BEST NEXT STEP (that we are now in) with a value of FAILURE.
END OF PICK BEST NEXT STEP If the original call to PICK BEST NEXT STEP returns with a value of SUCCESS, it will also return the correct sequence of steps:
(i) In the context of a game, the first step in this sequence is the next move you should make.
(ii) In the context of a mathematical proof, the full sequence of steps is the proof.
(iii) In the context of a cybernetic art program, the sequence of steps is your work of art.
If the original call to PICK BEST NEXT STEP is FAILURE, then you need to go back to the drawing board.
Key Design Decisions