The Age of Spiritual Machines: When Computers Exceed Human Intelligence
The recursive formula is also a rather good mathematician. Here the goal is to solve a mathematical problem, such as proving a theorem. The rules then become the axioms of the field of math being addressed, as well as previously proved theorems. The expansion at each point consists of the possible axioms (or previously proved theorems) that can be applied at each step of the proof. This was the approach used by Allen Newell, J. C. Shaw, and Herbert Simon for their 1957 General Problem Solver. Their program outdid Russell and Whitehead on some hard math problems, and thereby fueled the early optimism of the artificial intelligence field.
From these examples, it may appear that recursion is well suited only for problems in which we have crisply defined rules and objectives. But it has also shown promise in computer generation of artistic creations. Ray Kurzweil’s Cybernetic Poet, for example, uses a recursive approach.19 The program establishes a set of goals for each word—achieving a certain rhythmic pattern, poem structure, and word choice that is desirable at that point in the poem. If the program is unable to find a word that meets these criteria, then it backs up and erases the previous word it has written, re-establishes the criteria it had originally set for the word just erased, and goes from there. If that also leads to a dead end, it backs up again. It thus goes backward and forward, hopefully making up its “mind” at some point. Eventually, it forces itself to make up its mind by relaxing some of the constraints if all paths lead to dead ends. After all, no one will ever know if it breaks its own rules.
Recursion is also popular in programs that compose music.20 In this case the “moves” are well defined. We call them notes, which have properties such as pitch, duration, loudness, and playing style. The objectives are harder to pin down but can still be defined in terms of rhythmic and melodic structures. The key to recursive artistic programs is how we define the terminal leaf evaluation. Simple approaches do not always work well here, and some of the cybernetic art and music programs we will talk about later use complex methods to evaluate the terminal leaves. While we have not yet captured all of intelligence in a simple formula, we have made a lot of progress with this simple combination: recursively defining a solution through a precise statement of the problem and massive computation. For many problems, a personal computer circa end of the twentieth century is massive enough.
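To make the recursive formula concrete, here is a minimal Python sketch of recursive search with backtracking, in the spirit of the Cybernetic Poet described above: try each candidate “move,” recurse, and erase the last choice when every continuation hits a dead end. Every name in it (solve, candidates, meets_goal, the toy vocabulary) is a hypothetical illustration, not code from any of the programs mentioned:

```python
def solve(partial, candidates, meets_goal, max_len):
    """Extend `partial` one choice at a time; back up when a branch fails."""
    if meets_goal(partial):
        return partial                        # a complete solution
    if len(partial) >= max_len:
        return None                           # dead end: trigger backtracking
    for choice in candidates(partial):        # the expansion at this point
        result = solve(partial + [choice], candidates, meets_goal, max_len)
        if result is not None:
            return result                     # this branch worked out
    return None                               # every branch failed; erase and back up

# Toy usage: find words whose letters total exactly 10, from a made-up vocabulary.
words = ["star", "moon", "light", "sea"]
goal = lambda p: sum(len(w) for w in p) == 10
print(solve([], lambda p: words, goal, max_len=4))   # ['star', 'sea', 'sea']
```

The terminal leaf evaluation here is the trivial length test in `goal`; as the text notes, the artistic programs replace that one line with far more elaborate judgments.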
Neural Nets: Self-Organization and Human Computing
The neural net paradigm is an attempt to emulate the computing structure of neurons in the human brain. We start with a set of inputs that represents a problem to be solved.21 For example, the input may be a set of pixels representing an image that needs to be identified. These inputs are randomly wired to a layer of simulated neurons. Each of these simulated neurons can be a simple computer program that models a neuron in software, or it can be an electronic implementation.
Each point of the input (for example, each pixel in an image) is randomly connected to the inputs of the first layer of simulated neurons. Each connection has an associated synaptic strength that represents the importance of this connection. These strengths are also set at random values. Each neuron adds up the signals coming into it. If the combined signal exceeds a threshold, then the neuron fires and sends a signal to its output connection. If the combined input signal does not exceed the threshold, then the neuron does not fire and its output is zero. The output of each neuron is randomly connected to the inputs of the neurons in the next layer. At the top layer, the output of one or more neurons, also randomly selected, provides the answer.
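As a concrete illustration, here is a bare-bones Python sketch of such a layer of randomly wired threshold neurons. The class name, the sizes, and the threshold value of 0.5 are arbitrary illustrative choices, not a specification from the text:

```python
import random

class Neuron:
    """One simulated neuron: weighted sum of inputs, all-or-nothing output."""
    def __init__(self, n_inputs, threshold=0.5):
        # synaptic strengths start at random values, as described above
        self.weights = [random.uniform(-1, 1) for _ in range(n_inputs)]
        self.threshold = threshold

    def fire(self, inputs):
        total = sum(w * x for w, x in zip(self.weights, inputs))
        return 1 if total > self.threshold else 0   # fires only above threshold

def layer_output(neurons, inputs):
    """Feed the same inputs to every neuron in a layer."""
    return [n.fire(inputs) for n in neurons]

# A toy four-pixel "image" fed to a first layer of three neurons.
pixels = [1, 0, 1, 1]
layer = [Neuron(len(pixels)) for _ in range(3)]
print(layer_output(layer, pixels))   # e.g. [0, 1, 0], depending on the random wiring
```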
A problem, such as an image of a printed character to be identified, is presented to the input layer, and the output neurons produce an answer. And the responses are remarkably accurate for a wide range of problems.
Actually, the answers are not accurate at all. Not at first, anyway. Initially, the output is completely random. What else would you expect, given that the whole system is set up in a completely random fashion?
I left out an important step, which is that the neural net needs to learn its subject matter. Like the mammalian brains on which it is modeled, a neural net starts out ignorant. The neural net’s teacher, which may be a human, a computer program, or perhaps another, more mature neural net that has already learned its lessons, rewards the student neural net when it is right and punishes it when it is wrong. This feedback is used by the student neural net to adjust the strengths of each interneuronal connection. Connections that were consistent with the right answer are made stronger. Those that advocated a wrong answer are weakened. Over time, the neural net organizes itself to provide the right answers without coaching. Experiments have shown that neural nets can learn their subject matter even with unreliable teachers. If the teacher is correct only 60 percent of the time, the student neural net will still learn its lessons.
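One concrete way to implement this reward-and-punish feedback is a perceptron-style update, sketched below against the Neuron class from the earlier snippet. The book does not commit to this particular rule; it is simply one well-known scheme that strengthens connections consistent with the right answer and weakens the rest:

```python
def train_step(neuron, inputs, correct_answer, rate=0.1):
    """Nudge each synaptic strength toward the teacher's answer."""
    guess = neuron.fire(inputs)                # the student's current answer: 0 or 1
    error = correct_answer - guess             # teacher feedback: -1 (punish), 0, or +1 (reward)
    for i, x in enumerate(inputs):
        neuron.weights[i] += rate * error * x  # strengthen or weaken each connection
```

Repeated over many labeled examples, the strengths drift until the neuron answers correctly without coaching. Because each nudge is small, occasional wrong labels tend to wash out, which is consistent with the observation above that the student can learn from a teacher who is right only 60 percent of the time.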
If we teach the neural net well, this paradigm is powerful and can emulate a wide range of human pattern-recognition faculties. Character-recognition systems using multilayer neural nets come very close to human performance in identifying sloppily handwritten print.22 Recognizing human faces has long been thought to be an impressive human task beyond the capabilities of a computer, yet there are now automated check-cashing machines, using neural net software developed by a small New England company called Miros, that verify the identity of the customer by recognizing his or her face.23 Don’t try to fool these machines by holding someone else’s picture over your face—the machine takes a three-dimensional picture of you using two cameras. The machines are evidently reliable enough that the banks are willing to have users walk away with real cash.
Neural nets have been applied to medical diagnoses. Using a system called BrainMaker, from California Scientific Software, doctors can quickly recognize heart attacks from enzyme data, and classify cancer cells from images. Neural nets are also adept at prediction—LBS Capital Management uses BrainMaker’s neural nets to predict the Standard & Poor’s 500.24 Their “one day ahead” and “one week ahead” predictions have consistently outperformed traditional, formula-based methods.
There is a variety of self-organizing methods in use today that are mathematical cousins of the neural net model discussed above. One of these techniques, called Markov models, is widely used in automatic speech-recognition systems. Today, such systems can accurately understand humans speaking a vocabulary of up to sixty thousand words in a natural continuous manner.
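For a flavor of what a Markov model captures, here is a toy first-order version in Python that learns word-to-word transition probabilities from observed sequences. Real speech recognizers use hidden Markov models over acoustic states; this stripped-down sketch, with made-up data, shows only the core self-organizing idea:

```python
from collections import defaultdict

def train_markov(sequences):
    """Count transitions between adjacent states and convert to probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

model = train_markov([["good", "morning"], ["good", "night"], ["good", "morning"]])
print(model["good"])   # {'morning': 0.666..., 'night': 0.333...}
```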
Whereas recursion is proficient at searching through vast combinations of possibilities, such as sequences of chess moves, the neural network is a method of choice for recognizing patterns. Humans are far more skilled at recognizing patterns than at thinking through logical combinations, so we rely on this aptitude for almost all of our mental processes. Indeed, pattern recognition comprises the bulk of our neural circuitry. These faculties make up for the extremely slow speed of human neurons. The reset time on neural firing is about five milliseconds, permitting only about two hundred calculations per second in each neural connection.25 We don’t have time, therefore, to think too many new thoughts when we are pressed to make a decision. The human brain relies on precomputing its analyses and storing them for future reference. We then use our pattern-recognition capability to recognize a situation as comparable to one we have thought about and then draw upon our previously considered conclusions. We are unable to think about matters that we have not thought through many times before.
Destruction of Information: The Key to Intelligence
There are two types of computing transformations, one in which information is preserved and one in which information is destroyed. An example of the former is multiplying a number by a nonzero constant. Such a conversion is reversible: just divide by the constant and you get back the original number. If, on the other hand, we multiply a number by zero, then the original information cannot be restored. We can’t divide by zero to get the original number back because zero divided by zero is indeterminate. Therefore, this type of transformation destroys its input.
This is another example of the irreversibility of time (the first was the Law of Increasing Entropy) because there is no way to reverse an information-destroying computation.
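The distinction fits in a few lines of Python; the constant 7 below is an arbitrary example:

```python
def reversible(x):
    return x * 7            # invert with x / 7: the information survives

def irreversible(x):
    return x * 0            # 6, 42, and -13 all become 0: the information is gone

print(reversible(6) / 7)    # 6.0, the original number comes back
print(irreversible(6))      # 0, and no operation can recover the 6
```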
The irreversibility of computation is often cited as a reason that computation is useful: It transforms information in a unidirectional, “purposeful” manner. Yet the reason that computation is irreversible is based on its ability to destroy information, not to create it. The value of computation is precisely in its ability to destroy information selectively. For example, in a pattern-recognition task such as recognizing faces or speech sounds, preserving the information-bearing features of a pattern while “destroying” the enormous flow of data in the original image or sound is essential to the process. Intelligence is precisely this process of selecting relevant information carefully so that it can skillfully and purposefully destroy the rest.
That is exactly what the neural net paradigm accomplishes. A neuron—human or machine—receives hundreds or thousands of continuous signals representing a great deal of information. In response to this, the neuron either fires or does not fire, thereby reducing the babble of its input to a single bit of information. Once the neural net has been well trained, this reduction of information is purposeful, useful, and necessary.
We see this paradigm—reducing enormous streams of complex information into a single response of yes or no—at many levels in human behavior and society. Consider the torrent of information that flows into a legal trial. The outcome of all this activity is essentially a single bit of information—guilty or not guilty, plaintiff or defendant. A trial may involve a few such binary decisions, but my point is unaltered. These simple yes-or-no results then flow into other decisions and implications. Consider an election—same thing—each of us receives a vast flow of data (not all of it pertinent, perhaps) and renders a 1-bit decision: incumbent or challenger. That decision then flows in with similar decisions from millions of other voters and the final tally is again a single bit of data.
There is too much raw data in the world to continue to keep all of it around. So we continually destroy most of it, feeding those results to the next level. This is the genius behind the all-or-nothing firing of the neuron.
Next time you do some spring cleaning and attempt to throw away old objects and files, you will know why this is so difficult—the purposeful destruction of information is the essence of intelligent work.
How to Catch a Fly Ball
When a batter hits a fly ball, it follows a path that can be predicted from the ball’s initial trajectory, spin, and speed, as well as wind conditions. The outfielder, however, is unable to measure any of these properties directly and has to infer them from his angle of observation. To predict where the ball will go, and thus where the fielder should go, would appear to require the solution of a rather overwhelming set of complex simultaneous equations. These equations need to be constantly recomputed as new visual data streams in. How does a ten-year-old Little Leaguer accomplish this, with no computer, no calculator, no pen and paper, having taken no calculus classes, and having only a few seconds of time?
The answer is, she doesn’t. She uses her neural nets’ pattern-recognition abilities, which provide the foundation for much of skill formation. The neural nets of the ten-year-old have had a lot of practice in comparing the observed flight of the ball to her own actions. Once she has learned the skill, it becomes second nature, meaning that she has no idea how she does it. Her neural nets have gained all the insights needed: Take a step back if the ball has gone above my field of view; take a step forward if the ball is below a certain level in my field of view and no longer rising, and so on. The human ballplayer is not mentally computing equations. Nor is there any such computation going on unconsciously in the player’s brain. What is going on is pattern recognition, the foundation of most human thought.
One key to intelligence is knowing what not to compute. A successful person isn’t necessarily better than her less successful peers at solving problems; her pattern-recognition facilities have just learned what problems are worth solving.
Building Silicon Nets
Most computer-based neural net applications today simulate their neuron models in software. This means that computers are simulating a massively parallel process on a machine that does only one calculation at a time. Today’s neural net software running on inexpensive personal computers can emulate about a million neuron connection calculations per second, which is more than a billion times slower than the human brain (although we can improve on this figure significantly by coding directly in the computer’s machine language). Even so, software using a neural net paradigm on personal computers circa end of the twentieth century comes very close to matching human ability in such tasks as recognizing print, speech, and faces.
There is a genre of neural computer hardware that is optimized for running neural nets. These systems are modestly, not massively, parallel and are about a thousand times faster than neural net software on a personal computer. That’s still about a million times slower than the human brain.
There is an emerging community of researchers who intend to build neural nets the way nature intended: massively parallel, with a dedicated little computer for each neuron. The Advanced Telecommunications Research Lab (ATR), a prestigious research facility in Kyoto, Japan, is building such an artificial brain with a billion electronic neurons. That’s about 1 percent of the number in the human brain, but these neurons will run at electronic speeds, which is about a million times faster than human neurons. The overall computing speed of ATR’s artificial brain will be, therefore, thousands of times greater than the human brain. Hugo de Garis, director of ATR’s Brain Builder Group, hopes to educate his artificial brain in the basics of human language and then set the device free to read—at electronic speeds—all the literature on the Web that interests it.26
Does the simple neuron model we have been discussing match the way human neurons work? The answer is yes and no. On the one hand, human neurons are more complex and more varied than the model suggests. The connection strengths are controlled by multiple neurotransmitters and are not sufficiently characterized by a single number. The brain is not a single organ, but a collection of hundreds of specialized information-processing organs, each having different topologies and organizations. On the other hand, as we begin to examine the parallel algorithms behind the neural organization in different regions, we find that much of the complexity of neuron design and structure has to do with supporting the neuron’s life processes and is not directly relevant to the way it handles information. The salient computing methods are relatively straightforward, although varied. For example, a vision chip developed by researcher Carver Mead appears to realistically capture the early stages of human image processing.27 Although the methods of this and other similar chips differ in a number of respects from the neuron models discussed above, the methods are understood and readily implemented in silicon. Developing a catalog of the basic paradigms that the neural nets in our brain are using—each relatively simple in its own way—will represent a great advance in our understanding of human intelligence and in our ability to re-create and surpass it.
The Search for Extraterrestrial Intelligence (SETI) project is motivated by the idea that exposure to the designs of intelligent entities that evolved elsewhere will provide a vast resource for advancing scientific understanding.28 But we have an impressive and poorly understood piece of intelligent machinery right here on Earth. One such entity—this author—is no more than three feet from the notebook computer to which I am dictating this book.29 We can—and will—learn a lot by probing its secrets.
Evolutionary Algorithms: Speeding Up Evolution a Millionfold
Here’s an investment tip: Before you invest in a company, be sure to check the track record of the management, the stability of its balance sheet, the company’s earnings history, relevant industry trends, and analyst opinions. On second thought, that’s too much work. Here’s a simpler approach:
First randomly generate (on your personal computer, of course) a million sets of rules for making investment decisions. Each set of rules should define a set of triggers for buying and selling stocks (or any other security) based on available financial data. This is not hard, as each set of rules does not need to make a lot of sense. Embed each set of rules in a simulated software “organism” with the rules encoded in a digital “chromosome.” Now evaluate each simulated organism in a simulated environment by using real-world financial data—you’ll find plenty on the Web. Let each software organism invest some simulated money and see how it fares based on actual historic data. Allow the ones that do a bit better than industry averages to survive into the next generation. Kill off the rest (sorry). Now have each of the surviving ones multiply themselves until we’re back to a million such creatures. As they multiply, allow some mutation (random change) in the chromosomes to occur. Okay, that’s one generation of simulated evolution. Now repeat these steps for another hundred thousand generations. At the end of this process, the surviving software creatures should be darn smart investors. After all, their methods have survived for a hundred thousand generations of evolutionary pruning.
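Here is the recipe above compressed into a Python sketch with toy numbers: a population of 100 and 50 generations rather than a million and a hundred thousand. The “chromosome” is just a list of numeric trading weights, and the fitness function is a made-up buy signal scored against stand-in price data, both placeholders for the richer rules and real-world financial data described above:

```python
import random

POP, GENES, GENERATIONS, SURVIVORS = 100, 8, 50, 20

def random_chromosome():
    # a set of rules encoded as numeric weights; it need not make sense yet
    return [random.uniform(-1, 1) for _ in range(GENES)]

def fitness(chromosome, prices):
    """Score a rule set against historical prices (a made-up buy signal)."""
    gain = 0.0
    for t in range(GENES, len(prices) - 1):
        signal = sum(w * (prices[t - i] - prices[t - i - 1])
                     for i, w in enumerate(chromosome))
        if signal > 0:                        # the rules say: buy
            gain += prices[t + 1] - prices[t]
    return gain

def evolve(prices):
    population = [random_chromosome() for _ in range(POP)]
    for _ in range(GENERATIONS):
        # let the better performers survive; kill off the rest (sorry)
        population.sort(key=lambda c: fitness(c, prices), reverse=True)
        survivors = population[:SURVIVORS]
        # the survivors multiply, with occasional mutation in the chromosome
        population = [[w + random.gauss(0, 0.05) if random.random() < 0.1 else w
                       for w in random.choice(survivors)]
                      for _ in range(POP)]
    return max(population, key=lambda c: fitness(c, prices))

historic_prices = [100 + random.gauss(0, 1) for _ in range(200)]   # stand-in data
print(evolve(historic_prices))   # the fittest surviving rule set
```

Selection pressure, reproduction, and mutation are all present; everything else in the recipe is a matter of scale.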
In the real world, a number of successful investment funds now believe that the surviving “creatures” from just such a simulated evolution are smarter than mere human financial analysts. State Street Global Advisors, which manages $3.7 trillion in funds, has made major investments in applying both neural nets and evolutionary algorithms to making purchase-and-sale decisions. This includes a majority stake in Advanced Investment Technologies, which runs a successful fund in which buy-and-sell decisions are made by a program combining these methods.30 Evolutionary and related techniques guide a $95 billion fund managed by Barclays Global Investors, as well as funds run by Fidelity and PanAgora Asset Management.