36. The plots in this chapter labeled “Logarithmic Plot” are technically semilogarithmic plots in that one axis (time) is on a linear scale, and the other axis is on a logarithmic scale. However, I am calling these plots “logarithmic plots” for simplicity.
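As a concrete illustration (my own sketch, not a figure from the book, and assuming the matplotlib library), the same exponentially growing data can be drawn on ordinary linear axes and on the semilogarithmic axes described here: time stays linear while the value axis becomes logarithmic, which turns exponential growth into a straight line.

```python
import matplotlib.pyplot as plt

# Illustrative data only: a quantity that doubles every five years.
years = list(range(1950, 2001, 5))
values = [2 ** ((y - 1950) / 5) for y in years]

fig, (ax_linear, ax_semilog) = plt.subplots(1, 2)
ax_linear.plot(years, values)        # both axes linear: exponential growth appears as a curve
ax_semilog.semilogy(years, values)   # linear time axis, logarithmic value axis: a straight line
plt.show()
```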
37. See the appendix, “The Law of Accelerating Returns Revisited,” which provides a mathematical derivation of why there are two levels of exponential growth (that is, exponential growth over time in which the rate of the exponential growth—the exponent—is itself growing exponentially over time) in computational power as measured by MIPS per unit cost.
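As a toy sketch of what "two levels of exponential growth" means (illustrative constants of my own choosing, not the appendix's derivation and not fitted to real MIPS data), compare ordinary exponential growth, exp(a*t), with growth whose exponent itself grows exponentially, exp(a*exp(b*t)):

```python
import math

# a and b are arbitrary illustrative constants, not fitted parameters.
a, b = 0.1, 0.05
for t in range(0, 101, 20):
    single = math.exp(a * t)                # the exponent grows linearly with time
    double = math.exp(a * math.exp(b * t))  # the exponent itself grows exponentially with time
    print(t, round(single, 1), round(double, 1))
```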
38. Hans Moravec, “When Will Computer Hardware Match the Human Brain?” Journal of Evolution and Technology 1 (1998), http://www.jetpress.org/volume1/moravec.pdf.
39. See note 35 above.
40. Achieving the first MIPS per $1,000 took from 1900 to 1990. We’re now doubling the number of MIPS per $1,000 in about 400 days. Because current price-performance is about 2,000 MIPS per $1,000, we are adding price-performance at the rate of 5 MIPS per day, or 1 MIPS about every 5 hours.
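A quick back-of-the-envelope check of this note's arithmetic, using only the figures stated above (a sketch, not a model of actual hardware trends):

```python
mips_per_1000_dollars = 2000   # current price-performance, as stated in the note
doubling_time_days = 400       # doubling time, as stated in the note

# Doubling adds roughly another 2,000 MIPS per $1,000 over roughly 400 days:
mips_added_per_day = mips_per_1000_dollars / doubling_time_days
print(mips_added_per_day)        # 5.0 MIPS per day
print(24 / mips_added_per_day)   # ~4.8, i.e., about one additional MIPS every 5 hours
```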
41. “IBM Details Blue Gene Supercomputer,” CNET News, May 8, 2003, http://news.com.com/2100-1008_3-1000421.html.
42. See Alfred North Whitehead, An Introduction to Mathematics (London: Williams and Norgate, 1911), which he wrote at the same time he and Bertrand Russell were working on their seminal three-volume Principia Mathematica.
43. While originally projected to take fifteen years, “the Human Genome Project was finished two and a half years ahead of time and, at $2.7 billion in FY 1991 dollars, significantly under original spending projections”: http://www.ornl.gov/sci/techresources/Human_Genome/project/50yr/press4_2003.shtml.
44. Human Genome Project Information, http://www.ornl.gov/sci/techresources/Human_Genome/project/privatesector.shtml; Stanford Genome Technology Center, http://sequence-www.stanford.edu/group/techdev/auto.html; National Human Genome Research Institute, http://www.genome.gov; Tabitha Powledge, “How Many Genomes Are Enough?” Scientist, November 17, 2003, http://www.biomedcentral.com/news/20031117/07.
45. Data from National Center for Biotechnology Information, “GenBank Statistics,” revised May 4, 2004, http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html.
46. Severe acute respiratory syndrome (SARS) was sequenced within thirty-one days of the virus being identified by the British Columbia Cancer Agency and the American Centers for Disease Control. The sequencing from the two centers differed by only ten base pairs out of twenty-nine thousand. This work identified SARS as a coronavirus. Dr. Julie Gerberding, director of the CDC, called the quick sequencing “a scientific achievement that I don’t think has been paralleled in our history.” See K. Philipkoski, “SARS Gene Sequence Unveiled,” Wired News, April 15, 2003, http://www.wired.com/news/medtech/0,1286,58481,00.html?tw=wn_story_related.
In contrast, the efforts to sequence HIV began in the 1980s. HIV-1 and HIV-2 were completely sequenced in 2003 and 2002, respectively. National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/genomes/framik.cgi?db=genome&gi=12171; HIV Sequence Database maintained by the Los Alamos National Laboratory, http://www.hiv.lanl.gov/content/hiv-db/HTML/outline.html.
47. Mark Brader, “A Chronology of Digital Computing Machines (to 1952),” http://www.davros.org/misc/chronology.html; Richard E. Matick, Computer Storage Systems and Technology (New York: John Wiley and Sons, 1977); University of Cambridge Computer Laboratory, EDSAC99, http://www.cl.cam.ac.uk/UoCCL/misc/EDSAC99/statistics.html; Mary Bellis, “Inventors of the Modern Computer: The History of the UNIVAC Computer—J. Presper Eckert and John Mauchly,” http://inventors.about.com/library/weekly/aa062398.htm; “Initial Date of Operation of Computing Systems in the USA (1950–1958),” compiled from 1968 OECD data, http://members.iinet.net.au/~dgreen/timeline.html; Douglas Jones, “Frequently Asked Questions about the DEC PDP-8 computer,” ftp://rtfm.mit.edu/pub/usenet/alt.sys.pdp8/PDP-8_Frequently_Asked_Questions_%28posted_every_other_month%29; Programmed Data Processor-1 Handbook, Digital Equipment Corporation (1960–1963), http://www.dbit.com/~greeng3/pdp1/pdp1.html#INTRODUCTION; John Walker, “Typical UNIVAC® 1108 Prices: 1968,” http://www.fourmilab.ch/documents/univac/config1108.html; Jack Harper, “LISP 1.5 for the Univac 1100 Mainframe,” http://www.frobenius.com/univac.htm; Wikipedia, “Data General Nova,” http://www.answers.com/topic/data-general-nova; Darren Brewer, “Chronology of Personal Computers 1972–1974,” http://uk.geocities.com/magoos_universe/comp1972.htm; www.pricewatch.com; http://www.jc-news.com/parse.cgi?news/pricewatch/raw/pw-010702; http://www.jc-news.com/parse.cgi?news/pricewatch/raw/pw-020624; http://www.pricewatch.com (11/17/04); http://sharkyextreme.com/guides/WMPG/article.php/10706_2227191_2; Byte advertisements, September 1975–March 1998; PC Computing advertisements, March 1977–April 2000.
48. Seagate, “Products,” http://www.seagate.com/cda/products/discsales/index; Byte advertisements, 1977–1998; PC Computing advertisements, March 1999; Editors of Time-Life Books, Understanding Computers: Memory and Storage, rev. ed. (New York: Warner Books, 1990); “Historical Notes about the Cost of Hard Drive Storage Space,” http://www.alts.net/ns1625/winchest.html; “IBM 305 RAMAC Computer with Disk Drive,” http://www.cedmagic.com/history/ibm-305-ramac.html; John C. McCallum, “Disk Drive Prices (1955–2004),” http://www.jcmit.com/diskprice.htm.
49. James DeRose, The Wireless Data Handbook (St. Johnsbury, Vt.: Quantrum, 1996); First Mile Wireless, http://www.firstmilewireless.com/; J. B. Miles, “Wireless LANs,” Government Computer News 18.28 (April 30, 1999), http://www.gcn.com/vol18_no28/guide/514-1.html; Wireless Week (April 14, 1997), http://www.wirelessweek.com/toc/4%2F14%2F1997; Office of Technology Assessment, “Wireless Technologies and the National Information Infrastructure,” September 1995, http://infoventures.com/emf/federal/ota/ota95-tc.html; Signal Lake, “Broadband Wireless Network Economics Update,” January 14, 2003, http://www.signallake.com/publications/broadbandupdate.pdf; BridgeWave Communications communication, http://www.bridgewave.com/050604.htm.
50. Internet Software Consortium (http://www.isc.org), ISC Domain Survey: Number of Internet Hosts, http://www.isc.org/ds/host-count-history.html.
51. Ibid.
52. Average traffic on Internet backbones in the U.S. during December of each year is used to estimate traffic for the year. A. M. Odlyzko, “Internet Traffic Growth: Sources and Implications,” Optical Transmission Systems and Equipment for WDM Networking II, B. B. Dingel, W. Weiershausen, A. K. Dutta, and K.-I. Sato, eds., Proc. SPIE (The International Society for Optical Engineering) 5247 (2003): 1–15, http://www.dtc.umn.edu/~odlyzko/doc/oft.internet.growth.pdf; data for 2003–2004 values: e-mail correspondence with A. M. Odlyzko.
53. Dave Kristula, “The History of the Internet” (March 1997, update August 2001), http://www.davesite.com/webstation/net-history.shtml; Robert Zakon, “Hobbes’ Internet Timeline v8.0,” http://www.zakon.org/robert/internet/timeline; Converge Network Digest, December 5, 2002, http://www.convergedigest.com/Daily/daily.asp?vn=v9n229&fecha=December%2005,%202002; V. Cerf, “Cerf’s Up,” 2004, http://global.mci.com/de/resources/cerfs_up/.
54. H. C. Nathanson et al., “The Resonant Gate Transistor,” IEEE Transactions on Electron Devices 14.3 (March 1967): 117–33; Larry J. Hornbeck, “128 × 128 Deformable Mirror Device,” IEEE Transactions on Electron Devices 30.5 (April 1983): 539–43; J. Storrs Hall, “Nanocomputers and Reversible Logic,” Nanotechnology 5 (July 1994): 157–67; V. V. Aristov et al., “A New Approach to Fabrication of Nanostructures,” Nanotechnology 6 (April 1995): 35–39; C. Montemagno et al., “Constructing Biological Motor Powered Nanomechanical Devices,” Nanotechnology 10 (1999): 225–31, http://www.foresight.org/Conferences/MNT6/Papers/Montemagno/; Celeste Biever, “Tiny ‘Elevator’ Most Complex Nanomachine Yet,” NewScientist.com News Service, March 18, 2004, http://www.newscientist.com/article.ns?id=dn4794.
55. ETC Group, “From Genomes to Atoms: The Big Down,” p. 39, http://www.etcgroup.org/documents/TheBigDown.pdf.
56. Ibid., p. 41.
57. Although it is not possible to determine precisely the information content in the genome, because of the repeated base pairs it is clearly much less than the total uncompressed data. Here are two approaches to estimating the compressed information content of the genome, both of which demonstrate that a range of thirty to one hundred million bytes is conservatively high.
1. In terms of the uncompressed data, there are three billion DNA rungs in the human genetic code, each coding two bits (since there are four possibilities for each DNA base pair). Thus, the human genome is about 800 million bytes uncompressed. The noncoding DNA used to be called “junk DNA,” but it is now clear that it plays an important role in gene expression. However, it is very inefficiently coded. For one thing, there are massive redundancies (for example, the sequence called “ALU” is repeated hundreds of thousands of times), which compression algorithms can take advantage of.
With the recent explosion of genetic data banks, there is a great deal of interest in compressing genetic data. Recent work on applying standard data compression algorithms to genetic data indicates that reducing the data by 90 percent (for bit-perfect compression) is feasible: Hisahiko Sato et al., “DNA Data Compression in the Post Genome Era,” Genome Informatics 12 (2001): 512–14, http://www.jsbi.org/journal/GIW01/GIW01P130.pdf.
Thus we can compress the genome to about 80 million bytes without loss of information (meaning we can perfectly reconstruct the full 800-million-byte uncompressed genome).
Now consider that more than 98 percent of the genome does not code for proteins. Even after standard data compression (which eliminates redundancies and uses a dictionary lookup for common sequences), the algorithmic content of the noncoding regions appears to be rather low, meaning that it is likely that we could code an algorithm that would perform the same function with fewer bits. However, since we are still early in the process of reverse engineering the genome, we cannot make a reliable estimate of this further decrease based on a functionally equivalent algorithm. I am using, therefore, a range of 30 to 100 million bytes of compressed information in the genome. The top part of this range assumes only data compression and no algorithmic simplification.
Only a portion (although the majority) of this information characterizes the design of the brain.
2. Another line of reasoning is as follows. Though the human genome contains around 3 billion bases, only a small percentage, as mentioned above, codes for proteins. By current estimates, there are 26,000 genes that code for proteins. If we assume those genes average 3,000 bases of useful data, that comes to only approximately 78 million bases. A base of DNA requires only two bits, so those 78 million bases translate to about 20 million bytes (78 million bases divided by four). In the protein-coding sequence of a gene, each “word” (codon) of three DNA bases translates into one amino acid. There are, therefore, 4³ (64) possible codon codes, each consisting of three DNA bases. However, only 20 amino acids plus a stop codon (null amino acid) are used out of the 64. The remaining 43 codes are used as synonyms of the 21 useful ones. Whereas 6 bits are required to code for 64 possible combinations, only about 4.4 (log₂ 21) bits are required to code for 21 possibilities, a savings of 1.6 out of 6 bits (about 27 percent), bringing us down to about 15 million bytes. In addition, some standard compression based on repeating sequences is feasible here, although much less compression is possible on this protein-coding portion of the DNA than in the so-called junk DNA, which has massive redundancies. So this will probably bring the figure below 12 million bytes. However, we now have to add information for the noncoding portion of the DNA that controls gene expression. Although this portion of the DNA comprises the bulk of the genome, it appears to have a low level of information content and is replete with massive redundancies. Estimating that it matches the approximately 12 million bytes of protein-coding DNA, we again come to approximately 24 million bytes. From this perspective, an estimate of 30 to 100 million bytes is conservatively high.
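A rough check of the arithmetic in both estimates above (my own sketch, using only the figures stated in this note, not independent data):

```python
import math

# Approach 1: raw genome size and roughly 90 percent lossless compression.
base_pairs = 3_000_000_000
uncompressed_bytes = base_pairs * 2 / 8        # 2 bits per base pair
print(uncompressed_bytes / 1e6)                # ~750, i.e., roughly 800 million bytes
print(uncompressed_bytes * 0.10 / 1e6)         # ~75 million bytes after ~90% compression

# Approach 2: protein-coding genes only.
coding_bases = 26_000 * 3_000                  # ~78 million bases
coding_bytes = coding_bases * 2 / 8            # ~20 million bytes at 2 bits per base
codon_bits = math.log2(21)                     # ~4.4 bits suffice for 20 amino acids plus stop
recoded_bytes = coding_bytes * codon_bits / 6  # savings relative to 6 bits per codon
print(coding_bytes / 1e6)                      # ~19.5 million bytes
print(recoded_bytes / 1e6)                     # ~14.3 million bytes, i.e., "about 15 million"
```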
58. Continuous values can be represented by floating-point numbers to any desired degree of accuracy. A floating-point number consists of two sequences of bits. One, the “exponent” sequence, represents a power of 2. The other, the “base” sequence (more commonly called the significand or mantissa), represents a fraction of 1. By increasing the number of bits in the base, any desired degree of accuracy can be achieved.
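As a minimal sketch of this representation (assuming Python, whose math.frexp splits a float into exactly these two parts):

```python
import math

# math.frexp returns a fraction (the "base"/significand) and a power-of-2 exponent
# such that value == fraction * 2**exponent, with 0.5 <= |fraction| < 1.
value = 3.14159
fraction, exponent = math.frexp(value)
print(fraction, exponent)          # 0.7853975 2
print(fraction * 2 ** exponent)    # 3.14159 (reconstructed exactly)

# Python floats carry 53 bits in the fraction; more bits in the fraction would
# mean finer accuracy, fewer bits would mean coarser accuracy.
```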
59. Stephen Wolfram, A New Kind of Science (Champaign, Ill.: Wolfram Media, 2002).
60. Early work on a digital theory of physics was also presented by Frederick W. Kantor, Information Mechanics (New York: John Wiley and Sons, 1977). Links to several of Kantor’s papers can be found at http://w3.execnet.com/kantor/pm00.htm (1997); http://w3.execnet.com/kantor/1b2p.htm (1989); and http://w3.execnet.com/kantor/ipoim.htm (1982). See also http://www.kx.com/listbox/k/msg05621.html.
61. Konrad Zuse, “Rechnender Raum,” Elektronische Datenverarbeitung 8 (1967): 336–44. Konrad Zuse’s book on a cellular automaton–based universe was published two years later: Rechnender Raum, Schriften zur Datenverarbeitung (Braunschweig, Germany: Friedrich Vieweg & Sohn, 1969). English translation: Calculating Space, MIT Technical Translation AZT-70-164-GEMIT (Cambridge, Mass.: MIT Project MAC, February 1970).
62. Edward Fredkin quoted in Robert Wright, “Did the Universe Just Happen?” Atlantic Monthly, April 1988, 29–44, http://digitalphysics.org/Publications/Wri88a/html.
63. Ibid.
64. Many of Fredkin’s results come from studying his own model of computation, which explicitly reflects a number of fundamental principles of physics. See the classic article Edward Fredkin and Tommaso Toffoli, “Conservative Logic,” International Journal of Theoretical Physics 21.3–4 (1982): 219–53, http://www.digitalphilosophy.org/download_documents/ConservativeLogic.pdf. Also, a set of concerns about the physics of computation analytically similar to Fredkin’s may be found in Norman Margolus, “Physics and Computation,” Ph.D. thesis, MIT/LCS/TR-415, MIT Laboratory for Computer Science, 1988.
65. I discussed Norbert Wiener and Ed Fredkin’s view of information as the fundamental building block for physics and other levels of reality in my 1990 book, The Age of Intelligent Machines.
Casting all of physics in terms of computational transformations has proved an immensely challenging project, but Fredkin has continued his efforts. Wolfram has devoted a considerable portion of his work over the past decade to this notion, apparently with only limited communication with others in the physics community who are also pursuing the idea. Wolfram’s stated goal “is not to present a specific ultimate model for physics,” but in his “Note for Physicists” (which essentially amounts to a grand challenge), Wolfram describes the “features that [he] believe[s] such a model will have” (A New Kind of Science, pp. 1043–65, http://www.wolframscience.com/nksonline/page-1043c-text).
In The Age of Intelligent Machines, I discuss “the question of whether the ultimate nature of reality is analog or digital” and point out that “as we delve deeper and deeper into both natural and artificial processes, we find the nature of the process often alternates between analog and digital representations of information.” As an illustration, I discussed sound. In our brains, music is represented as the digital firing of neurons in the cochlea, representing different frequency bands. In the air and in the wires leading to loudspeakers, it is an analog phenomenon. The representation of sound on a compact disc is digital, which is interpreted by digital circuits. But the digital circuits consist of thresholded transistors, which are analog amplifiers. As amplifiers, the transistors manipulate individual electrons, which can be counted and are, therefore, digital, but at a deeper level electrons are subject to analog quantum-field equations. At a yet deeper level, Fredkin and now Wolfram are theorizing a digital (computational) basis to these continuous equations.
It should be further noted that if someone actually does succeed in establishing such a digital theory of physics, we would then be tempted to examine what sorts of deeper mechanisms are actually implementing the computations and links of the cellular automata. Perhaps underlying the cellular automata that run the universe are yet more basic analog phenomena, which, like transistors, are subject to thresholds that enable them to perform digital transactions. Thus, establishing a digital basis for physics will not settle the philosophical debate as to whether reality is ultimately digital or analog. Nonetheless, establishing a viable computational model of physics would be a major accomplishment.
So how likely is this? We can easily establish an existence proof that a digital model of physics is feasible, in that continuous equations can always be expressed to any desired level of accuracy in the form of discrete transformations on discrete changes in value. That is, after all, the basis for the fundamental theorem of calculus. However, expressing continuous formulas in this way is an inherent complication and would violate Einstein’s dictum to express things “as simply as possible, but no simpler.” So the real question is whether we can express the basic relationships that we are aware of in more elegant terms, using cellular-automata algorithms. One test of a new theory of physics is whether it is capable of making verifiable predictions. In at least one important way, that might be a difficult challenge for a cellular automata–based theory because lack of predictability is one of the fundamental features of cellular automata.
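A minimal sketch of the existence-proof idea mentioned here (my own illustration, not an argument from the book): the continuous law dx/dt = -x, whose exact solution is x(t) = x0 * e^(-t), can be approximated as closely as desired by repeated discrete transformations on discrete changes in value (Euler steps):

```python
import math

def discrete_decay(x0: float, t: float, steps: int) -> float:
    """Approximate x(t) for dx/dt = -x using `steps` discrete update steps."""
    dt = t / steps
    x = x0
    for _ in range(steps):
        x += -x * dt        # the discrete transformation standing in for the continuous law
    return x

exact = math.exp(-1.0)      # x(1) for x0 = 1
for steps in (10, 100, 10_000):
    approx = discrete_decay(1.0, 1.0, steps)
    print(steps, approx, abs(approx - exact))   # the error shrinks as the steps get finer
```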