Don’t be distracted by the racist assumptions of white superiority. These were as unquestioned in the time of Jenkin and Darwin as our speciesist assumptions of human rights, human dignity, and the sacredness of human life are unquestioned today. We can rephrase Jenkin’s argument in a more neutral analogy. If you mix white paint and black paint together, what you get is grey paint. If you mix grey paint and grey paint together, you can’t reconstruct either the original white or the original black. Mixing paints is not so far from the pre-Mendelian vision of heredity, and even today popular culture frequently expresses heredity in terms of a mixing of ‘bloods’. Jenkin’s argument is an argument about swamping. As the generations go by, under the assumption of blending inheritance, variation is bound to become swamped. Greater and greater uniformity will prevail. Eventually there will be no variation left for natural selection to work upon.
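The swamping argument can be checked numerically. What follows is a minimal sketch rather than anything Jenkin himself wrote: it assumes a trait measured on a scale from 0 (black paint) to 1 (white paint), random mating, and children that are the exact average of their two parents. The function names, population size and seed are illustrative choices.

```python
import random
import statistics


def blend_generation(population, rng):
    """Return a new generation in which each child is the average of two random parents."""
    return [
        (rng.choice(population) + rng.choice(population)) / 2
        for _ in population
    ]


def variance_over_generations(n_individuals=1000, n_generations=10, seed=0):
    """Track the population variance of the trait under blending inheritance."""
    rng = random.Random(seed)
    # Start with a population that is a mix of pure 'black' (0.0) and pure 'white' (1.0).
    population = [rng.choice([0.0, 1.0]) for _ in range(n_individuals)]
    variances = [statistics.pvariance(population)]
    for _ in range(n_generations):
        population = blend_generation(population, rng)
        variances.append(statistics.pvariance(population))
    return variances


if __name__ == "__main__":
    for generation, variance in enumerate(variance_over_generations()):
        print(generation, round(variance, 6))
```

Under these assumptions the variance roughly halves in every generation, so after ten generations almost everybody is the same shade of grey. That is the loss of variation that the argument predicts.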
Plausible as this argument must have sounded, it is not only an argument against natural selection. It is more an argument against inescapable facts about heredity itself! It manifestly isn’t true that variation disappears as the generations go by. People are not more similar to each other today than they were in their grandparents’ time. Variation is maintained. There is a pool of variation for selection to work on. This was pointed out mathematically in 1908 by W. Weinberg, and independently by the eccentric mathematician G. H. Hardy, who incidentally, as the betting book of his (and my) college records, once took a bet from a colleague of ‘One half penny to his fortune till death, that the sun will rise tomorrow’. But it took R. A. Fisher and his colleagues, the founders of modern population genetics, to develop the full answer to Fleeming Jenkin in terms of Mendel’s theory of particle genetics. This was an irony at the time, because, as we shall see in Chapter 11, the leading followers of Mendel in the early twentieth century thought of themselves as anti-Darwinian. Fisher and his colleagues showed that Darwinian selection made sense, and Jenkin’s problem was elegantly solved, if what changed in evolution was the relative frequency of discrete hereditary particles, or genes, each of which was either there or not there in any particular individual body. Darwinism post-Fisher is called neo-Darwinism. Its digital nature is not an incidental fact that happens to be true of genetic information technology. Digitalness is probably a necessary precondition for Darwinism itself to work.
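The Hardy-Weinberg result can be illustrated in a few lines. This sketch assumes a single location with two alternative discrete contents, labelled here `A` and `a`, together with random mating and no selection, mutation or migration. The function names are hypothetical.

```python
def genotype_frequencies(p):
    """Expected genotype frequencies after random mating, given frequency p of allele A."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def allele_frequency(genotypes):
    """Frequency of allele A among the gametes produced by these genotypes."""
    # AA individuals always pass on A; Aa individuals pass it on half the time.
    return genotypes["AA"] + 0.5 * genotypes["Aa"]


def iterate(p, generations):
    """Follow the frequency of A through repeated rounds of random mating."""
    history = [p]
    for _ in range(generations):
        p = allele_frequency(genotype_frequencies(p))
        history.append(p)
    return history


if __name__ == "__main__":
    print(iterate(0.3, 5))
```

Because the particles are passed on intact rather than blended, the frequency of `A` is the same after mating as before it. Nothing is diluted, and the pool of variation persists unless something such as selection changes it.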
In our electronic technology the discrete, digital locations have only two states, conventionally represented as 0 and 1 although you can think of them as high and low, on and off, up and down: all that matters is that they should be distinct from one another, and that the pattern of their states can be ‘read out’ so that it can have some influence on something. Electronic technology uses various physical media for storing 1s and 0s, including magnetic discs, magnetic tape, punched cards and tape, and integrated ‘chips’ with lots of little semiconductor units inside them.
The main storage medium inside willow seeds, ants and all other living cells is not electronic but chemical. It exploits the fact that certain kinds of molecule are capable of ‘polymerizing’, that is joining up in long chains of indefinite length. There are lots of different kinds of polymer. For example, ‘polythene’ is made of long chains of the small molecule called ethylene — polymerized ethylene. Starch and cellulose are polymerized sugars. Some polymers, instead of being uniform chains of one small molecule like ethylene, are chains of two or more different kinds of small molecule. As soon as such heterogeneity enters into a polymer chain, information technology becomes a theoretical possibility. If there are two kinds of small molecule in the chain, the two can be thought of as 1 and 0 respectively, and immediately any amount of information, of any kind, can be stored, provided only that the chain is long enough. The particular polymers used by living cells are called polynucleotides. There are two main families of polynucleotides in living cells, called DNA and RNA for short. Both are chains of small molecules called nucleotides. Both DNA and RNA are heterogeneous chains, with four different kinds of nucleotides. This, of course, is where the opportunity for information storage lies. Instead of just the two states 1 and 0, the information technology of living cells uses four states, which we may conventionally represent as A, T, C and G. There is very little difference, in principle, between a two-state binary information technology like ours, and a four-state information technology like that of the living cell.
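The near-equivalence of the two-state and four-state technologies can be made concrete: each of four symbols carries exactly two binary digits. The sketch below assumes one arbitrary assignment of bit pairs to nucleotides. Any other one-to-one assignment would serve equally well, and the names are illustrative.

```python
# An arbitrary one-to-one assignment of two bits to each nucleotide (an assumption).
NUCLEOTIDE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_NUCLEOTIDE = {bits: n for n, bits in NUCLEOTIDE_TO_BITS.items()}


def dna_to_bits(dna):
    """Rewrite a four-letter DNA string as a string of 0s and 1s."""
    return "".join(NUCLEOTIDE_TO_BITS[n] for n in dna)


def bits_to_dna(bits):
    """Rewrite a string of 0s and 1s as a four-letter DNA string."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string must have an even length")
    return "".join(BITS_TO_NUCLEOTIDE[bits[i:i + 2]] for i in range(0, len(bits), 2))


if __name__ == "__main__":
    encoded = dna_to_bits("GATTACA")
    print(encoded)
    print(bits_to_dna(encoded))
```

The conversion loses nothing in either direction, which is all that is meant by saying the difference between the two technologies is small in principle.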
As I mentioned at the end of Chapter 1, there is enough information capacity in a single human cell to store the Encyclopaedia Britannica, all 30 volumes of it, three or four times over. I don’t know the comparable figure for a willow seed or an ant, but it will be of the same order of staggeringness. There is enough storage capacity in the DNA of a single lily seed or a single salamander sperm to store the Encyclopaedia Britannica 60 times over. Some species of the unjustly called ‘primitive’ amoebas have as much information in their DNA as 1,000 Encyclopaedia Britannicas.
Amazingly, only about 1 per cent of the genetic information in, for example, human cells, seems to be actually used: roughly the equivalent of one volume of the Encyclopaedia Britannica. Nobody knows why the other 99 per cent is there. In a previous book I suggested that it might be parasitic, freeloading on the efforts of the 1 per cent, a theory that has more recently been taken up by molecular biologists under the name of ‘selfish DNA’. A bacterium has a smaller information capacity than a human cell, by a factor of about 1,000, and it probably uses nearly all of it: there is little room for parasites. Its DNA could ‘only’ hold one copy of the New Testament!
Modern genetic engineers already have the technology to write the New Testament or anything else into a bacterium’s DNA. The ‘meaning’ of the symbols in any information technology is arbitrary, and there is no reason why we should not assign combinations, say triplets, from DNA’s 4-letter alphabet, to letters of our own 26-letter alphabet (there would be room for all the upper and lower-case letters with 12 punctuation characters). Unfortunately, it would take about five man-centuries to write the New Testament into a bacterium, so I doubt if anybody will bother. If they did, the rate of reproduction of bacteria is such that 10 million copies of the New Testament could be run off in a single day, a missionary’s dream if only people could read the DNA alphabet but, alas, the characters are so small that all 10 million copies of the New Testament could simultaneously dance upon the surface of a pin’s head.
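The arithmetic behind the triplet suggestion is that a 4-letter alphabet gives 4 × 4 × 4 = 64 triplets, enough for 26 upper-case letters, 26 lower-case letters and 12 punctuation characters. The sketch below assumes one arbitrary assignment and one arbitrary choice of 12 punctuation marks. Neither is a real laboratory convention, and the names are hypothetical.

```python
import string
from itertools import product

NUCLEOTIDES = "ATCG"
# Twelve punctuation characters, chosen arbitrarily for this illustration.
PUNCTUATION = " .,;:!?'\"-()"
SYMBOLS = string.ascii_uppercase + string.ascii_lowercase + PUNCTUATION

# All 64 possible triplets, in a fixed order.
TRIPLETS = ["".join(t) for t in product(NUCLEOTIDES, repeat=3)]

ENCODE = dict(zip(SYMBOLS, TRIPLETS))
DECODE = {triplet: symbol for symbol, triplet in ENCODE.items()}


def text_to_dna(text):
    """Spell out a text as DNA, three nucleotides per character."""
    return "".join(ENCODE[ch] for ch in text)


def dna_to_text(dna):
    """Read DNA three nucleotides at a time and recover the text."""
    return "".join(DECODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    message = "In the beginning was the Word."
    dna = text_to_dna(message)
    print(dna)
    print(dna_to_text(dna))
```

Since the meaning of the symbols is arbitrary, shuffling the assignment would produce a different but equally workable code.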
Electronic computer memory is conventionally classified into ROM and RAM. ROM stands for ‘read only’ memory. More strictly it is ‘write once, read many times’ memory. The pattern of 0s and 1s is ‘burned’ into it once and for all on manufacture. It then remains unchanged throughout the life of the memory, and the information can be read out any number of times. Other electronic memory, called RAM, can be ‘written to’ (one soon gets used to this inelegant computer jargon) as well as read. RAM can therefore do everything that ROM can do, and more. What the letters R A M actually stand for is misleading, so I won’t mention it. The point about RAM is that you can put any pattern of 1s and 0s into any part of it that you like, on as many occasions as you like. Most of a computer’s memory is RAM. As I type these words they are going straight into RAM, and the word-processing program controlling things is also in RAM, although it could theoretically be burned into ROM and then never subsequently altered. ROM is used for a fixed repertoire of standard programs, which are needed again and again, and which you can’t change even if you wanted to.
DNA is ROM. It can be read millions of times over, but only written to once, when it is first assembled at the birth of the cell in which it resides. The DNA in the cells of any individual is ‘burned in’, and is never altered during that individual’s lifetime, except by very rare random deterioration. It can be copied, however. It is duplicated every time a cell divides. The pattern of A, T, C and G nucleotides is faithfully copied into the DNA of each of the trillions of new cells that are made as a baby grows. When a new individual is conceived, a new and unique pattern of data is ‘burned into’ his DNA ROM, and he is then stuck with that pattern for the rest of his life. It is copied into all his cells (except his reproductive cells, into which a random half of his DNA is copied, as we shall see).
All computer memory, whether ‘ROM’ or ‘RAM’, is addressed. This means that every location in the memory has a label, usually a number but this is an arbitrary convention. It is important to understand the distinction between the address and the contents of a memory location. Each location is known by its address. For instance the first two letters of this chapter, ‘It’, are at this moment sitting in RAM locations 6446 and 6447 of my computer, which has 65,536 RAM locations altogether. At another time, the contents of those two locations will be different. The contents of a location is whatever was most recently written in that location. Each ROM location also has an address and a contents. The difference is that each location is stuck with its contents, once and for all.
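The distinction between address and contents, and between RAM and ROM, can be sketched as two small classes. This is a toy model under simple assumptions (one character code per location, addresses numbered from zero), and the class and method names are invented for the illustration.

```python
class RAM:
    """Addressed memory whose contents can be rewritten any number of times."""

    def __init__(self, size):
        self._cells = [0] * size

    def read(self, address):
        return self._cells[address]

    def write(self, address, value):
        # The address stays the same; only the contents change.
        self._cells[address] = value


class ROM:
    """Addressed memory whose contents are fixed once, at 'manufacture'."""

    def __init__(self, contents):
        self._cells = tuple(contents)

    def read(self, address):
        return self._cells[address]

    def write(self, address, value):
        raise PermissionError("ROM contents cannot be altered after manufacture")


if __name__ == "__main__":
    ram = RAM(65536)
    ram.write(6446, ord("I"))
    ram.write(6447, ord("t"))
    print(chr(ram.read(6446)) + chr(ram.read(6447)))
    ram.write(6446, ord("A"))  # same address, new contents
    print(chr(ram.read(6446)))
```

Both kinds of memory answer the question ‘what is at address N?’ in the same way. They differ only in whether the answer can ever change.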
The DNA is arranged along stringy chromosomes, like long computer tapes. All the DNA in each of our cells is addressed in the same sense as computer ROM, or indeed computer tape, is addressed. The exact numbers or names that we use to label a given address are arbitrary, just as they are for computer memory. What matters is that a particular location in my DNA corresponds precisely to one particular location in your DNA: they have the same address. The contents of my DNA location 321762 may or may not be the same as the contents of your location 321762. But my location 321762 is in precisely the same position in my cells as your location 321762 is in your cells. ‘Position’ here means position along the length of a particular chromosome. The exact physical position of a chromosome in a cell doesn’t matter. Indeed, it floats about in fluid so its physical position varies, but every location along the chromosome is precisely addressed in terms of linear order along the length of the chromosome, just as every location along a computer tape is precisely addressed, even if the tape is strewn around the floor rather than being neatly rolled up. All of us, all human beings, have the same set of DNA addresses, but not necessarily the same contents of those addresses. That is the main reason why we are all different from each other.
Other species don’t have the same set of addresses. Chimpanzees, for instance, have 48 chromosomes compared to our 46. Strictly speaking it is not possible to compare contents, address by address, because addresses don’t correspond to each other across species barriers. Closely related species, however, like chimps and humans, have such large chunks of adjacent contents in common that we can easily identify them as basically the same, even though we can’t use quite the same addressing system for the two species. The thing that defines a species is that all members have the same addressing system for their DNA. Give or take a few minor exceptions, all members have the same number of chromosomes, and every location along the length of a chromosome has its exact opposite number in the same position along the length of the corresponding chromosome in all other members of the species. What can differ among the members of a species is the contents of those locations.
The differences in contents in different individuals come about in the following manner, and here I must stress that I am talking about sexually reproducing species such as our own. Our sperms or eggs each contain 23 chromosomes. Each addressed location in one of my sperms corresponds to a particular addressed location in every other one of my sperms, and in every one of your eggs (or sperms). All my other cells contain 46 — a double set. The same addresses are used twice over in each of these cells. Every cell contains two chromosome 9s, and two versions of location 7230 along chromosome 9. The contents of the two may or may not be the same, just as they may or may not be the same in other members of the species. When a sperm, with its 23 chromosomes, is made from a body cell with its 46 chromosomes, it only gets one of the two copies of each addressed location. Which one it gets can be treated as random. The same goes for eggs. The result is that every sperm produced and every egg produced is unique in terms of the contents of their locations, although their addressing system is identical in all members of one species (with minor exceptions that need not concern us). When a sperm fertilizes an egg, a full complement of 46 chromosomes is, of course, made up; and all 46 are then duplicated in all the cells of the developing embryo.
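The making of sperms and eggs, as described above, amounts to a random choice at each address. The sketch below takes the description literally and picks independently at every address. Real chromosomes pass on neighbouring locations together more often than not, so treat the independence as a simplifying assumption. The addresses and contents shown are made up.

```python
import random


def make_gamete(body_cell, rng):
    """Pick one of the two copies at each address, as in making a sperm or an egg."""
    return {address: rng.choice(copies) for address, copies in body_cell.items()}


def fertilize(sperm, egg):
    """Restore the double set: one copy from the sperm, one from the egg, at each address."""
    return {address: (sperm[address], egg[address]) for address in sperm}


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical contents at two addresses on chromosome 9.
    father = {("chr9", 7230): ("A", "G"), ("chr9", 7231): ("T", "T")}
    mother = {("chr9", 7230): ("G", "G"), ("chr9", 7231): ("C", "T")}
    child = fertilize(make_gamete(father, rng), make_gamete(mother, rng))
    print(child)
```

Note that the set of addresses is identical in father, mother, gametes and child. Only the contents are shuffled.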
I said that ROM cannot be written to except when it is first manufactured, and that is true also of the DNA in cells, except for occasional random errors in copying. But there is a sense in which the collective data bank consisting of the ROMs of an entire species can be constructively written to. The nonrandom survival and reproductive success of individuals within the species effectively ‘writes’ improved instructions for survival into the collective genetic memory of the species as the generations go by. Evolutionary change in a species largely consists of changes in how many copies there are of each of the various possible contents at each addressed DNA location, as the generations pass. Of course, at any particular time, every copy has to be inside an individual body. But what matters in evolution is changes in frequency of alternative possible contents at each address in populations. The addressing system remains the same, but the statistical profile of location contents changes as the centuries go by.
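The idea that selection ‘writes’ to the collective memory by shifting frequencies can be sketched with a standard textbook recurrence. The assumptions are strong: one address, two alternative contents, a very large population, and fixed relative success rates. The fitness values of 1.1 and 1.0 are invented for the example, as are the function names.

```python
def next_frequency(p, fitness_a, fitness_b):
    """Frequency of content A after one generation of nonrandom reproductive success."""
    mean_fitness = p * fitness_a + (1.0 - p) * fitness_b
    return p * fitness_a / mean_fitness


def trajectory(p0, fitness_a, fitness_b, generations):
    """Follow the frequency of content A over many generations."""
    frequencies = [p0]
    for _ in range(generations):
        frequencies.append(next_frequency(frequencies[-1], fitness_a, fitness_b))
    return frequencies


if __name__ == "__main__":
    # Content A starts rare (1 per cent) with a hypothetical 10 per cent advantage.
    for generation, p in enumerate(trajectory(0.01, 1.1, 1.0, 200)):
        if generation % 25 == 0:
            print(generation, round(p, 4))
```

No individual’s DNA is altered at any point in this calculation. What changes is the number of copies of each alternative in the population.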
Once in a blue moon the addressing system itself changes. Chimpanzees have 24 pairs of chromosomes and we have 23. We share a common ancestor with chimpanzees, so at some point in either our ancestry or chimps’ there must have been a change in chromosome number. Either we lost a chromosome (two merged), or chimps gained one (one split). There must have been at least one individual who had a different number of chromosomes from his parents. There are other occasional changes in the entire genetic system. Whole lengths of code, as we shall see, may occasionally be copied to completely different chromosomes. We know this because we find, scattered around the chromosomes, long strings of DNA text that are identical.
When the information in a computer memory has been read from a particular location, one of two things may happen to it. It can either simply be written somewhere else, or it can become involved in some ‘action’. Being written somewhere else means being copied. We have already seen that DNA is readily copied from one cell to a new cell, and that chunks of DNA may be copied from one individual to another individual, namely its child. ‘Action’ is more complicated. In computers, one kind of action is the execution of program instructions. In my computer’s ROM, location numbers 64489, 64490 and 64491, taken together, contain a particular pattern of contents — 1s and 0s — which when interpreted as instructions, result in the computer’s little loudspeaker uttering a blip sound. This bit pattern is 10101101 00110000 11000000. There is nothing inherently blippy or noisy about that bit pattern. Nothing about it tells you that it will have that effect on the loudspeaker. It has that effect only because of the way the rest of the computer is wired up. In the same way, patterns in the DNA four-letter code have effects, for instance on eye colour or behaviour, but these effects are not inherent in the DNA data patterns themselves. They have their effects only as a result of the way the rest of the embryo develops, which in turn is influenced by the effects of patterns in other parts of the DNA. This interaction between genes will be a main theme of Chapter 7.
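The point that nothing about the bit pattern is inherently ‘blippy’ can be seen by reading the same 24 bits in several ways. The sketch below makes no claim about how any particular computer interprets them. It simply shows three readings that are equally valid as far as the bits themselves are concerned, and the function names are illustrative.

```python
BIT_PATTERN = "10101101 00110000 11000000"


def bits_to_bytes(pattern):
    """Turn space-separated groups of eight bits into raw bytes."""
    return bytes(int(group, 2) for group in pattern.split())


def interpretations(raw):
    """Read the same bytes in several different ways."""
    return {
        "hex": raw.hex(),
        "unsigned_bytes": list(raw),
        "one_big_integer": int.from_bytes(raw, "big"),
    }


if __name__ == "__main__":
    for name, value in interpretations(bits_to_bytes(BIT_PATTERN)).items():
        print(name, value)
```

Which reading matters, and whether a loudspeaker blips, depends entirely on the machinery that consumes the bits. The same holds for a stretch of DNA and the embryo around it.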
Before they can be involved in any kind of action, the code symbols of DNA have to be translated into another medium. They are first transcribed into exactly corresponding RNA symbols. RNA also has a four-letter alphabet. From here, they are translated into a different kind of polymer called a polypeptide or protein. It might be called a polyamino acid, because the basic units are amino acids. There are 20 kinds of amino acids in living cells. All biological proteins are chains made of these 20 basic building-blocks. Although a protein is a chain of amino acids, most of them don’t remain long and stringy. Each chain coils up into a complicated knot, the precise shape of which is determined by the order of amino acids. This knot shape therefore never varies for any given sequence of amino acids. The sequence of amino acids in turn is precisely determined by the code symbols in a length of DNA (via RNA as an intermediary). There is a sense, therefore, in which the three-dimensional coiled shape of a protein is determined by the one-dimensional sequence of code symbols in the DNA.
The translation procedure embodies the celebrated three-letter ‘genetic code’. This is a dictionary, in which each of the 64 (4 × 4 × 4) possible triplets of DNA (or RNA) symbols is translated into one of the 20 amino acids or a ‘stop reading’ symbol. There are three of these ‘stop reading’ punctuation marks. Many of the amino acids are coded by more than one triplet (as you might have guessed from the fact that there are 64 triplets and only 20 amino acids). The whole translation, from strictly sequential DNA ROM to precisely invariant three-dimensional protein shape, is a remarkable feat of digital information technology. Subsequent steps by which genes influence bodies are a little less obviously computer-like.
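The dictionary can be written down in full. The sketch below uses the standard genetic code with the usual one-letter abbreviations for the 20 amino acids and `*` for the three ‘stop reading’ marks. It simplifies by treating the DNA string as the strand that reads the same as the RNA, with T in place of U. The function names are illustrative.

```python
from itertools import product

RNA_BASES = "UCAG"
# Standard genetic code, listed in UCAG order for the first, second and third letter.
# Each character is a one-letter amino acid abbreviation; '*' means 'stop reading'.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

GENETIC_CODE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Copy DNA symbols into the corresponding RNA symbols (T becomes U)."""
    return dna.replace("T", "U")


def translate(rna):
    """Read RNA three letters at a time, stopping at the first 'stop reading' triplet."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = GENETIC_CODE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate(transcribe("ATGGCCTAA")))
```

Counting the entries confirms the figures in the text: 64 triplets, 20 amino acids, and three stop marks, so several triplets necessarily share an amino acid.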