This is how the method of approximation worked. They took the first of the five genes, say haemoglobin-A (in all cases I use the name of the protein to stand for the gene that codes for that protein). Of all those millions of trees, they wanted to find which was the most ‘parsimonious’ where haemoglobin-A was concerned. Parsimonious here means ‘needing to postulate the minimum amount of evolutionary change’. For example, all those thousands of trees that assumed that the closest cousin to a human was a kangaroo, while humans and chimpanzees were more distantly related, proved to be very unparsimonious trees: they needed to assume a lot of evolutionary change in order to yield the result that kangaroos and humans had a recent common ancestor. Haemoglobin-A’s verdict would be along these lines:

  This is a terribly unparsimonious tree. Not only do I have to put in lots of mutational work in order to end up so different in humans and kangaroos, despite our close cousinship according to this tree, I also have to put in lots of mutational work in the other direction, in order to ensure that, despite their great separation on this particular tree, humans and chimps somehow ended up with such similar haemoglobin-A. I vote against this tree.

  Haemoglobin-A delivers a verdict of this kind, some verdicts more favourable than others, on each of the 34 million trees, and finally ends up choosing a few dozen top-ranking trees. Of each of these top-ranking trees, haemoglobin-A would say something like this:

  This tree puts humans and chimpanzees as close cousins, and it puts sheep and cows as close cousins, and it puts kangaroos out on a limb. This turns out to be a very good tree, because it makes me do hardly any mutational work at all to explain the evolutionary changes. This is an excellently parsimonious tree. It gets the haemoglobin-A vote!

  Of course, it would have been nice if haemoglobin-A, and every other gene, could have come up with a single most parsimonious tree, but that is too much to ask. Among the 34 million trees, it is only to be expected that several slightly different trees should tie for haemoglobin-A’s top-ranking slot.
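  The figure of 34 million can be checked. A standard result in phylogenetics is that n labelled species can be joined by (2n - 5)!! distinct unrooted, fully branching trees, where k!! means k × (k - 2) × (k - 4) × ... down to 1. Assuming that is the counting convention behind the figure here, a minimal Python sketch:

```python
def double_factorial(k: int) -> int:
    """k * (k - 2) * (k - 4) * ... down to 1 or 2; by convention 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_species: int) -> int:
    """Number of distinct unrooted, fully bifurcating trees linking n labelled species."""
    if n_species < 3:
        raise ValueError("need at least three species")
    return double_factorial(2 * n_species - 5)


for n in (4, 5, 11):
    print(n, "species:", count_unrooted_trees(n), "possible trees")
```

  For eleven species this gives 34,459,425. Each extra species multiplies the count by the next odd number, which is why scoring every tree soon becomes impossible and approximation methods are needed.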

  Now, how about haemoglobin-B? How about cytochrome-C? Each one of the five proteins is entitled to its own separate vote, to find its own preferred (that is, most parsimonious) trees from among the 34 million trees. It would be perfectly possible for cytochrome-C to come up with a completely different vote on which is the most parsimonious tree. It could turn out that the cytochrome-C of humans really is very similar to that of kangaroos, and very different from that of chimpanzees. Far from saluting the close pairing of sheep and cow discerned by haemoglobin-A, cytochrome-C might find that it hardly needs to mutate at all in order to place sheep very close to, say, monkeys, and in order to place cows very close to rabbits. On the creation hypothesis there is no reason why that shouldn’t happen. But what Penny and his colleagues actually found was that there was astonishingly high agreement among all five proteins (and they used yet more clever statistics to show how unlikely such concordance would be by chance). All five proteins ‘voted’ for pretty much the same subset of trees from among the 34 million possible trees. This is, of course, exactly what we would expect on the assumption that there really is only one true tree relating all eleven animals, and it is the family tree: the tree of evolutionary relationships. What is more, the consensus tree that the five molecules all voted for turned out to be the same as the one zoologists had already worked out on anatomical and palaeontological, not molecular, grounds.
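  A gene's 'vote' can be made concrete with Fitch's small-parsimony algorithm, which is one standard way of counting the minimum number of changes a tree requires. The text does not say exactly which scoring method Penny's group used. The sketch below scores the three possible unrooted trees for four species against two invented toy sequences. The sequences and the gene names are hypothetical and serve only as illustration. Each gene 'votes' for whichever tree has the lowest score, or for all of them if several tie.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a species name, or a pair of subtrees


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's algorithm at one site: returns (possible ancestral states, minimum changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch_site(tree[0], states)
    right_set, right_changes = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_changes + right_changes
    # No state in common: one extra change is needed somewhere below this node.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree: Tree, seqs: Dict[str, str]) -> int:
    """Total minimum number of changes the tree needs, summed over all sites."""
    length = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
               for i in range(length))


# The three unrooted trees for four species, each drawn with an arbitrary root
# (the Fitch score does not depend on where the root is placed).
TREES = {
    "human+chimp | cow+kangaroo": (("human", "chimp"), ("cow", "kangaroo")),
    "human+cow | chimp+kangaroo": (("human", "cow"), ("chimp", "kangaroo")),
    "human+kangaroo | chimp+cow": (("human", "kangaroo"), ("chimp", "cow")),
}

# Hypothetical toy sequences, invented for illustration (not real protein data).
GENES = {
    "toy_gene_1": {"human": "ACGTAC", "chimp": "ACGTAT", "cow": "AGGCTC", "kangaroo": "TGGCTC"},
    "toy_gene_2": {"human": "TTGACA", "chimp": "TTGACA", "cow": "CTAAGA", "kangaroo": "CCAAGT"},
}


def vote(seqs: Dict[str, str]):
    """Return the most parsimonious tree name(s) for one gene, plus every tree's score."""
    scores = {name: parsimony_score(t, seqs) for name, t in TREES.items()}
    best = min(scores.values())
    return sorted(name for name, s in scores.items() if s == best), scores


for gene, seqs in GENES.items():
    winners, scores = vote(seqs)
    print(gene, scores, "->", winners)
```

  In this toy case both genes independently prefer the human-chimp pairing. That is the kind of concordance described above, although it is shown here on four species rather than eleven.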

  The Penny study was published in 1982, quite a while ago now. The intervening years have seen a prolific multiplication of detailed evidence on the exact sequences of genes of lots and lots of species of animals and plants. Agreement on the most parsimonious trees now extends far beyond the eleven species and five molecules that Penny and his colleagues studied. Theirs was just a nice example, overwhelming as their statistical evidence proved. The sum total of genetic sequence data now available puts the matter beyond all conceivable doubt. Far more convincingly even than the (also highly convincing) fossil evidence, the evidence from comparisons among genes is converging, rapidly and decisively, on a single great tree of life. Above is a tree for the eleven species of the Penny study, which represents a modern consensus vote from many different parts of the mammalian genome. It is the consistency of agreement among all the different genes in the genome that gives us confidence, not only in the historical accuracy of the consensus tree itself, but also in the fact that evolution has occurred.

  Family tree for Penny’s eleven species

  If molecular genetic technology continues to expand at its present exponential rate, by the year 2050 deriving the complete sequence of an animal’s genome will be cheap and quick, scarcely any more trouble than taking its temperature or its blood pressure. Why do I say that genetic technology is expanding exponentially? Could we even measure it? There is a parallel in computer technology called Moore’s Law. Named after Gordon Moore, one of the founders of the Intel computer chip company, it can be expressed in various ways because several measures of computer power are linked to each other. One version of the law states that the number of transistors that can be packed into an integrated circuit of a given size doubles every eighteen months to two years or so. It is an empirical law, meaning that, rather than deriving from some piece of theory, it just turns out to be true when you measure the data. It has held good over a period of about fifty years so far, and many experts think it will do so for at least a few more decades. Other exponential trends with a similar doubling time, which can be regarded as versions of Moore’s Law, include the increase in computing speed and memory size per unit cost. Exponential trends always lead to startling results, as Darwin demonstrated when, with the aid of his mathematician son George, he took the elephant as an example of a slow-breeding animal and showed that, in just a few centuries of unrestricted exponential growth, the descendants of just one pair of elephants would carpet the earth. Needless to say, population growth of elephants is not, in practice, exponential. It is limited by competition for food and space, by disease, and by many other things. That, indeed, was Darwin’s whole point, for that is where natural selection steps in.
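  The arithmetic of doubling is what makes exponential trends so startling. Here is a minimal sketch, assuming a steady doubling time. The eighteen-month and two-year figures are the ones quoted for Moore's Law above.

```python
def growth_factor(years: float, doubling_time: float) -> float:
    """Factor by which a quantity grows over `years` if it doubles every `doubling_time` years."""
    return 2 ** (years / doubling_time)


# Fifty years of Moore's Law at the two doubling times quoted in the text.
for doubling_time in (1.5, 2.0):
    print(f"doubling every {doubling_time} years -> x{growth_factor(50, doubling_time):,.0f} in 50 years")
```

  With a two-year doubling time, fifty years multiplies the quantity by 2^25, which is just over 33 million. Shortening the doubling time to eighteen months raises that by roughly another factor of three hundred.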

  But Moore’s Law really has remained in force, at least approximately, for fifty years. Although nobody has a very clear idea why, various measures of computer power actually have increased exponentially in practice, where Darwin’s elephant trend is exponential only in theory. It occurred to me that there might be a similar law in force for genetic technology and the sequencing of DNA. I suggested it to Jonathan Hodgkin, Oxford’s Professor of Genetics (who had once been an undergraduate pupil of mine). To my delight, it turned out that he had already thought of it – and measured it, in preparation for a lecture at his old school. He estimated the cost of sequencing a standard length of DNA at four dates in history, 1965, 1975, 1995 and 2000. I inverted his figures to ‘bangs for the buck’, or ‘How much DNA could you sequence for £1,000?’ I plotted the figures on a logarithmic scale, chosen because an exponential trend will always show up as a straight line when plotted logarithmically. Sure enough, Hodgkin’s four points fall pretty well on a straight line. I fitted a line to the points (for the technique of linear regression, see note on p. 112) and then took the liberty of projecting it on into the future. More recently, just as this book was going to press, I showed this section to Professor Hodgkin, and he told me the most recent data of which he was aware: the duckbilled platypus genome, which was sequenced in 2008 (the platypus was a good choice, because of its strategic position in the tree of life: the ancestor that it shares with us lived 180 million years ago, which is nearly three times as long ago as the extinction of the dinosaurs). I’ve drawn the platypus’s point as a star on the graph, and you can see that it fits pretty well near the projected line that was calculated from the earlier data.
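  Fitting a straight line on a logarithmic scale is ordinary least-squares regression applied to the logarithms of the values. The slope of that line then gives the doubling time directly. Hodgkin's actual figures are not reproduced in the text, so the sketch below uses synthetic points instead. They were generated to double every 2.2 years from an arbitrary starting value, and their only purpose is to show that the method recovers the doubling time.

```python
import math


def fit_log2_line(xs, values):
    """Least-squares line through (x, log2(value)); returns (slope, intercept).

    The slope is in doublings per year, so 1 / slope is the doubling time in years.
    """
    ys = [math.log2(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(slope, intercept, year):
    """Read a value off the fitted line, converting back from the log scale."""
    return 2 ** (slope * year + intercept)


# Synthetic data (NOT Hodgkin's figures): bases sequenced per £1,000, doubling every
# 2.2 years from an arbitrary 1 base in 1965, at the four dates used in the text.
YEARS = [1965, 1975, 1995, 2000]
BASES_PER_1000_GBP = [2 ** ((y - 1965) / 2.2) for y in YEARS]

slope, intercept = fit_log2_line(YEARS, BASES_PER_1000_GBP)
print(f"fitted doubling time: {1 / slope:.2f} years")
print(f"projected value for 2008: {project(slope, intercept, 2008):.3g}")
```

  Real measurements scatter about the line. Projecting beyond the last data point, as the text does, also assumes that the trend continues.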

  The slope of the line for what I am now calling (without permission) Hodgkin’s Law is only slightly shallower than that for Moore’s Law. The doubling time is a bit more than two years, where the Moore’s Law doubling time is a bit less than two years. DNA technology is intensely dependent on computers, so it’s a good guess that Hodgkin’s Law is at least partly dependent on Moore’s Law. The arrows on the right indicate the genome sizes of various creatures. If you follow the arrow towards the left until it hits the sloping line of Hodgkin’s Law, you can read off an estimate of when it will be possible to sequence a genome the same size as the creature concerned for only £1,000 (of today’s money). For a genome the size of yeast’s, we need wait only till about 2020. For a new mammal genome (as far as this kind of back-of-envelope calculation is concerned, all mammals are equally expensive), the estimated date is just this side of 2040. It’s an exhilarating prospect: a massive database of DNA sequences, cheaply and easily obtained from all corners of the animal and plant kingdoms. Detailed DNA comparisons will fill in all the gaps in our knowledge about the actual evolutionary relatedness of every species to every other: we shall know, with complete certainty, the entire family tree of all living creatures.* Goodness knows how we’ll plot it; it won’t fit on any practical-sized sheet of paper.
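  Reading a date off the projected line is the same calculation run backwards. Starting from some reference point on the line, each doubling time buys a doubling of genome size. The reference point below is a placeholder, not a figure taken from the graph. The genome sizes are rough round numbers: about 12 million base pairs for budding yeast and about 3 billion for a typical mammal. One useful consequence is that the gap between two read-off dates depends only on the doubling time and the ratio of the genome sizes, not on the reference point.

```python
import math


def year_affordable(genome_bases, ref_year, ref_bases, doubling_time):
    """Year when a fixed budget buys `genome_bases` of sequence, if it bought `ref_bases`
    in `ref_year` and that amount doubles every `doubling_time` years."""
    return ref_year + doubling_time * math.log2(genome_bases / ref_bases)


YEAST_BASES = 12e6   # approximate genome size of budding yeast, in base pairs
MAMMAL_BASES = 3e9   # approximate size of a mammalian genome, in base pairs
DOUBLING_TIME = 2.2  # assumed: "a bit more than two years"

# Placeholder reference point (hypothetical); it cancels out of the difference below.
REF_YEAR, REF_BASES = 2000, 1e6

gap = (year_affordable(MAMMAL_BASES, REF_YEAR, REF_BASES, DOUBLING_TIME)
       - year_affordable(YEAST_BASES, REF_YEAR, REF_BASES, DOUBLING_TIME))
print(f"a mammal genome becomes affordable about {gap:.1f} years after a yeast genome")
```

  Under these assumptions the gap comes out at about 17.5 years. That is broadly consistent with the dates read off the graph: about 2020 for yeast and just under 2040 for a mammal.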

  ‘Hodgkin’s Law’

  The largest-scale attempt in that direction so far has been made by a group associated with David Hillis, brother of Danny Hillis who pioneered one of the first massively parallel supercomputers. The Hillis plot makes the tree diagram more compact by wrapping it around in a circle. You can’t see the gap, where the two ends almost meet, but it lies between the ‘bacteria’ and the ‘archaea’. To see how the circular plot works, look at the greatly stripped-down version tattooed on the back of Dr Clare D’Alberto of the University of Melbourne, whose enthusiasm for zoology is more than skin deep. Clare has graciously allowed me to reproduce the photograph in this book (see colour page 25). Her tattoo includes a small sample of eighty-six species (the number of terminal twigs). You can see the gap in the circular plot, and imagine the circle opened out. The smaller number of illustrations around the edge are strategically chosen from bacteria, protozoa, plants, fungi, and four animal phyla. The vertebrates are represented by the weedy sea dragon on the right, a surprising fish, protected by its resemblance to seaweed. The Hillis circular plot is the same, except that it has three thousand species. Their names appear around the outside edge of the circle above, far too small to read – though Homo sapiens is helpfully marked ‘You are here’. You can get an idea of how sparse a sampling of the tree even this huge plot is when I tell you that the closest relatives of humans that it can fit in the circle are rats and mice. The mammals had to be stripped down drastically, in order to fit in all the other branches of the tree to the same depth. Just imagine trying to plot a similar tree with ten million species instead of the three thousand included here. And ten million is not the most extravagant estimate of the number of surviving species. It’s well worth downloading the Hillis tree from the group’s website (see endnotes), and then printing it as a wall hanging on a piece of paper which, they recommend, should be at least 54 inches wide (even bigger would be an advantage).
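  Wrapping a tree into a circle amounts to giving each terminal twig an angle instead of a position along a line, and leaving one gap where the two ends of the opened-out tree almost meet. The sketch below illustrates only that idea and is not the software actually used for the Hillis plot. The gap size is an arbitrary assumption.

```python
import math


def leaf_angles(n_leaves, gap_degrees=10.0):
    """Angles in degrees for n_leaves twig tips, evenly spaced around a circle,
    with a gap of gap_degrees between the last tip and the first."""
    if n_leaves < 2:
        raise ValueError("need at least two leaves")
    step = (360.0 - gap_degrees) / (n_leaves - 1)
    return [i * step for i in range(n_leaves)]


def to_xy(angle_degrees, radius=1.0):
    """Convert a polar position (angle, radius) to Cartesian x, y for drawing."""
    theta = math.radians(angle_degrees)
    return radius * math.cos(theta), radius * math.sin(theta)


# Eighty-six tips, as in the tattoo described above.
angles = leaf_angles(86)
print(f"spacing between tips: {angles[1] - angles[0]:.2f} degrees")
print("first tip at", to_xy(angles[0]), "last tip at", to_xy(angles[-1]))
```

  If the gap is made no wider than the spacing between neighbouring tips, it disappears from view.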

  The Hillis plot

  THE MOLECULAR CLOCK

  Now, while we are talking molecules, we have some unfinished business left over from the chapter on evolutionary clocks. There, we looked at tree rings, and at various kinds of radioactive clocks, but we deferred consideration of the so-called molecular clock until we had learned about some other aspects of molecular genetics. The time has now come. Think of this section as an appendix to the chapter on clocks.

  The molecular clock assumes that evolution is true, and that it proceeds at a sufficiently constant rate through geological time to be used as a clock in its own right, provided that it can be calibrated using fossils, which are in turn calibrated with radioactive clocks. Just as a candle clock assumes that candles burn at a fixed and known rate, and a water clock assumes that water drains from a bucket at a rate that can be calibrated, and a grandfather clock assumes that a pendulum swings at a fixed rate, so the molecular clock assumes that there are certain aspects of evolution itself that proceed at a fixed rate. That fixed rate can be calibrated against those parts of the evolutionary record that are well documented with (radioactively datable) fossils. Once calibrated, the molecular clock can then be used for other parts of evolution that are not well documented by fossils. For example, it can be used for animals that don’t have hard skeletons and seldom fossilize.
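  Calibrating the clock and then using it come down to two divisions. The difference between two species has been accumulating along both lineages since their common ancestor, which is why the calculation includes a factor of 2. The minimal sketch below ignores the complication that a single site can change more than once, and all of its numbers are hypothetical.

```python
def calibrate_rate(differences_per_site, fossil_age_myr):
    """Changes per site per lineage per million years, from a species pair whose split is
    dated by (radioactively dated) fossils. Both lineages change after the split, hence 2."""
    return differences_per_site / (2 * fossil_age_myr)


def estimate_split(differences_per_site, rate):
    """Millions of years since two species shared an ancestor, given a calibrated rate."""
    return differences_per_site / (2 * rate)


# Hypothetical calibration: a pair differing at 12% of sites, split dated by fossils at 60 Myr.
rate = calibrate_rate(0.12, 60)
# Hypothetical soft-bodied pair with no useful fossils, differing at 5% of sites.
print(f"estimated split: {estimate_split(0.05, rate):.1f} million years ago")
```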

  Nice idea, but what gives us the right to hope that we can find evolutionary processes that go at a fixed rate? Indeed, much evidence suggests that evolutionary rates are highly variable. Long before the modern era of molecular biology, J. B. S. Haldane proposed the darwin as a measure of evolutionary rates. Suppose that, over evolutionary time, some measured characteristic of an animal is changing in a consistent direction. For example, suppose the mean leg length is increasing. If, over a period of a million years, leg length increases by a factor of e (2.718 . . ., a number chosen for reasons of mathematical convenience, which we needn’t go into),* the rate of evolutionary change is said to be one darwin. Haldane himself assessed the rate of evolution of the horse as approximately 40 millidarwins, while it has been suggested that the evolution of domestic animals under artificial selection should be measured in kilodarwins. The rate of evolution of guppies transplanted to a predator-free stream, as described in Chapter 5, has been estimated as 45 kilodarwins. The evolution of ‘living fossils’ such as Lingula (page 140) is probably to be measured in microdarwins. You get the point: rates of evolution of things that you can see and measure, like legs and beaks, are hugely variable.
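  Haldane's definition can be written as a one-line formula. The rate in darwins is the change in the natural logarithm of the measurement divided by the elapsed time in millions of years, so growth by a factor of e in a million years is exactly one darwin. A minimal sketch (the measurements are hypothetical):

```python
import math


def darwins(initial, final, years):
    """Haldane's evolutionary rate: change in natural log of a trait per million years."""
    return (math.log(final) - math.log(initial)) / (years / 1_000_000)


print(darwins(10.0, 10.0 * math.e, 1_000_000))  # one darwin, by definition (up to rounding)
print(darwins(50.0, 55.0, 10_000))               # a 10% increase in 10,000 years: about 9.5 darwins
```

  Because the formula uses logarithms, a given proportional change counts the same whether the trait is large or small. That is what allows horses, guppies and Lingula to be compared on a single scale.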

  If rates of evolution are so variable, how can we hope to use them as a clock? This is where molecular genetics comes to the rescue. At first sight, it will not be clear how this can be so. When measurable characteristics like leg length evolve, what we are seeing is the outward and visible manifestation of an underlying genetic change. How, then, can it be the case that rates of change at the molecular level provide a good clock while rates of leg or wing evolution don’t? If legs and beaks undergo change at rates ranging from microdarwins to kilodarwins, why should molecules be any more reliable as clocks? The answer is that the genetic changes that manifest themselves in outward and visible evolution – of things like legs and arms – are a very small tip of the iceberg, and they are the tip that is heavily influenced by varying natural selection. The majority of genetic change at the molecular level is neutral, and can therefore be expected to proceed at a rate that is independent of usefulness and might even be approximately constant within any one gene. A neutral genetic change has no effect on the survival of the animal, and this is a helpful credential for a clock. Changes that do affect survival, positively or negatively, would be expected to evolve at rates altered by natural selection, and selection pressures vary from case to case.
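  Why should neutral change keep roughly steady time? There is a standard argument associated with Kimura that is not spelled out in the text. A population of N diploid individuals carries 2N copies of a gene. If each copy picks up a neutral mutation at rate u per generation, about 2Nu new neutral mutations arise each generation. Each of them has a chance of 1/(2N) of eventually drifting through the whole population. The product is simply u. In other words, the rate at which neutral changes become fixed equals the neutral mutation rate, whatever the population size. A minimal sketch of that arithmetic:

```python
def neutral_substitution_rate(population_size, mutation_rate):
    """Expected neutral substitutions per generation in a diploid population."""
    new_mutations = 2 * population_size * mutation_rate  # new neutral copies per generation
    fixation_probability = 1 / (2 * population_size)     # chance one copy drifts to fixation
    return new_mutations * fixation_probability


for n in (1_000, 1_000_000):
    print(n, neutral_substitution_rate(n, 1e-8))  # about 1e-8 for both population sizes
```

  Selection breaks this simple cancellation, because a helpful or harmful mutation's chance of spreading no longer equals 1/(2N).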

  When the neutral theory of molecular evolution was first proposed by, among others, the great Japanese geneticist Motoo Kimura, it was controversial. Some version of it is now widely accepted and, without going into the detailed evidence here, I am going to accept it in this book. Since I have a reputation as an arch-‘adaptationist’ (allegedly obsessed with natural selection as the major or even only driving force of evolution) you can have some confidence that if even I support the neutral theory it is unlikely that many other biologists will oppose it!*

  A neutral mutation is one that, although easily measurable by molecular genetic techniques, is not subject to natural selection, either positive or negative. ‘Pseudogenes’ are neutral for one kind of reason. They are genes that once did something useful but have now been sidelined and are never transcribed or translated. They might as well not exist, as far as the animal’s welfare is concerned. But as far as the scientist is concerned they very much exist, and they are exactly what we need for an evolutionary clock. Pseudogenes are only one class of those genes that are never translated in embryology. There are other classes which are preferred by scientists for molecular clocks, but I won’t go into detail. What pseudogenes are useful for is embarrassing creationists. It stretches even their creative ingenuity to make up a convincing reason why an intelligent designer should have created a pseudogene – a gene that does absolutely nothing and gives every appearance of being a superannuated version of a gene that used to do something – unless he was deliberately setting out to fool us.

  Leaving pseudogenes aside, it is a remarkable fact that the greater part (95 per cent in the case of humans) of the genome might as well not be there, for all the difference it makes. The neutral theory applies even to many of the genes in the remaining 5 per cent – the genes that are read and used. It applies even to genes that are totally vital for survival. I must be clear here. We are not saying that a gene to which the neutral theory applies has no effect on the body. What we are saying is that a mutant version of the gene has exactly the same effect as the unmutated version. However important or unimportant the gene itself may be, the mutated version has the same effect as the unmutated version. Unlike pseudogenes, where the gene itself can properly be described as neutral, we are now talking about cases where it is only mutations (i.e. changes in genes) that can strictly be described as neutral, not genes themselves.

  Mutations can be neutral for various reasons. The DNA code is a ‘degenerate code’. This is a technical term meaning that some code ‘words’ are exact synonyms of each other.* When a gene mutates into one of its synonyms, you might as well not bother to call it a mutation at all. Indeed, it isn’t a mutation, as far as consequences for the body are concerned. And for the same reason it isn’t a mutation at all as far as natural selection is concerned. But it is a mutation as far as molecular geneticists are concerned, for they can see it using their methods. It is as though I were to change the font in which I write a word, say by printing kangaroo in Helvetica instead of Minion. You can still read the word, and it still means the same Australian hopping animal. The change of typeface is detectable but irrelevant to the meaning.
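  The degeneracy of the code is easy to see. Build the standard genetic code table, then ask whether a single-letter change alters the amino acid. The minimal sketch below uses the standard (nuclear) code written with DNA letters. It ignores the slight variants of the code used by some organisms and organelles.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def is_synonymous(codon_a, codon_b):
    """True if two codons are 'synonyms': they specify the same amino acid (or both stop)."""
    return CODE[codon_a] == CODE[codon_b]


def synonymous_fraction():
    """Fraction of all single-letter changes to sense codons that leave the amino acid unchanged."""
    same = total = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base != codon[pos]:
                    mutant = codon[:pos] + base + codon[pos + 1:]
                    total += 1
                    same += CODE[mutant] == aa
    return same / total


print(is_synonymous("CTT", "CTC"))  # True: both leucine
print(is_synonymous("GAT", "GAA"))  # False: aspartic acid becomes glutamic acid
print(f"{synonymous_fraction():.0%} of point changes to sense codons are synonymous")
```

  Most synonymous changes fall at the third letter of a codon, as the CTT to CTC example shows.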