The Gene


Sometime late one evening, Cohen inoculated a vat of sterile bacterial broth with a single colony of bacterial cells with the gene hybrids. Cells grew overnight in a shaking beaker. A hundred, a thousand, and then a million copies of the genetic chimera were replicated, each containing a mixture of genetic material from two completely different organisms. The birth of a new world was announced with no more noise than the mechanical tick-tick-tick of a bacterial incubator rocking through the night.

* * *

I. If a gene is added to an SV40 genome, it can no longer generate a virus because the DNA becomes too large to package it into the viral coat or shell. Despite this, the expanded SV40 genome, with its foreign gene, remains perfectly capable of inserting itself, and its payload gene, into an animal cell. It was this property of gene delivery that Berg hoped to use.

II. Mertz’s discovery, made with Ron Davis, involved a fortuitous quality of enzymes such as EcoR1. If she cut the bacterial plasmid and the SV40 genome with EcoR1, she found, the ends came out naturally “sticky,” like complementary pieces of Velcro, thereby making it easier to join them together into gene hybrids.

The New Music

Each generation needs a new music.

—Francis Crick

People now made music from everything.

—Richard Powers, Orfeo

While Berg, Boyer, and Cohen were mixing and matching gene fragments in test tubes at Stanford and UCSF, an equally seminal breakthrough in genetics was emerging from a laboratory in Cambridge, England. To understand the nature of this discovery, we must return to the formal language of genes. Genetics, like any language, is built out of basic structural elements—alphabet, vocabulary, syntax, and grammar. The “alphabet” of genes has only four letters: the four bases of DNA—A, C, G, and T. The “vocabulary” consists of the triplet code: three bases of DNA are read together to encode one amino acid in a protein; ACT encodes Threonine, CAT encodes Histidine, GGT encodes Glycine, and so forth. A protein is the “sentence” encoded by a gene, using alphabets strung together in a chain (ACT-CAT-GGT encodes Threonine-Histidine-Glycine). And the regulation of genes, as Monod and Jacob had discovered, creates a context for these words and sentences to generate meaning. The regulatory sequences appended to a gene—i.e., signals to turn a gene on or off at certain times and in certain cells—can be imagined as the internal grammar of the genome.
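The triplet "vocabulary" described above can be made concrete with a small sketch. The lookup table is deliberately partial: it holds only the three triplets named in the text, and the function name `translate` is a hypothetical one chosen for illustration. A real cell uses all sixty-four triplets and reads an RNA copy of the gene rather than the DNA itself.

```python
# Partial codon table: only the three triplets named in the text.
# The full genetic code has 64 triplets; everything else maps to "?" here.
CODON_TABLE = {
    "ACT": "Threonine",
    "CAT": "Histidine",
    "GGT": "Glycine",
}

def translate(dna):
    """Read a DNA string three bases at a time and look up each triplet."""
    dna = dna.replace("-", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial triplet
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE.get(codon, "?") for codon in codons]

print("-".join(translate("ACT-CAT-GGT")))  # Threonine-Histidine-Glycine
```

Shifting the starting point by a single base changes every triplet that follows, which is one reason the exact sequence matters so much.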

But the alphabet, grammar, and syntax of genetics exist exclusively within cells; humans are not native speakers. For a biologist to be able to read and write the language of genes, a novel set of tools had to be invented. To “write” is to mix and match words in unique permutations to generate new meanings. At Stanford, Berg, Cohen, and Boyer were beginning to write genes using gene cloning—generating words and sentences in DNA that had never existed in nature (a bacterial gene combined with a viral gene to form a new genetic element). But the “reading” of genes—the deciphering of the precise sequence of bases in a stretch of DNA—was still an enormous technical hurdle.

Ironically, the very features that enable a cell to read DNA are the features that make it incomprehensible to humans—to chemists, in particular. DNA, as Schrödinger had predicted, was a chemical built to defy chemists, a molecule of exquisite contradictions—monotonous and yet infinitely varied, repetitive to the extreme and yet idiosyncratic to the extreme. Chemists generally piece together the structure of a molecule by breaking the molecule down into smaller and smaller parts, like puzzle pieces, and then assembling the structure from the constituents. But DNA, broken into pieces, degenerates into a garble of four bases—A, C, G, and T. You cannot read a book by dissolving all its words into alphabets. With DNA, as with words, the sequence carries the meaning. Dissolve DNA into its constituent bases, and it turns into a primordial four-letter alphabet soup.

How might a chemist determine the sequence of a gene? In Cambridge, England, in a hutlike laboratory buried half-underground near the fens, Frederick Sanger, the biochemist, had struggled with gene sequencing since the 1960s. Sanger had an obsessive interest in the chemical structures of complex biological molecules. In the early 1950s, Sanger had solved the sequence of a protein—insulin—using a variant of the conventional disintegration method. Insulin, first purified from dozens of pounds of ground-up dog pancreases in 1921 by a Toronto surgeon, Frederick Banting, and his medical student Charles Best, was the grand prize of protein purification—a hormone that, injected into diabetic children, could rapidly reverse their wasting, lethal, sugar-choking disease. By the late 1920s, the pharmaceutical company Eli Lilly was manufacturing grams of insulin out of vast vats of liquefied cow and pig pancreases.

Yet, despite several attempts, insulin remained doggedly resistant to molecular characterization. Sanger brought a chemist’s methodological rigor to the problem: the solution—as any chemist knew—was always in dissolution. Every protein is made of a sequence of amino acids strung into a chain—Methionine-Histidine-Arginine-Lysine or Glycine-Histidine-Arginine-Lysine, and so forth. To identify the sequence of a protein, Sanger realized, he would have to run a sequence of degradation reactions. He would snap off one amino acid from the end of the chain, dissolve it in solvents, and characterize it chemically—Methionine. And he would repeat the process, snapping off the next amino acid: Histidine. The degradation and identification would be repeated again and again—Arginine . . . snap . . . Lysine . . . snap—until he reached the end of the protein. It was like unstringing a necklace, bead by bead—reversing the cycle used by a cell to build a protein. Piece by piece, the disintegration of insulin would reveal the structure of its chain. In 1958, Sanger won the Nobel Prize for this landmark discovery.

Between 1955 and 1962, Sanger used variations of this disintegration method to solve the sequences of several important proteins—but left the problem of DNA sequencing largely untouched. These were his “lean years,” he wrote; he lived in the leeward shadow of his fame. He published rarely—immensely detailed papers on protein sequencing that others characterized as magisterial—but he counted none of these as major successes. In the summer of 1962, Sanger moved to another laboratory in Cambridge—the Medical Research Council (MRC) Building—where he was surrounded by new neighbors, among them Crick, Perutz, and Sydney Brenner, all immersed in the cult of DNA.

The change of labs marked a seminal transition in Sanger’s focus. Some scientists—Crick, Wilkins—were born into DNA. Others—Watson, Franklin, Brenner—had acquired it. Fred Sanger had DNA thrust upon him.

In the mid-1960s, Sanger switched his focus from proteins to nucleic acids and began to consider DNA sequencing seriously. But the methods that had worked so marvelously for insulin—breaking, dissolving, breaking, dissolving—refused to work for DNA. Proteins are chemically structured such that amino acids can be serially snapped off the chain—but with DNA, no such tools existed. Sanger tried to reconfigure his degradation technique, but the experiments only produced chemical chaos. Cut into pieces and dissolved, DNA turned from genetic information to gobbledygook.

Inspiration came to Sanger unexpectedly in the winter of 1971—in the form of an inversion. He had spent decades learning to break molecules apart to solve their sequence. But what if he turned his own strategy upside down and tried to build DNA, rather than break it down? To solve a gene sequence, Sanger reasoned, one must think like a gene. Cells build genes all the time: each time a cell divides, it makes a copy of every gene. If a biochemist could strap himself to the gene-copying enzyme (DNA polymerase), straddling its back as it made a copy of DNA and keeping tabs as the enzyme added base upon base—A, C, T, G, C, C, C, and so forth—the sequence of a gene would become known. It was like eavesdropping on a copying machine: you could reconstruct the original from the copy. Once again, the mirror image would illuminate the original—Dorian Gray would be re-created, piece upon piece, from his reflection.

In 1971, Sanger began to devise a gene-sequencing technique using the copying reaction of DNA polymerase. (At Harvard, Walter Gilbert and Allan Maxam were also devising a system to sequence DNA, although using different reagents. Their method also worked, but was soon outmoded by Sanger’s.) At first, Sanger’s method was inefficient and prone to inexplicable failures. In part, the problem was that the copying reaction was too fast: polymerase raced along the strand of DNA, adding nucleotides at such a breakneck pace that Sanger could not catch the intermediate steps. In 1975, Sanger made an ingenious modification: he spiked the copying reaction with a series of chemically altered bases—ever-so-slight variants of A, C, G, and T—that were still recognized by DNA polymerase, but jammed its copying ability. As polymerase stalled, Sanger could use the slowed-down reaction to map a gene by its jams—an A here, a T there, a G there, and so forth—for thousands of bases of DNA.
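The logic of mapping a gene by its jams can be sketched as a toy model. This is an idealized illustration, not Sanger's laboratory protocol. It assumes four separate reactions, one per altered base, and assumes every possible jam is observed exactly once. It also ignores the direction in which the strands run. The function names are hypothetical.

```python
# Base-pairing rules: the copy carries the partner of each template base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def stalled_lengths(template, altered_base):
    """Lengths at which a growing copy can jam when `altered_base` is spiked in."""
    copy = [COMPLEMENT[base] for base in template]
    return [i + 1 for i, base in enumerate(copy) if base == altered_base]

def read_sequence(template):
    """Recover the copy, then the original, from the jam positions alone."""
    jams = {}  # fragment length -> altered base that stopped the copy there
    for base in "ACGT":  # one reaction per altered base (an assumption)
        for length in stalled_lengths(template, base):
            jams[length] = base
    # Sorting the fragments from shortest to longest spells out the copy.
    copy = "".join(jams[length] for length in sorted(jams))
    original = "".join(COMPLEMENT[base] for base in copy)
    return copy, original

copy, original = read_sequence("ACTGCCC")
print(copy)      # TGACGGG
print(original)  # ACTGCCC
```

The point of the sketch is that no base is ever identified chemically on its own. Each one is inferred from where a copy stopped and from which altered base stopped it.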

On February 24, 1977, Sanger used this technique to reveal the full sequence of a virus—ΦX174—in a paper in Nature. Only 5,386 base pairs in length, phi was a tiny virus—its entire genome was smaller than some of the smallest human genes—but the publication announced a transformative scientific advance. “The sequence identifies many of the features responsible for the production of the proteins of the nine known genes of the organism,” he wrote. Sanger had learned to read the language of genes.

The new techniques of genetics—gene sequencing and gene cloning—immediately illuminated novel characteristics of genes and genomes. The first, and most surprising, discovery concerned a unique feature of the genes of animals and animal viruses. In 1977, two scientists working independently, Richard Roberts and Phillip Sharp, discovered that most animal proteins were not encoded in long, continuous stretches of DNA, but were actually split into modules. In bacteria, every gene is a continuous, uninterrupted stretch of DNA, starting with the first triplet code (ATG) and running contiguously to the final “stop” signal. Bacterial genes do not contain separate modules, and they are not split internally by spacers. But in animals, and in animal viruses, Roberts and Sharp found that a gene was typically split into parts and interrupted by long stretches of stuffer DNA.

As an analogy, consider the word structure. In bacteria, the gene is embedded in the genome in precisely that format, structure, with no breaks, stuffers, interpositions, or interruptions. In the human genome, in contrast, the word is interrupted by intermediate stretches of DNA: s . . . tru . . . ct . . . ur . . . e.

The long stretches of DNA marked by the ellipses (. . .) do not contain any protein-encoding information. When such an interrupted gene is used to generate a message—i.e., when DNA is used to build RNA—the stuffer fragments are excised from the RNA message, and the RNA is stitched together again with the intervening pieces removed: s . . . tru . . . ct . . . ur . . . e becomes simplified to structure. Roberts and Sharp later coined a phrase for the process: gene splicing or RNA splicing (since the RNA message of the gene was “spliced” to remove the stuffer fragments).

At first, this split structure of genes seemed puzzling: Why would an animal genome waste such long stretches of DNA splitting genes into bits and pieces, only to stitch them back into a continuous message? But the inner logic of split genes soon became evident: by splitting genes into modules, a cell could generate bewildering combinations of messages out of a single gene. The word s . . . tru . . . c . . . t . . . ur . . . e can be spliced to yield cure and true and so forth, thereby creating vast numbers of variant messages—called isoforms—out of a single gene. From g . . . e . . . n . . . om . . . e you can use splicing to generate gene, gnome, and om. And modular genes also had an evolutionary advantage: the individual modules from different genes could be mixed and matched to build entirely new kinds of genes (c . . . om . . . e . . . t). Wally Gilbert, the Harvard geneticist, created a new word for these modules; he called them exons. The in-between stuffer fragments were termed introns.
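The word game above can be checked mechanically. The sketch below treats each module as a string and lists every message that can be made by keeping some modules, in their original order, and dropping the rest. This is an assumption made for illustration: real cells produce only a regulated subset of such combinations, and the function name `isoforms` is hypothetical.

```python
from itertools import combinations

def isoforms(exons):
    """All messages made by keeping a subset of modules in their original order."""
    messages = set()
    for k in range(1, len(exons) + 1):
        # combinations() preserves the left-to-right order of the modules
        for kept in combinations(exons, k):
            messages.add("".join(kept))
    return messages

structure = isoforms(["s", "tru", "c", "t", "ur", "e"])
genome = isoforms(["g", "e", "n", "om", "e"])

print({"cure", "true", "structure"} <= structure)  # True
print({"gene", "gnome", "om"} <= genome)           # True
```

Six modules already allow 63 possible selections, which gives a sense of how a single split gene can yield many variant messages.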

Introns are not the exception in human genes; they are the rule. Human introns are often enormous—spanning several hundreds of thousands of bases of DNA. And genes themselves are separated from each other by long stretches of intervening DNA, called intergenic DNA. Intergenic DNA and introns—spacers between genes and stuffers within genes—are thought to have sequences that allow genes to be regulated in context. To return to our analogy, these regions might be described as long ellipses scattered with occasional punctuation marks. The human genome can thus be visualized as:

This . . . . . . is . . . . . . . . . . . . the . . . . . . (. . .) . . . s . . . truc . . . ture . . . . . . of . . . . . . your . . . . . . gen . . . om . . . e;

The words represent genes. The long ellipses between the words represent the stretches of intergenic DNA. The shorter ellipses within the words (gen . . . om . . . e) are introns. The parentheses and semicolons—punctuation marks—are regions of DNA that regulate genes.

The twin technologies of gene sequencing and gene cloning also rescued genetics from an experimental jam. In the late 1960s, genetics had found itself caught in a deadlock. Every experimental science depends, crucially, on the capacity to perturb a system intentionally, and to measure the effects of that perturbation. But the only way to alter genes was by creating mutants—essentially a random process—and the only means to read the alteration was through changes in form and function. You could shower fruit flies with X-rays, as Muller had, to make wingless or eyeless flies, but you had no means to intentionally manipulate the genes that controlled eyes or wings, or to understand exactly how the wing or eye gene had been changed. “The gene,” as one scientist described it, “was something inaccessible.”

The inaccessibility of the gene had been particularly frustrating to the messiahs of the “new biology”—James Watson among them. In 1955, two years after his discovery of the structure of DNA, Watson had moved to the Department of Biology at Harvard and instantly raised the hackles of some of its most venerated professors. Biology, as Watson saw it, was a discipline splitting through its middle. On one side sat its old guard—natural historians, taxonomists, anatomists, and ecologists who were still preoccupied by the classifications of animals and by largely qualitative descriptions of organismal anatomy and physiology. The “new” biologists, in contrast, studied molecules and genes. The old school spoke of diversity and variation. The new school: of universal codes, common mechanisms, and “central dogmas.”I

“Each generation needs a new music,” Crick had said; Watson was frankly scornful of the old music. Natural history—a largely “descriptive” discipline, as Watson characterized it—would be replaced by a vigorous, muscular experimental science that he had helped create. The dinosaurs who studied dinosaurs would soon become extinct in their own right. Watson called the old biologists “stamp collectors”—mocking their preoccupation with the collection and classification of biological specimens.II

But even Watson had to admit that the inability to perform directed genetic interventions, or to read the exact nature of gene alterations, was a frustration for new biology. If genes could be sequenced and manipulated, then a vast experimental landscape would be thrown open. Until then, biologists would be stuck probing gene function with the only available tool—the genesis of random mutations in simple organisms. To Watson’s insult, a natural historian might have hurled an equal and opposite injury. If old biologists were “stamp collectors,” then the new molecular biologists were “mutant hunters.”

Between 1970 and 1980, the mutant hunters transformed into gene manipulators and gene decoders. Consider this: In 1969, if a disease-linked gene was found in humans, scientists had no means to understand the nature of the mutation, no mechanism to compare the altered gene to normal form, and no obvious method to reconstruct the gene mutation in a different organism to study its function. By 1979, that same gene could be shuttled into bacteria, spliced into a viral vector, delivered into the genome of a mammalian cell, cloned, sequenced, and compared to the normal form.

In December 1980, in recognition of these seminal advancements in genetic technologies, the Nobel Prize in Chemistry was awarded jointly to Fred Sanger, Walter Gilbert, and Paul Berg—the readers and writers of DNA. The “arsenal of chemical manipulations [of genes],” as one science journalist put it, was now fully stocked. “Genetic engineering,” Peter Medawar, the biologist, wrote, “implies deliberate genetic change brought about by the manipulation of DNA, the vector of hereditary information. . . . Is it not a major truth of technology that anything which is in principle possible will be done . . . ? Land on the moon? Yes, assuredly. Abolish smallpox? A pleasure. Make up for deficiencies in the human genome? Mmmm, yes, though that’s more difficult and will take longer. We aren’t there yet, but we are certainly moving in the right direction.”

The technologies to manipulate, clone, and sequence genes may have been initially invented to shuttle genes between bacteria, viruses, and mammalian cells (à la Berg, Boyer, and Cohen), but the impact of these technologies reverberated broadly through organismal biology. Although the phrases gene cloning and molecular cloning were initially coined to refer to the production of identical copies of DNA (i.e., “clones”) in bacteria or viruses, they would soon become shorthand for the entire gamut of techniques that allowed biologists to extract genes from organisms, manipulate these genes in test tubes, produce gene hybrids, and propagate the genes in living organisms (you could only clone genes, after all, by using a combination of all these techniques). “By learning to manipulate genes experimentally,” Berg said, “you could learn to manipulate organisms experimentally. And by mixing and matching gene-manipulation and gene-sequencing tools, a scientist could interrogate not just genetics, but the whole universe of biology with a kind of experimental audacity that was unimaginable in the past.”
