Osborn was on a mission to find several elusive species, including a bioluminescent worm called Poeobius, and to sequence their genes for a global database of DNA. “We need the genome to figure out how these things are related to each other,” she explained. “Once we have that tree, we can start asking interesting questions about how those animals evolved, how they’ve changed through time, how they’ve adapted to their habitats.” Eventually, such genomes could inspire profound innovations, from new crops to medical cures. Osborn was starting to worry, however: she had already made several trips in the submarine and had not seen a single Poeobius. Each worm measures just a few centimetres in length and feeds on marine snow, or organic detritus that falls from the surface. Because it is yellow on one end, like a cigarette, it is sometimes called the butt worm.
As the pilot steered into deeper waters, Osborn operated a suction hose at the end of a robotic arm. Whenever she spotted organisms that she wanted to sample—crustaceans, sea butterflies, jellies—she’d suck them through a tube and into a collection box that was filled with seawater. She started to wish that the submarine had a rest room on board. Then, a few hundred metres down, she finally saw a group of Poeobius. “Oh, that’s what we want!” she remembers exclaiming. “Go! Go get that!” The pilot slowly turned the sub and Osborn sucked up the worms.
Back on the ship, even before using the rest room, Osborn deposited her boxes in an onboard laboratory. “It’s always exciting to climb out and go look at all the samplers, and take them into the lab and see what animals you’ve gotten,” she told me. She placed one of the Poeobius worms under a microscope, anesthetized it, sliced off a bit of gelatinous tissue, and placed it into a vial, which contained a liquid that would protect the DNA from deterioration. (The butt worm did not survive.) Back at the Smithsonian, a team would extract the genetic material and sequence it. It would soon become a new branch on a growing tree of life.
The evolution of life on Earth—a process that has spanned billions of years and innumerable strands of DNA—could be considered the biggest experiment in history. It has given rise to amoebas and dinosaurs; fireflies and flytraps; even mammals that look like ducks and fish that look like horses. These species have solved countless ecological problems, finding novel ways to eat, evade, defend, compete, and multiply. Their genomes contain information that humans could use to reconstruct the origins of life, develop new foods and medicines and materials, and even save species that are dying out. But we are also losing much of the data; humans are one of the main causes of an ongoing mass extinction. More than forty thousand animal, fungal, and plant species are considered threatened—and those are just the ones we know about.
Osborn is part of a group of scientists who are mounting a kind of scientific salvage mission. It is known as the Earth BioGenome Project, or E.B.P., and its goal is to sequence a genome from every plant, animal, and fungus on the planet, as well as from many single-celled organisms, such as algae, retrieving the results of life’s grand experiment before it’s too late. “This is a completely wonderful and insane goal,” Hank Greely, a Stanford law professor who works with the E.B.P., told me. The effort, described by its organizers as a “moonshot for biology,” will likely cost billions of dollars, yet it does not currently have any direct funding and depends instead on the volunteer work of scientists who have funding of their own. Researchers will need to scour oceans, deserts, and rain forests to collect samples before species die out. And, as new species are discovered, the task of sequencing all of them will only grow. “That’s a heavy aspiration that will probably never be entirely achieved,” Greely, who is seventy-one, told me. “It’s like, when you’re my age, planting a young oak tree in your yard. You’re not going to live to see that be a mature oak, but your hope is somebody will.”
For hundreds of years, biologists have roamed the globe in an epic effort to collect and categorize the life on Earth. In the seventeen-hundreds, after traversing Sweden to document its flora and fauna, Carl Linnaeus helped create the system that scientists still use to classify and name species, from Homo sapiens to Poeobius meseres. In 1831, Charles Darwin set out aboard H.M.S. Beagle to collect living and fossilized specimens, which inspired his theory of natural selection. The discovery of DNA, in the nineteenth century, offered a new way to classify species: by comparing their genetic material. DNA’s four building blocks—adenine (A), thymine (T), guanine (G), and cytosine (C)—encode profound differences between organisms. By studying their sequence, we might come to speak life’s language.
Scientists didn’t even begin to sequence a DNA molecule until 1968. In 1977, they sequenced the roughly five thousand base pairs in a virus that invades bacteria. And, in 1990, the Human Genome Project started the thirteen-year process of sequencing almost all of the three billion base pairs in our DNA. Its organizers called the endeavor “one of the most ambitious scientific undertakings of all time, even compared to splitting the atom or going to the moon.” Since then, researchers have been filling in gaps and improving the quality of their sequences, in part by using a new format known as a telomere-to-telomere, or T2T, genome. The first T2T human genome was sequenced only last year, but already scientists with the Earth BioGenome Project are talking about repeating this process for every known eukaryotic species. (Eukaryotes are organisms whose cells have nuclei.)
Because the E.B.P. does not have its own funding, it does not sample or sequence species on its own. Instead, it’s a network of networks; its organizers set ethical and scientific standards for more than fifty projects, including the Darwin Tree of Life, the Vertebrate Genomes Project, the African BioGenome Project, and the Butterfly Genome Project. This way, “when we get to the end of the project, it’s not the Tower of Babel,” Harris Lewin, an evolutionary biologist at the University of California, Davis, who chairs the E.B.P. executive council, told me. “You know, your genomes are produced this way, and mine are produced that way, and they’re of different quality, so that, when you compare them, you get different results.”
By 2025, the participants hope to assemble about nine thousand sequences, one from every known family of eukaryotes. By 2029, they aim to have one sequence from every genus—a hundred and eighty thousand in all. After the third and final phase, which could be completed a decade from now, they aim to have sequenced all 1.8 million species that scientists have documented so far. (Roughly eighty per cent of eukaryotic species are still undiscovered.) This database of genomes, including annotations and metadata, will require close to an exabyte of data, or as much as two hundred million DVDs. The amount of information involved is more than “astronomical,” Lewin said; it’s “genomical.” He compared the project to the Webb Space Telescope, which received about ten billion dollars of government funding. Given how much these projects change the way that humans see the world, Lewin said, “the cost is really not that much.”
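The DVD comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal units (one exabyte as 10^18 bytes) and a single-layer DVD holding 4.7 gigabytes, neither of which the project specifies. The function name is invented for this example.

```python
# Rough check of the "exabyte vs. DVDs" comparison.
# Assumptions (not from the project): decimal units and a
# single-layer DVD capacity of 4.7 GB.
EXABYTE_BYTES = 10**18
DVD_BYTES = 4.7 * 10**9


def dvds_needed(total_bytes, disc_bytes=DVD_BYTES):
    """Return how many discs of the given capacity hold total_bytes."""
    return total_bytes / disc_bytes


dvd_count = dvds_needed(EXABYTE_BYTES)
print(f"{dvd_count / 1e6:.0f} million DVDs")  # prints "213 million DVDs"

# The three phases described above, as target counts.
families, genera, species = 9_000, 180_000, 1_800_000
genera_per_family = genera / families   # average genera per family
species_per_genus = species / genera    # average species per genus
```

Under those assumptions the figure comes out a little above two hundred million discs, in line with the comparison. The phase targets imply an average of about twenty genera per family and ten documented species per genus.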
Natural-history museums already have some of the samples needed to outline a genetic tree of life. The Smithsonian, for instance, has about fifty million biological samples. But, because DNA degrades quickly, it’s difficult to extract a high-quality sequence from, say, a frog in formaldehyde or an old taxidermy parrot. For this reason, the E.B.P. usually restricts itself to recent samples, which are often frozen. It relies on the Global Genome Biodiversity Network to keep track of who has what; another database, called Genomes on a Tree, tracks which species have been sequenced already, and whether they meet exacting standards. Scientists such as Osborn will have to find the rest—and their jobs will only become more difficult as the low-hanging fruit is plucked.
After Osborn collected her butt worms, she had to transport them to her colleagues at the Smithsonian. This process can be more difficult than it sounds. Many researchers keep their samples intact by packing them with dry ice or liquid nitrogen in the field; airport-security workers sometimes flag these packages as suspicious, leading to delays that can spoil the DNA and waste an expedition. Osborn, for her part, checked a large insulated box on the flight from Cape Verde, and then waited a few hours in Newark for Fish and Wildlife officials to approve it for entry. As it turned out, her samples came from an entirely new species of Poeobius; a paper announcing the discovery is forthcoming.
The first stop in the journey from sample to sequence is a genetics laboratory such as the Vertebrate Genome Lab, at the Rockefeller University, on the eastern shore of Manhattan. On a drizzly day last May, I visited the V.G.L. to see how scientists turn a bit of animal tissue into a string of billions of letters. Olivier Fedrigo, a bespectacled geneticist who was then the lab’s director, led me down a hallway decorated with photos of species that had been sequenced there: a snake, a swan, a shark. It was a trophy wall of sorts, on which inclusion signified not death but a kind of immortality.
Researchers extract DNA from animal tissue in a biosafety-level-two room, which requires goggles, gloves, coats, and special ventilation to protect people and samples. Nivesh Jain, a scientist who works there, told me that he minces the tissue and places it in a lysis buffer (a chemical that breaks open cells) and then uses one of two methods to get the DNA out. The first is a type of microscopic magnetic bead, which is treated with chemicals that help it stick to genetic material; magnets hold the beads and their attached DNA in place while Jain washes everything else away. The second is a glass wafer called a Nanobind disk, which similarly sticks to DNA while Jain removes the rest of the sample. When we met, Jain was standing at a lab bench, checking the concentration of DNA in a vial. The vial would then go to another room, where Jennifer Balacco, the lab-operation lead, would pipette pieces of extracted DNA into little plastic tubes. Special enzymes attach short, recognizable pieces of DNA, called adapters, to the animal DNA, which readies it for the sequencer.
Finally, the samples travel into refrigerator-size PacBio sequencing machines, which, in this case, were labelled with nicknames from “Star Trek.” Enzymes latch onto the adapters and traverse the strands, attaching a color-coded molecule to every building block of DNA. The machine detects the colors and “reads” the sequence that they represent.
It’s not enough to sequence DNA in pieces: scientists must figure out how each fragment connects to make a genome. Genomes tend to be bundled up in complicated shapes. A technique called Hi-C mapping “helps you to sort out the puzzle pieces,” Fedrigo told me. The resulting map of folded DNA is crowded with colorful squiggles. At some computers down the hall from the sequencers, the maps help another team of researchers assemble sequence fragments into a full T2T genome. Nadolina Brajuka, a bioinformatician, was assembling an Asian-elephant genome. “I can physically use key and mouse controls and pick pieces of the genome up and move them around,” she said. The last step is for a “data wrangler” on the team to upload the raw-sequence data file, the final genome assembly, and background information about the sample—including where, when, and how it was collected, and a photo of the species—to a public server called GenomeArk.
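To give a rough sense of what connecting fragments involves, here is a toy sketch in Python. It is not the V.G.L.’s pipeline, which relies on long reads, Hi-C maps, and manual curation. It only illustrates the basic idea of joining pieces where the end of one matches the start of another. The function names and the three short reads are invented for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to join; pieces stay separate
        merged = frags[best_i] + frags[best_j][best_size:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical reads that overlap by four letters each.
reads = ["GCGTACA", "ATGGCGT", "TACAGGA"]
assembled = greedy_assemble(reads)
print(assembled)  # ['ATGGCGTACAGGA']
```

A greedy approach like this can be misled by repeated stretches of DNA, which is one reason that real assemblies lean on additional evidence such as the Hi-C maps described above.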