Neither Ewan Birney nor Nick Goldman can remember exactly how they came up with the idea of storing all the world’s knowledge in DNA. They know it happened in the bar of the Gastwerk Hotel in Hamburg, and that many beers were involved. They may or may not have scrawled their ideas on a napkin. “It must have involved a pen or pencil because I can’t think without holding one,” says Goldman. “It would’ve involved a lot of hands from me,” says Birney.
Their chat was fuelled by a simple realisation: scientists would soon start amassing more genetic information than they could afford to store. In the 1990s, this problem would have seemed laughable. Back then, it took a decade to sequence the human genome and geneticists could store their data on an Excel spreadsheet. Since then, the relentless improvement in sequencing machines has turned that trickle of genomic data into a full-on flood. This technology doubles in efficiency every six months, allowing you to sequence twice as much DNA for the same amount of money. However, it takes 18 months to get twice as much hard disk for your buck, so it is starting to cost more to store the results of experiments than to actually run them in the first place. “And at some point, not too far in the future, you would run out of either disk space or money,” says Goldman.
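The arithmetic behind that squeeze can be sketched in a few lines of Python. The sketch takes the two doubling times quoted above at face value (six months for sequencing, 18 months for disk) and assumes, purely for illustration, that both follow smooth exponential curves. The function names are made up for this example.

```python
SEQUENCING_DOUBLING_MONTHS = 6   # from the text: DNA sequenced per unit of money
DISK_DOUBLING_MONTHS = 18        # from the text: disk space per unit of money


def growth_factor(months, doubling_months):
    """How many times more you get per unit of money after `months`.

    Assumes smooth exponential growth, which is an idealisation.
    """
    return 2 ** (months / doubling_months)


def storage_gap(months):
    """Factor by which data per unit of money outgrows disk per unit of money."""
    return (growth_factor(months, SEQUENCING_DOUBLING_MONTHS)
            / growth_factor(months, DISK_DOUBLING_MONTHS))


for years in (1, 3, 5):
    months = 12 * years
    data = growth_factor(months, SEQUENCING_DOUBLING_MONTHS)
    disk = growth_factor(months, DISK_DOUBLING_MONTHS)
    print(f"after {years} yr: {data:,.0f}x the data, "
          f"{disk:,.1f}x the disk, gap {storage_gap(months):,.1f}x")
```

Under these assumptions the gap itself doubles every nine months, which is why a fixed budget eventually buys data faster than it buys somewhere to put it.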
That would be a setback for a normal lab and an outright catastrophe for the place where Birney and Goldman work. Located on an isolated campus on the outskirts of Cambridge, UK, the European Bioinformatics Institute (EBI) stores genomic data from labs all over the world. At an internal conference in Hamburg, in April 2010, “you couldn’t move for someone saying the EBI will have to close down the DNA archive because it’s unsustainable”, says Birney.
After the conference, Goldman and Birney retreated to a local pub and started batting around possible solutions, beers in hand. They realised that the big problem was the cycle of obsolescence that all data-storing technologies go through. Old machines are junked in favour of new hardware (remember VCRs?) and any data stored on out-of-date media must be re-read and re-written onto the medium du jour, all at great expense. “We thought: Isn’t there some other nano-machine that would allow us to store digital data?” says Birney. Both of them start laughing—the answer was so obvious. “We said: Duh! It’s going to be DNA.”
Living things have been storing information in DNA since the dawn of life, including the instructions for building every human, animal, bacterium and plant. The molecule itself looks like a twisting ladder, whose rungs are made of four molecules called bases that pair up in specific ways—adenine (A) with thymine (T), and cytosine (C) with guanine (G). If you can create your own strands of DNA, with the ones and zeroes of binary data converted into these As, Gs, Cs, and Ts, you have a storage medium that will never go obsolete. Sequencing machines will continue to improve and will need to be replaced, but once information is stored in DNA, that’s that.
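To make the idea of converting ones and zeroes into As, Cs, Gs and Ts concrete, here is a minimal sketch in Python. It assumes the simplest possible mapping, two bits per base (00 to A, 01 to C, 10 to G, 11 to T). That mapping and the function names are illustrative assumptions, not a description of any scheme used in practice, and the sketch ignores the errors that can creep in when real DNA is written and read.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of four bases")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # BASES.index raises ValueError for anything other than A, C, G, T
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


message = b"Hi"
strand = bytes_to_dna(message)
print(strand)  # CAGACGGC
assert dna_to_bytes(strand) == message
```

Each byte becomes four bases, so the two-character message turns into an eight-base strand, and decoding reverses the process exactly.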
In terms of information density, DNA outclasses anything we’ve been able to invent. A single gram can contain as much data as 3 million CDs. All of the world’s data would fit in the back of a minivan.
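For a sense of scale, the CD comparison can be turned into bytes. The figure of roughly 700 megabytes per CD is an assumption added here, not a number from the text, and the function name is hypothetical.

```python
CD_BYTES = 700 * 10**6      # assumption: one CD holds about 700 megabytes
CDS_PER_GRAM = 3_000_000    # from the text: 3 million CDs per gram of DNA


def bytes_per_gram(cds_per_gram=CDS_PER_GRAM, cd_bytes=CD_BYTES):
    """Rough storage capacity of one gram of DNA, in bytes."""
    return cds_per_gram * cd_bytes


petabytes = bytes_per_gram() / 10**15
print(f"about {petabytes:.1f} petabytes per gram")
```

On that assumption, a gram of DNA works out to roughly two petabytes.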
And once encoded into DNA, information is a doddle to copy. To transfer the contents of one hard disk to another, you need to hook both of them up to a computer and wait for minutes or hours. To transfer the contents of a tube of DNA, you dissolve it in water, suck up some of the liquid into a pipette, and squeeze it into another tube. It takes seconds. “I could copy a petabyte like this,” says Birney, who mimes depressing his thumb.