Science-- there's something for everyone

Friday, February 8, 2013

New and improved DNA data storage


To be clear, we’re not talking about storing DNA itself. DNA is actually surprisingly stable and, under the right conditions, can last thousands of years without degrading. No, the idea is to use DNA as a medium for storing other kinds of information. For example, Nick Goldman and his colleagues at the European Bioinformatics Institute have used DNA to store all of Shakespeare’s sonnets, a color photograph and an audio clip of Martin Luther King Jr.’s “I Have a Dream” speech. The authors believe that their new technique could one day meet our long-term data archiving needs.

The idea of using DNA to store information is not new. DNA has long been considered an attractive storage medium because all it needs for long-term preservation is a cold, dry, dark environment. You can also fit an amazing amount of data into a small space: the authors estimate that all the data that’s ever been created could fit in the back of a single pickup truck. And best of all, the nucleotide code itself doesn’t change, so unlike cassette tapes and DVDs, DNA won’t become obsolete; we should still be able to read it a thousand years from now.

To use DNA for data storage, you simply manufacture DNA whose sequence of As, Ts, Gs and Cs spells out, in code, whatever you wish. Note that these synthetic strands will not encode any genes. Like magnetic tape or ink, they will have no function other than to store and retrieve information. Unfortunately, it’s currently exceedingly difficult to synthesize DNA strands much longer than about a hundred bases, barely enough for a sentence. Almost any data file would have to be broken into a huge number of pieces that would then have to be faithfully joined back together. Goldman and his colleagues tackled these problems by devising a new code and by building in four-fold redundancy.
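To make the length problem concrete, here is a minimal Python sketch of the most naive code imaginable, assuming a direct mapping of two bits per nucleotide (A=00, C=01, G=10, T=11). This mapping and the function names are hypothetical illustrations, not the scheme Goldman and colleagues used. The sketch encodes one line of a sonnet and chops the result into pieces of at most 100 bases.

```python
# Hypothetical naive code: 2 bits per nucleotide (NOT the Goldman et al. scheme).
BASES = "ACGT"  # index 0..3 stands for the bit pairs 00, 01, 10, 11


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_dna: every four bases become one byte."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def split_into_fragments(seq: str, max_len: int = 100) -> list[str]:
    """Chop a long sequence into consecutive pieces of at most max_len bases."""
    return [seq[i:i + max_len] for i in range(0, len(seq), max_len)]


if __name__ == "__main__":
    line = "Shall I compare thee to a summer's day?"
    dna = bytes_to_dna(line.encode("ascii"))
    pieces = split_into_fragments(dna)
    print(len(line), "characters ->", len(dna), "bases ->", len(pieces), "fragments")
    # Rejoining only works here because we kept the pieces in order.
    assert dna_to_bytes("".join(pieces)).decode("ascii") == line
```

For this 39-character line the code produces 156 bases, already more than a single 100-base strand can hold. Two weaknesses show up even in this toy: the final join assumes the pieces come back in their original order, and the mapping readily produces runs of identical bases (a zero byte becomes AAAA).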

Briefly, the researchers took the information to be DNA-itized (a sonnet in the example below) and converted it first into binary code (shown in blue below) and then into a novel trinary code (0s, 1s and 2s, shown in red). Each trinary digit was then written as a single nucleotide, chosen from the three bases that differ from the one before it, so that the same base never appears twice in a row (runs of identical bases are a common source of sequencing errors). The resulting DNA sequence (green) was synthesized in short, overlapping fragments, so that every stretch of the sequence appeared in four different fragments. Each fragment also carried indexing information indicating where it belonged, so that the original sequence could be regenerated. This high degree of redundancy helped ensure accurate retrieval.
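The key idea of the new code is that each trinary digit is written as one nucleotide chosen from the three bases that differ from the previous one, so no base ever repeats. The Python sketch below illustrates that idea under stated assumptions: it converts each byte into a fixed six trits (the paper used a Huffman code instead, which gives common characters shorter codes), and the particular trit-to-base table, the starting base "A" and all function names are illustrative choices rather than the authors’ exact implementation.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**5 = 243 < 256 <= 3**6 = 729, so six trits fit any byte


def byte_to_trits(value: int) -> list[int]:
    """Fixed-width base-3 digits of one byte, most significant digit first."""
    trits = []
    for _ in range(TRITS_PER_BYTE):
        trits.append(value % 3)
        value //= 3
    return trits[::-1]


def trits_to_byte(trits: list[int]) -> int:
    """Inverse of byte_to_trits."""
    value = 0
    for t in trits:
        value = value * 3 + t
    return value


def trits_to_dna(trits: list[int], prev: str = "A") -> str:
    """Write each trit as one of the three bases that differ from the previous base."""
    out = []
    for t in trits:
        # Step 1, 2 or 3 places around A-C-G-T, so we never land back on prev.
        prev = BASES[(BASES.index(prev) + t + 1) % 4]
        out.append(prev)
    return "".join(out)


def dna_to_trits(seq: str, prev: str = "A") -> list[int]:
    """Recover each trit from how far a base sits past its predecessor."""
    trits = []
    for base in seq:
        t = (BASES.index(base) - BASES.index(prev) - 1) % 4
        if t == 3:  # a repeated base cannot occur in a validly encoded sequence
            raise ValueError("repeated base found; sequence is corrupted")
        trits.append(t)
        prev = base
    return trits


def encode(data: bytes) -> str:
    """Bytes -> trits -> DNA."""
    trits = [t for byte in data for t in byte_to_trits(byte)]
    return trits_to_dna(trits)


def decode(seq: str) -> bytes:
    """DNA -> trits -> bytes."""
    trits = dna_to_trits(seq)
    if len(trits) % TRITS_PER_BYTE:
        raise ValueError("sequence does not hold a whole number of bytes")
    return bytes(
        trits_to_byte(trits[i:i + TRITS_PER_BYTE])
        for i in range(0, len(trits), TRITS_PER_BYTE)
    )


if __name__ == "__main__":
    text = b"Shall I compare thee to a summer's day?"
    dna = encode(text)
    assert decode(dna) == text
    assert all(a != b for a, b in zip(dna, dna[1:]))  # no two neighboring bases match
    print(len(text), "bytes ->", len(dna), "bases")
```

Decoding simply walks the same rule backwards, and a repeated base is immediately recognizable as an error. The price of this fixed-width version is density: each byte costs six bases instead of four, and a variable-length Huffman code like the one the authors used recovers some of that space.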

Digital information encoding in DNA (figure from Goldman et al., Nature, PMID: 23354052).

The scientists were able to ship their DNA from the U.S. to Germany, where it was sequenced and then correctly reassembled and decoded.
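That reassembly depends on the overlapping fragments and index tags. The Python sketch below illustrates the idea with hypothetical names and parameters: fragments of 100 bases that start every 25 bases, so most positions are covered by four fragments; a plain integer tag standing in for the index information the real fragments carried as DNA; and a per-position majority vote to outvote errors. The majority vote is an illustrative choice, not necessarily the decoding procedure the authors used.

```python
import random
from collections import Counter

FRAG_LEN = 100  # bases per fragment (assumed, in line with the ~100-base limit)
STEP = 25       # shift between consecutive fragments; 100 / 25 = four-fold coverage


def fragment(seq: str) -> list[tuple[int, str]]:
    """Split seq into overlapping fragments, each tagged with its fragment number."""
    return [
        (i, seq[start:start + FRAG_LEN])
        for i, start in enumerate(range(0, len(seq), STEP))
    ]


def reassemble(fragments: list[tuple[int, str]], length: int) -> str:
    """Place each fragment using its tag, then take a majority vote per position."""
    votes = [Counter() for _ in range(length)]
    for i, payload in fragments:
        start = i * STEP  # the tag alone tells us where this fragment belongs
        for offset, base in enumerate(payload):
            votes[start + offset][base] += 1
    if not all(votes):
        raise ValueError("some positions are not covered by any fragment")
    return "".join(v.most_common(1)[0][0] for v in votes)


def corrupt(payload: str, rng: random.Random) -> str:
    """Simulate one synthesis or sequencing error by substituting a single base."""
    i = rng.randrange(len(payload))
    wrong = rng.choice([b for b in "ACGT" if b != payload[i]])
    return payload[:i] + wrong + payload[i + 1:]


if __name__ == "__main__":
    rng = random.Random(2013)
    original = "".join(rng.choice("ACGT") for _ in range(1000))
    pieces = fragment(original)
    # Damage every fourth fragment, so no position is covered by two bad copies.
    damaged = [(i, corrupt(p, rng) if i % 4 == 2 else p) for i, p in pieces]
    rng.shuffle(damaged)  # fragments come back from sequencing in no particular order
    assert reassemble(damaged, len(original)) == original
    print(len(pieces), "fragments; original recovered despite errors")
```

Because the tags, not the order of arrival, decide where each fragment goes, shuffling does no harm. Positions near the very start are covered by fewer fragments, so errors there are harder to outvote; that is why the demo leaves the first two fragments intact.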

For now, even this new method of DNA storage is far too expensive, and reading the data back is far too slow, to be of any practical use. The authors expect this to change as the cost of synthesizing DNA falls. Perhaps in as little as ten years, DNA could become the medium of choice for archiving data that must be kept for a very long time but is rarely accessed.

You can read more about this exciting research in the Nature paper cited below.


Goldman N, Bertone P, Chen S, Dessimoz C, Leproust EM, Sipos B, & Birney E (2013). Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature, 494, 77-80. PMID: 23354052.