Updated: May 5, 2022
DNA, the natural repository of our genetic information, could become a more sustainable alternative to our hard drives, USB keys and other data storage methods.
For 50 years, our society has become progressively more digitalized, and the production of digital data has grown exponentially along with it. We have produced more data in the last two years than in the entire history of humanity. Current estimates are that more than 64 zettabytes (64 thousand billion gigabytes) of digital data have been produced worldwide; this is predicted to surpass 175 zettabytes in 2025 (1). The storage of this data has become a key issue in our society and poses a considerable number of problems. Firstly, the colossal amount of energy required to store data on servers has a heavy environmental impact; the electricity consumed by data centers accounts for 2% of all greenhouse gas emissions today, a share expected to rise to 14% in 2040 (2). A further problem is the short lifespan (5 to 7 years on average (3)) and the obsolescence of current data storage methods. CDs are a clear example, as they are no longer compatible with the devices we use every day to read data. Therefore, a sustainable solution for data storage is urgently needed!
A Biological Data Storage Medium
An alternative to current data storage methods (flash drives, hard drives, CDs…) would be a “natural hard drive” present in the cells of all living things: deoxyribonucleic acid, or DNA.
DNA is a long molecule made of 4 “bricks”, or nucleotides: adenine (A), cytosine (C), guanine (G) and thymine (T). The DNA sequence, or the order in which the different nucleotides appear (A, T, C and G), contains the genetic information which defines our heritable characteristics, such as eye colour, height, or blood group. DNA is therefore a natural data storage medium which has endured through billions of years of evolution.
Just as we store photos or videos on a hard drive in binary code, as a series of 0s and 1s, we could store this same information in a DNA molecule. The 0s and 1s of binary would be encoded in a sequence of the letters of DNA, where, for example, the nucleotides A and C would correspond to 0, and G and T to 1 (4).
A DNA-based method of data storage would have several advantages. Firstly, DNA molecules are very compact; 455 billion gigabytes can be stored in a single gram of DNA (5). In theory, all the data produced in the world could be encoded in a coffee cup of DNA. In addition to saving space, DNA storage requires little energy, as DNA is stable at room temperature. This solution has the potential to be more sustainable and environmentally conscious than our current methods. Finally, as all biological systems have used DNA for several billion years, DNA is a medium that will not become obsolete over time, unlike traditional data storage methods.
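The density figure above can be checked with a quick back-of-the-envelope calculation. The sketch below only uses the numbers quoted in this article (455 billion gigabytes per gram, 64 and 175 zettabytes); it assumes the theoretical density and ignores any redundancy or error correction a real system would need. The function name grams_of_dna is purely illustrative.

```python
# Back-of-the-envelope check using only the figures quoted in the article.
GB_PER_ZB = 10**12        # 1 zettabyte = 10^21 bytes = 10^12 gigabytes
GB_PER_GRAM_DNA = 455e9   # 455 billion gigabytes per gram of DNA (5)


def grams_of_dna(zettabytes: float) -> float:
    """Mass of DNA, in grams, needed to hold the given volume of data."""
    return zettabytes * GB_PER_ZB / GB_PER_GRAM_DNA


if __name__ == "__main__":
    for zb in (64, 175):
        print(f"{zb} ZB -> about {grams_of_dna(zb):.0f} g of DNA")
```

Under these assumptions, 64 zettabytes would fit in roughly 141 grams of DNA and 175 zettabytes in roughly 385 grams, which is consistent with the coffee cup image.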
How to store data in a DNA molecule?
Having established that DNA has numerous advantages for data conservation, here is how we could proceed to store our information in this biological medium.
1) Encoding
The first step translates a sequence of 0s and 1s containing digital information into a sequence of As, Ts, Cs and Gs. Multiple encoding strategies are possible. For example, we could encode 2 bits per letter by translating 00, 01, 10 and 11 into A, T, C and G, respectively, or 1 bit per letter by translating 0s into As or Cs and 1s into Ts or Gs. Encoding a single bit per base is the most widely used method, as it allows the same message to be translated into several different sequences. This makes it possible to avoid sequences that are difficult to read or synthesize, such as repetitions of the same nucleotide or an overabundance of Gs and Cs (6).
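The two strategies described above can be sketched in a few lines of Python. The mappings are the ones given in the text; the rule used to pick between the two candidate bases in the 1-bit scheme (simply never repeating the previous base) is an illustrative assumption, not the scheme of reference (6), and the function names are hypothetical.

```python
TWO_BIT_MAP = {"00": "A", "01": "T", "10": "C", "11": "G"}  # mapping from the text
ONE_BIT_CHOICES = {"0": "AC", "1": "TG"}  # 0 -> A or C, 1 -> T or G


def text_to_bits(text: str) -> str:
    """Turn a text into its UTF-8 bit string, 8 bits per byte."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def encode_two_bits(bits: str) -> str:
    """2 bits per base: each pair of bits maps to exactly one nucleotide."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(TWO_BIT_MAP[bits[i:i + 2]] for i in range(0, len(bits), 2))


def encode_one_bit(bits: str) -> str:
    """1 bit per base: pick whichever allowed base differs from the previous one.

    Assumed rule for illustration only; it guarantees that no nucleotide
    appears twice in a row.
    """
    strand = []
    for bit in bits:
        first, second = ONE_BIT_CHOICES[bit]
        strand.append(second if strand and strand[-1] == first else first)
    return "".join(strand)


if __name__ == "__main__":
    bits = text_to_bits("Hi")
    print(bits)
    print(encode_two_bits(bits))  # contains a repeated base ("CC")
    print(encode_one_bit(bits))   # twice as long, but no repeated base
```

The 2-bit scheme is denser, but it has no freedom to avoid awkward sequences: the word "Hi" yields a strand with two Cs in a row. The 1-bit scheme spends twice as many bases and uses that freedom to avoid repeats.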
2) DNA Synthesis
The DNA sequence in the form of A, T, C and G is synthesized by a series of chemical reactions. A DNA strand is formed by adding the nucleotides one by one following a given sequence.
3) Information Storage
The information is stored in the form of a DNA molecule. It can be conserved in vitro, in a tube, and will remain stable for a long time at ambient temperature, provided it is kept away from humidity, air and light (7). DNA can also be stored in vivo, for example in a bacterium, which comes with the advantage of natural duplication upon cell division. This allows for information to be copied without the necessity of expensive molecular biology techniques used for in vitro duplication.
4) Reading
Once stored, the information must remain accessible. This step relies on DNA sequencing, a technique widely used in molecular biology, which allows for the “reading” of the sequence of a strand of DNA. The order of the letters A, T, C and G can then be converted back into 0s and 1s to recover the digital information.
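The conversion from a sequenced strand back to digital information can be sketched as follows. This assumes an error-free read and the two letter-to-bit mappings mentioned earlier in this article (2 bits per base with A, T, C, G standing for 00, 01, 10, 11, and 1 bit per base with A or C standing for 0 and T or G for 1). The function names are hypothetical.

```python
BASE_TO_TWO_BITS = {"A": "00", "T": "01", "C": "10", "G": "11"}
BASE_TO_ONE_BIT = {"A": "0", "C": "0", "T": "1", "G": "1"}


def decode_two_bits(strand: str) -> str:
    """Read a strand written with 2 bits per base back into a bit string."""
    return "".join(BASE_TO_TWO_BITS[base] for base in strand.upper())


def decode_one_bit(strand: str) -> str:
    """Read a strand written with 1 bit per base back into a bit string."""
    return "".join(BASE_TO_ONE_BIT[base] for base in strand.upper())


def bits_to_text(bits: str) -> str:
    """Group bits into bytes and decode them as UTF-8 text."""
    if len(bits) % 8:
        raise ValueError("bit string length must be a multiple of 8")
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


if __name__ == "__main__":
    print(bits_to_text(decode_two_bits("TACATCCT")))         # 2 bits per base
    print(bits_to_text(decode_one_bit("ATACTACACTGATACT")))  # 1 bit per base
```

Both example strands decode to the same two-letter message, "Hi". Note that in the 1-bit scheme many different strands decode to the same bits, which is exactly the freedom the encoder exploits.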
Building on this general concept of encoding and decoding digital information in DNA, a number of research groups have worked on developing and improving methods for manipulating DNA sequences easily. It would be necessary, for example, to identify and read only the portion of DNA containing the desired information. This is possible using PCR (polymerase chain reaction), a technique which allows a precise DNA fragment to be copied specifically. PCR is based on the use of short fragments of DNA, called “primers”, which specify the region of the DNA strand that will be copied. Once bound to the DNA strand, the primers recruit an enzyme which copies only the region found between the two primers.
In this fashion, two short specific sequences can be added on either side of the DNA sequence storing the information. These two sequences play the role of barcodes, allowing specific information to be identified and retrieved from the middle of long strands of DNA. Indeed, these barcodes can be recognized by PCR primers, permitting the amplification of a specific region of interest. It is then sufficient to sequence the copies amplified by PCR to find the composition and order of the nucleotides, and thus reconstitute the original information. This allows us to retrieve and read information of interest among a multitude of DNA strands.
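The barcode idea can be illustrated in software as a simple text search. The sketch below is a deliberate simplification: real PCR primers bind to complementary strands and the reaction produces many physical copies, whereas here each strand is treated as a plain string and the "amplification" just returns the payload found between the two barcodes. All sequences, names and functions are made up for the example.

```python
from typing import List, Optional

# Hypothetical barcodes and payloads, invented for this illustration.
FWD_A, REV_A = "GATTACAG", "CCGGTTAA"
FWD_B, REV_B = "TTGGCCAA", "AGTCAGTC"

POOL = [
    FWD_A + "TACATCCT" + REV_A,  # "file A"
    FWD_B + "ACGTACGT" + REV_B,  # "file B"
]


def amplify(strand: str, forward: str, reverse: str) -> Optional[str]:
    """Return the payload between the two barcodes, or None if they are absent."""
    start = strand.find(forward)
    if start == -1:
        return None
    payload_start = start + len(forward)
    end = strand.find(reverse, payload_start)
    if end == -1:
        return None
    return strand[payload_start:end]


def retrieve(pool: List[str], forward: str, reverse: str) -> List[str]:
    """Collect the payloads of every strand in the pool carrying both barcodes."""
    hits = (amplify(strand, forward, reverse) for strand in pool)
    return [payload for payload in hits if payload is not None]


if __name__ == "__main__":
    print(retrieve(POOL, FWD_A, REV_A))
    print(retrieve(POOL, FWD_B, REV_B))
```

Each barcode pair behaves like a file name: only the strands flanked by the requested pair are returned, and the rest of the pool is ignored.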
Storing our Data in DNA: A dream or soon to be reality?
Despite all the advantages presented by DNA for data storage, we have not yet replaced our hard drives with tubes of DNA because of the considerable limitations this technology still faces. The principal limiting factors are the speed and the cost of DNA synthesis. This process is estimated to cost 3500 US dollars per megabyte of information (8).
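To give a sense of scale, the cost figure above can be applied to a few everyday file sizes. The sketch assumes the estimate of 3500 US dollars per megabyte scales linearly with the amount of data, which is a simplification; the file sizes are hypothetical examples.

```python
COST_PER_MB_USD = 3500  # synthesis cost estimate quoted in the article (8)


def synthesis_cost_usd(megabytes: float) -> float:
    """Estimated synthesis cost, assuming it grows linearly with data size."""
    return megabytes * COST_PER_MB_USD


if __name__ == "__main__":
    # Hypothetical file sizes, in megabytes.
    examples = {"a 5 MB photo": 5, "a 1 GB video": 1000}
    for label, size_mb in examples.items():
        print(f"{label}: about {synthesis_cost_usd(size_mb):,.0f} US dollars")
```

At this rate, a single gigabyte would cost about 3.5 million US dollars to write, which explains why synthesis cost is the main obstacle.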
Confronted with these major limitations, numerous research teams are working on the development of novel techniques for DNA synthesis. These include methods for more rapid enzymatic DNA synthesis, allowing for the faster creation of longer DNA fragments than is currently possible with traditional chemical synthesis methods (9).
Today, a full transition to DNA as a data storage medium, or the idea of a computer built on it, remains utopian. However, DNA-based data storage could be used to archive documents that we wish to preserve over time but that do not require regular access. This is the project that Stéphane Lemaire, research director at CNRS, and Pierre Crozet, lecturer at Sorbonne Université, have developed in recent years. Copies of the “Déclaration des Droits de l’Homme et du Citoyen” (Declaration of the Rights of Man and of the Citizen) of 1789 and of the “Déclaration des Droits de la Femme et de la Citoyenne” (Declaration of the Rights of Woman and of the Female Citizen) of 1791 by Olympe de Gouges were encoded in DNA and are now kept in the safe of the French National Archives. Such projects have not yet been widely replicated, but given the rapid evolution of molecular biology techniques, it is not impossible to imagine that within several years, DNA-based storage of digital data could become a preferred solution for archiving documents.
The study of DNA-based data storage is a subject which interests me personally. As a student in a laboratory studying synthetic biology, I have always been fascinated by the idea of turning to the natural world to resolve our societal problems. I am particularly interested in the idea of using DNA, our natural method of storing genetic information, to store our data as we do on our hard drives. Moreover, I regularly use molecular biology techniques, such as DNA sequencing and DNA synthesis, and I am always impressed by the innovations in this field which allow us to manipulate DNA more and more easily.
1. IDC (International Data Corporation) report (November 2018). The Digitization of the World From Edge to Core.
2. Datacenters et changement climatique : enjeux et nouvelles limites (May 2021). Ostraca Blog. (https://blog.ostraca.fr/datacenters-et-changement-climatique-enjeux-et-nouvelles-limites/)
3. Extance, A. (2016). How DNA could store all the world's data. Nature, 537(7618).
4. Ceze et al (2019). Molecular digital data storage using DNA. Nature Reviews Genetics, https://doi.org/10.1038/s41576-019-0125-3
5. Baker, M. (2012). A fresh chapter for organic data storage. Nature.
6. Church, G. M., Gao, Y., & Kosuri, S. (2012). Next-generation digital information storage in DNA. Science, 337(6102), 1628-1628.
7. Bonnet, J. (2010). Chain and conformation stability of solid-state DNA: implications for room temperature storage. Nucleic Acids Research, 38(5), 1531-1546.
8. DNA Data Storage - Integrated information storage technology for writing large amounts of digital information in DNA using an enzyme-driven, sustainable, low-cost approach (https://wyss.harvard.edu/technology/dna-data-storage/)
9. Lee et al (2020). Photon-directed multiplexed enzymatic DNA synthesis for molecular digital data storage. Nature communications. https://doi.org/10.1038/s41587-019-0240-x
Video - DNA Data Storage is the Future! - Simply Explained - Jun 17, 2019 : https://www.youtube.com/watch?v=aPWA-n9oo4k