Data storage calls to mind hard drives, USB sticks and data centers. Together they store billions upon billions of texts and images, encoded as digital bits. Our bodies also contain trillions of tiny data centers. Every cell nucleus stores information in tens of thousands of genes: data about eye color, sex and height, among other things, can be found in our DNA.

This natural storage medium keeps that information intact for tens of thousands of years. “But DNA can do the same for novels, holiday photos and cat videos,” says Tom de Greef, professor of synthetic biology at Eindhoven University of Technology. At the Institute for Complex Molecular Systems, De Greef and his research group are improving the technique for storing data in synthetic DNA.

Last year, some 90 zettabytes (ZB) of data were produced worldwide, equivalent to 90 trillion gigabytes or 90 billion terabytes. By 2025, that figure will have doubled to 180 ZB, and storage capacity is expected to hit a ceiling that year. Choices will then have to be made about which data is and is not stored. Moreover, building additional data centers is not a sustainable solution because of their large energy and water consumption.


According to De Greef, the data centers of the future will be laboratories with spheres filled with DNA. “It may sound like science fiction, but I can already picture robot arms busily pipetting DNA back and forth to read data.”

In theory, one gram of DNA can store 455 exabytes (455 million terabytes) of data; in practice, the current maximum is 17 exabytes per gram. Even if that figure does not increase in the near future, you would need only 26 kilograms of DNA for all the data produced by 2025. “The information density of DNA is unparalleled: about fifty million times higher than that of a hard disk,” says De Greef.

Another advantage: DNA lasts longer than a hard disk. “Instead of the ten to twenty years that hard drives last, DNA remains stable at room temperature for about a thousand years.”

Barely any energy needed

The DNA is not stored in living cells but in wells, small laboratory dishes, so that no mutations can occur, explains De Greef. Because DNA is stable at room temperature, hardly any energy is needed to preserve the data. Only writing and reading cost energy.

The technique also has its limitations. Reading DNA is time-consuming. “It can take a day before you have retrieved the data. DNA will therefore not replace our hard drives at home. DNA as a storage medium is especially useful for archival storage, such as the old film collections of film companies or the secret documents of intelligence services, which you only need to access a few times a year.” This so-called ‘cold data’ now accounts for more than 60 percent of all data produced.

How exactly does storing data in DNA work? Suppose you start with an image. You can describe the color of each pixel in binary computer code. DNA does not work with ones and zeros, but with the nucleotides A, C, G and T. Each letter can encode two consecutive bits. For example, a computer algorithm translates ‘00’ into A, ‘01’ into C, ‘10’ into G and ‘11’ into T. DNA strands with the corresponding nucleotide sequence are then synthesized and voilà, your digital image is stored in DNA code.
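The two-bits-per-letter translation described above can be sketched in a few lines of Python. This is a minimal illustration that assumes exactly the mapping from the example; the function names are hypothetical, and practical encoding schemes usually add constraints (such as avoiding long runs of the same letter) and error correction, which this sketch leaves out.

```python
# Mapping from the example: every pair of bits becomes one nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode raw bytes as a nucleotide string, two bits per letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(seq: str) -> bytes:
    """Decode a nucleotide string back into bytes (inverse of bytes_to_dna)."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


pixel = bytes([200, 30, 7])  # one hypothetical RGB pixel
encoded = bytes_to_dna(pixel)
print(encoded)  # TAGAACTGAACT
assert dna_to_bytes(encoded) == pixel
```

Each byte becomes four nucleotides, so the three color values of the example pixel turn into a 12-letter sequence, and decoding returns the original bytes.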


In terms of technology, this is already feasible, says Jesse Middelwijk, a biotechnologist at a DNA synthesis company. Middelwijk himself is not involved in research into DNA as a data storage medium. “We can add each nucleotide to the strand with an efficiency of 99.5 percent and thus build exactly the sequence we want.”

To access your image again, a sequencing device reads out the nucleotide sequence of these strands. Because sequencers make small errors, the DNA is first copied with a PCR reaction: the sequencer reads several copies of the same DNA strand, after which an algorithm filters out the reading errors. A computer then translates the letter code back into ones and zeros, and from there into pixels.
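The idea of filtering out reading errors by comparing several copies can be illustrated with a per-position majority vote. This is a hypothetical, simplified stand-in, not the algorithm the Eindhoven team uses: it assumes the reads are already aligned and equally long, so it only corrects letters that were misread, not letters that went missing or were inserted.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Majority vote at every position over equal-length reads of one strand.

    Simplified sketch: handles substitution errors only, not insertions or
    deletions, and assumes all reads come from the same original strand.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("this sketch assumes all reads have the same length")
    return "".join(
        Counter(read[i] for read in reads).most_common(1)[0][0]
        for i in range(length)
    )


# Three hypothetical noisy reads of the strand "TAGAACTG".
reads = ["TAGAACTG", "TAGTACTG", "TAGAACTC"]
print(consensus(reads))  # TAGAACTG
```

Each read contains one error, but because the errors fall at different positions, the majority at every position still recovers the original strand. This is why reading several PCR copies helps.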

“You would need a very long DNA strand to store a file, which increases the risk of writing errors,” explains De Greef. The solution is to divide the file into chunks of 150 nucleotides. “65,000 of those DNA strands together form a file of about one megabyte,” says De Greef.
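Splitting a file into short strands raises a practical question: strands floating in a well have no fixed order. One common approach is to reserve part of each strand for an address. The sketch below assumes a hypothetical 8-nucleotide index at the start of every 150-nucleotide strand; the article does not say how the Eindhoven team addresses its strands, and real designs also reserve room for primers and error correction.

```python
import random

STRAND_LEN = 150                       # strand length mentioned in the article
INDEX_LEN = 8                          # hypothetical: 8 nt = 16 bits of address
PAYLOAD_LEN = STRAND_LEN - INDEX_LEN   # nucleotides left for the data itself
BASES = "ACGT"                         # A=0, C=1, G=2, T=3, same as 00/01/10/11


def index_to_dna(i: int) -> str:
    """Write an integer as a fixed-length base-4 nucleotide string."""
    if not 0 <= i < 4 ** INDEX_LEN:
        raise ValueError("index does not fit in INDEX_LEN nucleotides")
    digits = []
    for _ in range(INDEX_LEN):
        digits.append(BASES[i % 4])
        i //= 4
    return "".join(reversed(digits))


def dna_to_index(seq: str) -> int:
    """Read a base-4 nucleotide string back into an integer."""
    value = 0
    for base in seq:
        value = value * 4 + BASES.index(base)
    return value


def split_into_strands(seq: str) -> list[str]:
    """Cut a long nucleotide string into indexed strands of at most 150 nt."""
    return [
        index_to_dna(n) + seq[start:start + PAYLOAD_LEN]
        for n, start in enumerate(range(0, len(seq), PAYLOAD_LEN))
    ]


def reassemble(strands: list[str]) -> str:
    """Sort strands by their index prefix and join their payloads."""
    ordered = sorted(strands, key=lambda s: dna_to_index(s[:INDEX_LEN]))
    return "".join(s[INDEX_LEN:] for s in ordered)


file_seq = "ACGT" * 100              # a hypothetical 400-nt "file"
strands = split_into_strands(file_seq)
random.shuffle(strands)              # strands in a well have no order
assert reassemble(strands) == file_seq
```

Because the address travels inside each strand, the order in which strands are read back does not matter; the 400-nucleotide example needs three strands.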

Errors in the code

Storing and reading one such file per well works well, but storing multiple DNA files in the same well causes a problem. If you read several files from one well at the same time, the PCR reaction makes errors because of overlap in the DNA code. So you have to choose: either store different files in different wells, which takes up a lot of space, or settle for reading only one file at a time, which takes a lot of time.

The Eindhoven team solved this PCR problem by storing DNA in polymer spheres measuring thirty to forty micrometers. Multiple DNA files can thus be placed in different spheres in one well without any problems. The DNA strands that together encode one file are anchored to the inside of the capsules. “At a temperature of 50°C, those bubbles close, creating a physical separation between different files. The PCR reaction takes place inside them, almost flawlessly. When we lower the temperature, the bubbles open. Then you can wash the DNA copies out of the spheres for reading, while the original DNA strands remain anchored to the spheres,” De Greef explains.

A thousand euros per megabyte

In this way you can read the stored data in parallel and repeatedly, a property every storage medium must have, explains De Greef. “Whereas before we could only read one file at a time, we can now read 25. And the spheres also reduce data loss. After three readings, 35 percent of the DNA used to be lost. With the spheres, that loss is less than 1 percent.”

Although these hurdles have been overcome, no DNA data centers are in operation yet. DNA synthesis is extremely expensive: De Greef currently estimates the cost of storing one megabyte at a thousand euros. “The price is currently falling exponentially, and if that continues, I estimate that the first DNA data center will be available in five to ten years,” says De Greef.


Middelwijk is more reserved. “DNA synthesis is getting cheaper, but progress alternates between spurts and plateaus, so it is difficult to predict.” He also comments on sustainability. “A lot of chemicals are currently still needed to obtain pure DNA. A new synthesis method using enzymes is a lot less polluting, but it is still in its infancy.” De Greef expects that enzymatic synthesis will lower the cost of DNA synthesis.

Middelwijk is enthusiastic about the idea behind the technology. “We can learn so much from how DNA stores information: not only from the nucleotides themselves, but also from how certain pieces of DNA are or are not read in our body.”