ResearchHub | Open Science Community

Random access in large-scale DNA data storage

Lee Organick et al.Feb 19, 2018

200 MB of digital data is stored in DNA, randomly accessed and recovered using an error-free approach. Synthetic DNA is durable and can encode digital data with high density, making it an attractive medium for data storage. However, recovering stored data on a large-scale currently requires all the DNA in a pool to be sequenced, even if only a subset of the information needs to be extracted. Here, we encode and store 35 distinct files (over 200 MB of data), in more than 13 million DNA oligonucleotides, and show that we can recover each file individually and with no errors, using a random access approach. We design and validate a large library of primers that enable individual recovery of all files stored within the DNA. We also develop an algorithm that greatly reduces the sequencing read coverage required for error-free decoding by maximizing information from all sequence reads. These advances demonstrate a viable, large-scale system for DNA data storage and retrieval.

Genetics

Artificial Intelligence

0

Paper

Save

Information Decay and Enzymatic Information Recovery for DNA Data Storage

Linda Meiser et al.Mar 4, 2022

ABSTRACT Synthetic DNA has been proposed as a storage medium for digital information due to its high theoretical storage density and anticipated long storage horizons. However, under all ambient storage conditions, DNA undergoes a slow chemical decay process resulting in nicked (broken) DNA strands, and the information stored in these strands is no longer readable. In this work we design an enzymatic repair procedure, which is applicable to the DNA pool prior to readout and can partially reverse the damage. Through a chemical understanding of the decay process, an overhang at the 3’ end of the damaged site is identified as obstructive to repair via the base excision-repair (BER) mechanism. The obstruction can be removed via the enzyme apurinic/apyrimidinic endonuclease I (APE1), thereby enabling repair of hydrolytically damaged DNA via Bst polymerase and Taq ligase. Simulations of damage and repair reveal the benefit of the enzymatic repair step for DNA data storage, especially when data is stored in DNA at high storage densities (= low physical redundancy) and for long time durations.

Biochemistry

Biophysics

3

Paper

Save

Demonstration of End-to-End Automation of DNA Data Storage

Christopher Takahashi et al.Oct 10, 2018

We developed a complete end-to-end DNA data storage device. The device enables the encoding of data, which is then written to a DNA oligonucleotide using a custom DNA synthesizer, pooled for liquid storage, and read using a nanopore sequencer and a novel, minimal preparation protocol. We demonstrate an automated 5-byte write, store, and read cycle with the ability to expand as new technology is available.

Genetics

Molecular Biology

0

Paper

Save

Scaling up DNA data storage and random access retrieval

Lee Organick et al.Mar 7, 2017

Current storage technologies can no longer keep pace with exponentially growing amounts of data. Synthetic DNA offers an attractive alternative due to its potential information density of ~ 1018B/mm3, 107 times denser than magnetic tape, and potential durability of thousands of years. Recent advances in DNA data storage have highlighted technical challenges, in particular, coding and random access, but have stored only modest amounts of data in synthetic DNA. This paper demonstrates an end-to-end approach toward the viability of DNA data storage with large-scale random access. We encoded and stored 35 distinct files, totaling 200MB of data, in more than 13 million DNA oligonucleotides (about 2 billion nucleotides in total) and fully recovered the data with no bit errors, representing an advance of almost an order of magnitude compared to prior work. Our data curation focused on technologically advanced data types and historical relevance, including the Universal Declaration of Human Rights in over 100 languages, a high-definition music video of the band OK Go, and a CropTrust database of the seeds stored in the Svalbard Global Seed Vault. We developed a random access methodology based on selective amplification, for which we designed and validated a large library of primers, and successfully retrieved arbitrarily chosen items from a subset of our pool containing 10.3 million DNA sequences. Moreover, we developed a novel coding scheme that dramatically reduces the physical redundancy (sequencing read coverage) required for error-free decoding to a median of 5x, while maintaining levels of logical redundancy comparable to the best prior codes. We further stress-tested our coding approach by successfully decoding a file using the more error-prone nanopore-based sequencing. We provide a detailed analysis of errors in the process of writing, storing, and reading data from synthetic DNA at a large scale, which helps characterize DNA as a storage medium and justify our coding approach. Thus, we have demonstrated a significant improvement in data volume, random access, and encoding/decoding schemes that contribute to a whole-system vision for DNA data storage.

Artificial Intelligence

Biotechnology

0

Paper

Artificial Intelligence

Biotechnology

0

Save

1

DNA storage in thermoresponsive microcapsules for repeated random multiplexed data access

Bas Bögels et al.Mar 18, 2023

Abstract Owing to its longevity and extremely high information density, DNA has emerged as an attractive medium for archival data storage. Scalable parallel random access of information is a desirable property of any storage system. For DNA-based storage systems, however, this yet has to be robustly established. Here we develop thermoconfined PCR, a novel method that enables multiplexed, repeated random access of compartmentalized DNA files. Our strategy is based on stable localization of biotin-functionalized oligonucleotides inside microcapsules with temperature-dependent membrane permeability. At low temperatures, microcapsules are permeable to enzymes, primers, and amplified products, while at high temperatures membrane collapse prevents molecular crosstalk during amplification. We demonstrate that our platform outperforms non-compartmentalized DNA storage with respect to repeated random access and reducing amplification bias during multiplex PCR. Using fluorescent sorting, we additionally demonstrate sample pooling and data retrieval by barcoding of microcapsules. Our thermoresponsive microcapsule technology offers a scalable, sequence-agnostic approach for repeated random access of archival DNA files.

Ecology

Artificial Intelligence

1

Paper

Ecology

Artificial Intelligence

0

Save

0

A empirical comparison of preservation methods for synthetic DNA data storage

Lee Organick et al.Sep 19, 2020

ABSTRACT Synthetic DNA has recently risen as a viable alternative for long-term digital data storage. To ensure that information is safely recovered after storage, it is essential to appropriately preserve the physical DNA molecules encoding the data. While preservation of biological DNA has been studied previously, synthetic DNA differs in that it is typically much shorter in length, it has different sequence profiles with fewer, if any, repeats (or homopolymers), and it has different contaminants. In this paper we evaluate nine different methods used to preserve data files encoded in synthetic DNA by accelerated aging of nearly 29,000 DNA sequences. In addition to a molecular count comparison, we also sequence and analyze the DNA after aging. Our findings show that errors and erasures are stochastic and show no practical distribution difference between preservation methods. Finally, we compare the physical density of these methods and provide a stability versus density trade-offs discussion.

Genetics

Ecology

0

Paper

Genetics

Ecology

0

Save