Adaptive Sampling for Storage of Progressive Images on DNA

The short lifespan of traditional data storage media, coupled with an exponential increase in storage demand, has made long-term archival a fundamental problem in the data storage industry and beyond. Consequently, researchers are looking for innovative storage media that can hold data over long periods at very low cost. DNA molecules, with their high density, long lifespan, and low energy needs, have emerged as a viable alternative medium for digital data archival. However, current DNA data storage technologies face challenges in cost and reliability. Coding rate and error robustness are therefore critical to scaling DNA storage and making it technologically and economically viable. Moreover, the DNA molecules that encode different files are often located in the same oligo pool. Without a random access solution at the oligo level, decoding a specific file from such a mixed pool is impractical: all oligos must first be sequenced and decoded before the target file can be retrieved, which greatly increases the read cost. This paper introduces a solution for efficiently encoding and storing images in DNA molecules that aims to reduce the read cost of retrieving a reduced-resolution version of an image. The image storage system is based on the progressive decoding functionality of the JPEG2000 codec but can be adapted to any conventional progressive codec. Each resolution layer is encoded into a set of oligos using the JPEG DNA VM codec, a DNA-based coder designed to retrieve a file with high reliability. Depending on the resolution to be read, the set of oligos, as well as the portion of those oligos to be sequenced and decoded, is adjusted accordingly. These oligos are selected at sequencing time using the adaptive sampling method provided by Nanopore sequencers, making this a PCR-free random access solution.
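The selection step described above can be illustrated with a minimal sketch. It assumes a hypothetical addressing scheme, not taken from the paper, in which every oligo starts with a file tag followed by a two-nucleotide resolution-layer tag; the names `LAYER_TAGS`, `wanted_prefixes`, `keep_read`, and `select_reads` are illustrative only. The sketch also ignores sequencing errors by using exact prefix matching, whereas a real adaptive sampling run would need an error-tolerant comparison of the first bases of each read.

```python
from __future__ import annotations

# Hypothetical layer tags (an assumption for illustration, not the paper's
# scheme): the list index is the resolution layer, 0 being the coarsest.
LAYER_TAGS = ["AC", "AG", "CA", "CT"]


def wanted_prefixes(file_tag: str, target_layer: int) -> set[str]:
    """Prefixes of all oligos needed to decode layers 0..target_layer."""
    return {file_tag + LAYER_TAGS[k] for k in range(target_layer + 1)}


def keep_read(read_start: str, prefixes: set[str]) -> bool:
    """Decide from the first bases of a read whether to keep sequencing it.

    Exact matching only; sequencing errors are ignored in this sketch.
    """
    return any(read_start.startswith(p) for p in prefixes)


def select_reads(pool: list[str], file_tag: str, target_layer: int) -> list[str]:
    """Simulate the accept/reject decision over a mixed oligo pool."""
    prefixes = wanted_prefixes(file_tag, target_layer)
    return [oligo for oligo in pool if keep_read(oligo, prefixes)]


# A toy pool: three layers of one image (file tag GGTT) mixed with an
# oligo from an unrelated file (file tag CCAA).
pool = [
    "GGTT" + "AC" + "TTTTTTTT",  # target file, layer 0
    "GGTT" + "AG" + "CCCCCCCC",  # target file, layer 1
    "GGTT" + "CA" + "AAAAAAAA",  # target file, layer 2
    "CCAA" + "AC" + "GGGGGGGG",  # other file, layer 0
]
selected = select_reads(pool, file_tag="GGTT", target_layer=1)
```

Under these assumptions, requesting a lower target layer shrinks the set of accepted prefixes, so fewer oligos are fully sequenced; this is the mechanism by which a reduced-resolution read would cost less than a full-resolution one.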
