To make DNA a suitable medium for archival data storage, it is essential to account for the strand decay observed in DNA storage systems. This paper models the decay process as a probabilistic noisy torn paper channel (TPC), which first corrupts the bits of the transmitted sequence by random substitutions and then breaks the sequence into a set of noisy, unordered substrings. We devise coding schemes for the noisy TPC that embed markers in the transmitted sequence, and investigate two kinds: static markers and data-dependent markers computed from the data by hash functions. Both tools have recently been used to tackle the noiseless TPC. Simulations show that static markers excel at higher substitution probabilities, while data-dependent markers are superior at lower noise levels. Both approaches achieve reconstruction rates exceeding $99\%$ with no false decodings observed; performance is primarily limited by computational resources.
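The two-stage channel described in the abstract (substitutions first, then tearing into unordered substrings) can be illustrated with a short simulation. This is only a minimal sketch under stated assumptions, not the paper's simulation code: the abstract does not specify the noise or tearing distributions, so the sketch assumes that each bit is flipped independently with probability `p_sub` and that the sequence is torn after each bit independently with probability `p_break`. The function name and both parameter names are hypothetical.

```python
import random


def noisy_torn_paper_channel(bits, p_sub, p_break, rng=None):
    """Simulate one pass through an assumed noisy torn paper channel.

    bits    : list of 0/1 integers (the transmitted sequence)
    p_sub   : assumed per-bit substitution (flip) probability
    p_break : assumed probability of a tear after each bit
    rng     : optional random.Random instance for reproducibility
    """
    if rng is None:
        rng = random.Random()

    # Stage 1: corrupt the sequence by independent bit substitutions.
    noisy = [(b ^ 1) if rng.random() < p_sub else b for b in bits]

    # Stage 2: tear the noisy sequence into contiguous fragments.
    fragments, current = [], []
    for b in noisy:
        current.append(b)
        if rng.random() < p_break:
            fragments.append(current)
            current = []
    if current:
        fragments.append(current)

    # Stage 3: the receiver sees the fragments without their order.
    rng.shuffle(fragments)
    return fragments


if __name__ == "__main__":
    rng = random.Random(0)
    message = [rng.randint(0, 1) for _ in range(40)]
    for fragment in noisy_torn_paper_channel(message, 0.01, 0.1, rng):
        print("".join(map(str, fragment)))
```

The decoder's task is to recover `message` from the shuffled fragments alone, which is why the coding schemes embed markers: they give the receiver a way to infer where each fragment belongs even when some of its bits have been substituted.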