DNA data storage offers a high-density, long-term alternative to traditional storage systems, addressing the exponential growth of digital data. Composite DNA extends this paradigm by using mixtures of nucleotides to increase storage capacity beyond the four standard bases. In this work, we model composite DNA storage as a multinomial channel and draw an analogy to digital modulation by representing composite letters as constellation points on the three-dimensional probability simplex. To mitigate errors caused by sampling randomness, we derive transition probabilities and log-likelihood ratios (LLRs) for each constellation point and employ practical channel codes for error correction. We then extend this framework to substitution and insertion-deletion (ID) channels, proposing constellation update rules that account for these additional impairments. Numerical results show that our approach achieves reliable performance with existing LDPC codes, whereas prior schemes designed for limited-magnitude probability errors degrade significantly under sampling randomness.
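As an illustration of the multinomial channel and LLR computation described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes equiprobable composite letters, a hypothetical four-point constellation (`CONSTELLATION`) with hypothetical two-bit labels (`LABELS`), and a pure sampling channel without substitutions or ID errors. The function names are likewise illustrative.

```python
import math

def multinomial_log_pmf(counts, probs):
    """Log-probability of observing base counts (A, C, G, T) when reads
    are drawn i.i.d. from a composite letter with base mixture `probs`."""
    n = sum(counts)
    log_p = math.lgamma(n + 1)  # log n!
    for k, p in zip(counts, probs):
        log_p -= math.lgamma(k + 1)  # minus log k_i!
        if k > 0:
            if p == 0.0:
                return float("-inf")  # impossible observation
            log_p += k * math.log(p)
    return log_p

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def bit_llrs(counts, constellation, labels):
    """Per-bit LLRs, log P(counts | bit=0) / P(counts | bit=1),
    assuming all constellation points are equally likely a priori."""
    log_liks = [multinomial_log_pmf(counts, p) for p in constellation]
    llrs = []
    for j in range(len(labels[0])):
        zero = [l for l, b in zip(log_liks, labels) if b[j] == 0]
        one = [l for l, b in zip(log_liks, labels) if b[j] == 1]
        llrs.append(logsumexp(zero) - logsumexp(one))
    return llrs

# Hypothetical constellation on the probability simplex (assumption):
# each point is a mixture over (A, C, G, T) that sums to 1.
CONSTELLATION = [
    (0.7, 0.1, 0.1, 0.1),
    (0.1, 0.7, 0.1, 0.1),
    (0.1, 0.1, 0.7, 0.1),
    (0.1, 0.1, 0.1, 0.7),
]
# Hypothetical bit labels for the four points (assumption).
LABELS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Ten reads at one position, mostly A: both LLRs come out positive,
# which favours the label (0, 0).
llrs = bit_llrs((8, 1, 0, 1), CONSTELLATION, LABELS)
print(llrs)
```

In a full system, LLRs of this kind would be passed to a soft-decision decoder such as an LDPC decoder. The constellation geometry and bit labeling here are placeholders chosen only to keep the example small.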