Labeling of DNA molecules is a fundamental technique for DNA visualization and analysis. This process was mathematically modeled in [1], where the received sequence indicates the positions at which the labels in use occur. In this work, we develop error-correcting codes for labeled DNA sequences: we establish bounds and construct explicit systematic encoders for a single substitution, insertion, or deletion error. We focus on two cases: (1) using the complete set of length-two labels, and (2) using the minimal set of length-two labels that ensures that "almost" all DNA sequences can be recovered from their labeling.