Nanopore sequencing technology remains highly error-prone, making efficient error correction essential in DNA-based data storage. Prior work addressed the high error rates with convolutional codes whose decoder is coupled with the basecaller, but such approaches accommodate only a limited number of code classes and incur significant decoding complexity. To overcome these limitations, we propose two algorithms: PrimerSeeker, which efficiently detects primer sequences in raw nanopore sequencing reads, and SynDe, a decoder that operates on the same raw reads and supports any linear error correction code with a low-complexity graphical representation. PrimerSeeker provides primer location estimates close to those of existing approaches while being better suited to real-time primer detection during sequencing. SynDe performs well with convolutional codes augmented with periodic markers, often approaching or exceeding the performance of existing algorithms at lower time complexity. Remarkably, the confidence scores produced by SynDe reliably identify which of its outputs should be discarded.