In recent years, correcting insertion, deletion, and substitution (IDS) errors has drawn widespread attention in DNA-based data storage. Among IDS-correcting codes, Varshamov-Tenengolts (VT) codes, originally designed for single-error correction, have become a central research focus. Existing decoding methods achieve high accuracy for single-error correction but are typically not applicable to correcting multiple IDS errors. This work investigates the latent capability of VT codes for multiple-error correction through a statistic-enhanced Transformer-based VT decoder (VT-Former) that uses both symbol and statistic feature embeddings. Experimental results show that VT-Former achieves nearly 100\% accuracy on single-error correction. For multi-error decoding tasks across various codeword lengths, it improves both frame accuracy and bit accuracy over conventional hard-decision and soft-in soft-out decoding algorithms. Furthermore, although the base model already exhibits lower decoding latency than traditional soft decoders, the architecture is further optimized to improve decoding efficiency and reduce computational overhead.
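As background on the single-error baseline mentioned above, the following is a minimal sketch of a classical binary VT code and Levenshtein's single-deletion decoder. It is not VT-Former and not necessarily the construction used in this work: the binary alphabet, the syndrome parameter `a`, and the function names `vt_syndrome`, `vt_codewords`, and `decode_single_deletion` are assumptions made for illustration only.

```python
from itertools import product


def vt_syndrome(bits):
    """Weighted checksum sum(i * x_i) mod (n + 1), with 1-based positions."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def vt_codewords(n, a=0):
    """Enumerate the binary VT code VT_a(n) by brute force (small n only)."""
    return [list(x) for x in product((0, 1), repeat=n) if vt_syndrome(x) == a]


def decode_single_deletion(y, n, a=0):
    """Recover a VT_a(n) codeword from y, assuming at most one deletion."""
    y = list(y)
    if len(y) == n:
        return y  # assumption: full-length input means nothing was deleted
    if len(y) != n - 1:
        raise ValueError("expected a word of length n or n - 1")

    weight = sum(y)
    # Deficiency: how far the received checksum falls short of the target a.
    checksum = sum(i * b for i, b in enumerate(y, start=1))
    deficiency = (a - checksum) % (n + 1)

    if deficiency <= weight:
        # A 0 was deleted: reinsert it with exactly `deficiency` ones to its right.
        pos, ones = len(y), 0
        while ones < deficiency:
            pos -= 1
            ones += y[pos]
        return y[:pos] + [0] + y[pos:]

    # A 1 was deleted: reinsert it with `deficiency - weight - 1` zeros to its left.
    zeros_needed = deficiency - weight - 1
    pos, zeros = 0, 0
    while zeros < zeros_needed:
        zeros += 1 - y[pos]
        pos += 1
    return y[:pos] + [1] + y[pos:]
```

For example, `[1, 0, 0, 1]` belongs to VT_0(4) because 1 + 4 = 5 ≡ 0 (mod 5), and `decode_single_deletion([1, 0, 1], n=4, a=0)` returns `[1, 0, 0, 1]`. The decoder relies on the premise that at most one symbol is missing, which is the limitation that motivates studying multiple-error decoding.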