The problem of correcting deletions and insertions has received significantly more attention in recent years because of DNA-based data storage, a technology in which deletions and insertions occur with extremely high probability. In this work, we study the construction of non-binary codes that correct a burst of deletions or insertions. In particular, for the quaternary alphabet, our codes are suited to correcting a burst of deletions or insertions in DNA storage. Non-binary codes correcting a single deletion or insertion were introduced by Tenengolts [1984], and Schoeny et al. [2017] extended these results to correct a burst of deletions or insertions of fixed length. Recently, Wang et al. [2021] proposed constructions of q-ary codes of length n that correct a burst of length at most two with redundancy log n + O(log q log log n) bits, for arbitrary even q. The common idea in these constructions is to convert each non-binary sequence into a binary sequence, so that decoding the q-ary sequence relies mainly on successfully recovering the corresponding binary sequence. In this work, we consider a natural alternative in which error detection and correction are performed directly over q-ary sequences. In certain cases, our codes provide a more efficient encoder with lower redundancy than the best-known encoder in the literature.
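To make the setting concrete, the following is a minimal sketch of the objects the abstract refers to: a quaternary encoding of a DNA string, a burst of deletions, and a binary "signature" of a q-ary sequence of the kind commonly associated with Tenengolts-style constructions. The function names, the A/C/G/T to 0..3 mapping, and the exact signature convention (first bit fixed to 1, later bits set when a symbol is at least its predecessor) are assumptions made for illustration. This is not the construction proposed in this work.

```python
# Illustrative sketch only: names and conventions below are assumptions,
# not the codes or algorithms of the paper.

# Hypothetical mapping from nucleotides to the quaternary alphabet {0, 1, 2, 3}.
DNA_TO_Q = {"A": 0, "C": 1, "G": 2, "T": 3}


def dna_to_quaternary(strand):
    """Map a DNA string to a list of symbols in {0, 1, 2, 3}."""
    return [DNA_TO_Q[base] for base in strand]


def delete_burst(x, start, length):
    """Return x with `length` consecutive symbols removed, beginning at index `start`."""
    if start < 0 or length < 0 or start + length > len(x):
        raise ValueError("burst does not fit inside the sequence")
    return x[:start] + x[start + length:]


def burst_deletion_ball(x, length):
    """All distinct sequences reachable from x by one burst of exactly `length` deletions."""
    return {tuple(delete_burst(x, i, length)) for i in range(len(x) - length + 1)}


def signature(x):
    """Binary signature of a q-ary sequence (assumed convention).

    The first bit is 1; bit i is 1 if x[i] >= x[i - 1] and 0 otherwise.
    """
    return [1] + [1 if x[i] >= x[i - 1] else 0 for i in range(1, len(x))]


if __name__ == "__main__":
    x = dna_to_quaternary("ACGTTA")
    print("q-ary sequence:", x)
    print("signature:     ", signature(x))
    print("after a burst of 2 deletions at index 1:", delete_burst(x, 1, 2))
    print("size of the burst-2 deletion ball:", len(burst_deletion_ball(x, 2)))
```

A code that corrects a burst of length b must keep these deletion balls disjoint across codewords. The binary-conversion approach checks constraints on `signature(x)`, whereas the approach described in the abstract works with the q-ary symbols of `x` directly.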