Motivated by applications in in-vivo DNA storage, we study codes for correcting duplications. A reverse-complement duplication of length $k$ is the insertion of the reversed and complemented copy of a substring of length $k$ adjacent to its original position, while a palindromic duplication only inserts the reversed copy without complementation. We first construct an explicit code with a single redundant symbol capable of correcting an arbitrary number of reverse-complement duplications (respectively, palindromic duplications), provided that all duplications have length $k \ge 3\lceil \log_q n \rceil$ and are disjoint. Next, we derive a Gilbert-Varshamov bound for codes that can correct a reverse-complement duplication (respectively, palindromic duplication) of arbitrary length, showing that the optimal redundancy is upper bounded by $2\log_q n + \log_q\log_q n + O(1)$. Finally, for $q \ge 4$, we present two explicit constructions of codes that can correct $t$ length-one reverse-complement duplications. The first construction achieves a redundancy of $2t\log_q n + O(\log_q\log_q n)$ with encoding complexity $O(n)$ and decoding complexity $O\big(n(\log_2 n)^4\big)$. The second construction achieves an improved redundancy of $(2t-1)\log_q n + O(\log_q\log_q n)$, but with encoding and decoding complexities of $O\big(n \cdot \mathrm{poly}(\log_2 n)\big)$.
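To make the two error models concrete, the following is a minimal sketch of the duplication operations as defined above. It is illustrative only: it assumes the DNA alphabet with the Watson-Crick complement (A with T, C with G) and assumes the copy is inserted immediately after the original substring. The function names and the 0-based index convention are hypothetical and not taken from the paper.

```python
# Illustrative sketch; alphabet, complement map, and insertion side are assumptions.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(s: str) -> str:
    """Reverse s and complement every symbol."""
    return "".join(COMPLEMENT[c] for c in reversed(s))


def _check_range(x: str, i: int, k: int) -> None:
    # The duplicated substring x[i:i+k] must be non-empty and lie inside x.
    if k < 1 or i < 0 or i + k > len(x):
        raise ValueError("substring x[i:i+k] is out of range")


def rc_duplication(x: str, i: int, k: int) -> str:
    """Insert the reverse complement of x[i:i+k] right after that substring."""
    _check_range(x, i, k)
    return x[: i + k] + reverse_complement(x[i : i + k]) + x[i + k :]


def palindromic_duplication(x: str, i: int, k: int) -> str:
    """Insert the reversed (not complemented) copy of x[i:i+k] right after it."""
    _check_range(x, i, k)
    return x[: i + k] + x[i : i + k][::-1] + x[i + k :]


if __name__ == "__main__":
    x = "ACGTTG"
    print(rc_duplication(x, 1, 3))           # duplicates "CGT" as "ACG"
    print(palindromic_duplication(x, 1, 3))  # duplicates "CGT" as "TGC"
    print(rc_duplication(x, 0, 1))           # length-one case: "A" followed by "T"
```

Each duplication of length $k$ lengthens the word by exactly $k$ symbols, so a decoder sees a word of length $n + k$ per error. In the length-one case, the inserted symbol is simply the complement of its neighbor.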