We consider the problem of reconstructing a binary string from the multiset of its substring compositions, referred to as the substring composition multiset, a problem first introduced and studied by Acharya et al. We introduce a new reconstruction algorithm that relies on the algebraic properties of the equivalent bivariate polynomial formulation of the problem. We then characterize specific algebraic conditions on the binary string to be reconstructed which guarantee that the algorithm requires no backtracking during reconstruction and, consequently, that its time complexity is polynomially bounded. More specifically, in the case of no backtracking, our algorithm has a time complexity of $O(n^2)$, compared with $O(n^2\log(n))$ for the algorithm by Acharya et al., where $n$ is the length of the binary string. Furthermore, we show that larger sets of binary strings are uniquely reconstructable by the new algorithm without backtracking. This leads to codebooks of reconstruction codes that are larger, by a linear factor in size, than the previously known construction by Pattabiraman et al., while retaining $O(n^2)$ reconstruction complexity.
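To make the input of the reconstruction problem concrete, the following minimal Python sketch computes the substring composition multiset of a binary string. It is an illustration only, not the reconstruction algorithm described above. The function name `composition_multiset` and the representation of a composition as a pair `(ones, zeros)` are assumptions made for this example. Each pair can be read as the exponent pair of a monomial $x^{\text{ones}} y^{\text{zeros}}$, which gives one plausible reading of a bivariate polynomial encoding; the exact formulation used by the algorithm may differ.

```python
from collections import Counter


def composition_multiset(s: str) -> Counter:
    """Return the multiset of (ones, zeros) counts over all contiguous
    substrings of the binary string s (hypothetical helper for illustration)."""
    if any(ch not in "01" for ch in s):
        raise ValueError("expected a binary string over the alphabet {0, 1}")

    n = len(s)
    # prefix_ones[i] holds the number of '1' symbols in s[:i].
    prefix_ones = [0] * (n + 1)
    for i, ch in enumerate(s):
        prefix_ones[i + 1] = prefix_ones[i] + (ch == "1")

    multiset = Counter()
    # Enumerate all n(n+1)/2 substrings s[i:j]; each composition costs O(1)
    # thanks to the prefix sums, so the whole multiset is built in O(n^2).
    for i in range(n):
        for j in range(i + 1, n + 1):
            ones = prefix_ones[j] - prefix_ones[i]
            zeros = (j - i) - ones
            multiset[(ones, zeros)] += 1
    return multiset


if __name__ == "__main__":
    print(composition_multiset("10"))
    # A string and its reversal always share the same composition multiset,
    # so reconstruction can at best be unique up to reversal.
    print(composition_multiset("110") == composition_multiset("011"))
```

The composition of a substring records only how many 0s and 1s it contains, not their order, which is why a string and its reversal are indistinguishable from this multiset alone.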