In the trace reconstruction problem, one observes the outputs of passing a binary string $s \in \{0,1\}^n$ through a deletion channel $T$ times independently and wishes to recover $s$ from the resulting $T$ "traces." Most of the literature has focused on characterizing the hardness of this problem in terms of the number of traces $T$ needed for perfect reconstruction, either in the worst case or in the average case (over input sequences $s$). In this paper, we propose an alternative, instance-based approach to the problem. We define the "Levenshtein difficulty" of a problem instance $(s,T)$ as the probability that the resulting traces do not provide enough information to recover $s$ correctly with full certainty. One can then try to characterize, for a specific $s$, how $T$ needs to scale in order for the Levenshtein difficulty to go to zero, and seek reconstruction algorithms that match this scaling for each $s$. For a class of binary strings with alternating long runs, we precisely characterize the scaling of $T$ for which the Levenshtein difficulty goes to zero. For this class, we also prove that a simple "Las Vegas algorithm" has an error probability that decays to zero at the same rate as the Levenshtein difficulty.
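To make the definition concrete, the following is a minimal Python sketch, not the paper's method. It assumes an i.i.d. deletion channel with a hypothetical deletion probability `q` (the abstract does not fix one), and it assumes one natural reading of "enough information for recovery with full certainty": the length $n$ is known, and $s$ is the only length-$n$ binary string that contains every trace as a subsequence. The function names are illustrative, and the brute-force search over all $2^n$ candidates is only feasible for small $n$.

```python
import itertools
import random


def deletion_channel(s, q, rng):
    """Return one trace: each bit of s is deleted independently with
    probability q (q is an assumed parameter, not given in the abstract)."""
    return "".join(bit for bit in s if rng.random() >= q)


def is_subsequence(t, s):
    """True if t can be obtained from s by deleting symbols."""
    it = iter(s)
    # `ch in it` advances the iterator, so matches must occur in order.
    return all(ch in it for ch in t)


def consistent_strings(traces, n):
    """All length-n binary strings that contain every trace as a subsequence."""
    candidates = ("".join(bits) for bits in itertools.product("01", repeat=n))
    return [c for c in candidates if all(is_subsequence(t, c) for t in traces)]


def estimate_difficulty(s, T, q, trials, seed=0):
    """Monte Carlo estimate of the probability that T traces of s are
    consistent with more than one length-n string (assumed reading of
    the Levenshtein difficulty of the instance (s, T))."""
    rng = random.Random(seed)
    ambiguous = 0
    for _ in range(trials):
        traces = [deletion_channel(s, q, rng) for _ in range(T)]
        if len(consistent_strings(traces, len(s))) > 1:
            ambiguous += 1
    return ambiguous / trials
```

Calling `estimate_difficulty(s, T, q, trials)` for a fixed short string `s` and increasing `T` gives a rough empirical picture of how the estimated difficulty falls as more traces are observed, which is the instance-specific scaling the abstract refers to.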