In the trace reconstruction problem, one observes the output of passing a binary string $s \in \{0,1\}^n$ through a deletion channel $T$ times and wishes to recover $s$ from the resulting $T$ "traces." Most of the literature has focused on characterizing the hardness of this problem in terms of the number of traces $T$ needed for perfect reconstruction either in the worst case or in the average case (over input sequences $s$). In this paper, we propose an alternative, instance-based approach to the problem. We define the "Levenshtein difficulty" of a problem instance $(s,T)$ as the probability that the resulting traces do not provide enough information for correct recovery with full certainty. One can then try to characterize, for a specific $s$, how $T$ needs to scale in order for the Levenshtein difficulty to go to zero, and seek reconstruction algorithms that match this scaling for each $s$. We derive a lower bound on the Levenshtein difficulty, and prove that $T$ needs to scale exponentially fast in $n$ for the Levenshtein difficulty to approach zero for a very broad class of strings. For a class of binary strings with alternating long runs, we design an algorithm whose probability of reconstruction error approaches zero whenever the Levenshtein difficulty approaches zero. For this class, we also prove that the error probability of this algorithm decays to zero at least as fast as the Levenshtein difficulty.
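The setup above can be made concrete with a small simulation. The sketch below is a minimal illustration, not the paper's construction: it assumes an i.i.d. deletion channel with a deletion probability `q` (a parameter not specified in the abstract), and it operationalizes the "Levenshtein difficulty" of an instance $(s,T)$ as the Monte Carlo frequency with which the $T$ traces fail to pin down $s$ uniquely among all binary strings of the same length — the paper's formal definition may differ in detail.

```python
import random
from itertools import product

def deletion_channel(s, q, rng):
    # Each bit of s survives independently with probability 1 - q.
    return "".join(b for b in s if rng.random() >= q)

def is_subsequence(t, s):
    # True iff t can be obtained from s by deletions only.
    it = iter(s)
    return all(c in it for c in t)

def consistent_strings(trs, n):
    # All length-n binary strings that could have produced every trace,
    # i.e. strings containing each trace as a subsequence.
    out = []
    for bits in product("01", repeat=n):
        cand = "".join(bits)
        if all(is_subsequence(t, cand) for t in trs):
            out.append(cand)
    return out

def levenshtein_difficulty_mc(s, T, q, trials=2000, seed=0):
    # Monte Carlo estimate (for tiny n, by brute force) of the probability
    # that T traces of s leave more than one consistent candidate string,
    # i.e. do not allow recovery of s with full certainty.
    rng = random.Random(seed)
    ambiguous = 0
    for _ in range(trials):
        trs = [deletion_channel(s, q, rng) for _ in range(T)]
        if len(consistent_strings(trs, len(s))) > 1:
            ambiguous += 1
    return ambiguous / trials
```

For example, with `q = 0` every trace equals $s$, so the estimated difficulty is 0; with `q = 1` every trace is empty and every candidate is consistent, so it is 1. The brute-force candidate enumeration is exponential in $n$ and only meant to make the definition tangible on toy instances.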