A Generalized Trace Reconstruction Problem: Recovering a String of Probabilities

We introduce the following natural generalization of trace reconstruction, parameterized by a deletion probability $\delta \in (0,1)$ and a length $n$. There is a length-$n$ string of probabilities $S = p_1, \ldots, p_n$, and each "trace" is obtained by (1) sampling a length-$n$ binary string whose $i$th coordinate is independently set to 1 with probability $p_i$ and to 0 otherwise, and then (2) deleting each of the binary values independently with probability $\delta$ and returning the resulting binary string of length at most $n$. The goal is to recover an estimate of $S$ from a set of independently drawn traces. In the case that all $p_i \in \{0,1\}$, this is the standard trace reconstruction problem. We show two complementary results. First, for worst-case strings $S$ and any deletion probability of order at least $1/\sqrt{n}$, no algorithm can approximate $S$ to within constant $\ell_\infty$ distance or $\ell_1$ distance $o(\sqrt{n})$ using fewer than $2^{\Omega(\sqrt{n})}$ traces. Second, as is the case for standard trace reconstruction, reconstruction is easy for random $S$: if each $p_i$ is drawn independently from the uniform distribution over $[0,1]$, then for any sufficiently small constant deletion probability and any $\epsilon > 0$, with high probability $S$ can be recovered to $\ell_1$ error $\epsilon$ using $\mathrm{poly}(n, 1/\epsilon)$ traces and computation time. We show indistinguishability in our lower bound by regarding a complicated alternating sum (comparing two distributions) as the Fourier transform of some function evaluated at $\pm\pi$, and then showing that the Fourier transform decays rapidly away from zero by analyzing its moment generating function.
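To make the two-step sampling model concrete, here is a minimal Python sketch of how a single trace is drawn from $S$, together with one simple statistic, the mean trace (the expected value of each trace position, with positions past the end of a short trace counted as 0). By linearity, coordinate $i$ (0-indexed) lands at trace position $j$ exactly when it survives and exactly $j$ of the $i$ earlier coordinates survive, so $\mathbb{E}[\text{bit } j] = \sum_{i \ge j} p_i \binom{i}{j} (1-\delta)^{j+1} \delta^{i-j}$. The function names `sample_trace`, `mean_trace`, and `empirical_mean_trace` are hypothetical and chosen for illustration. This is only a sketch of the model under these stated assumptions; it is not the reconstruction algorithm or the lower-bound argument described above.

```python
import math
import random


def sample_trace(probs, delta, rng):
    """Draw one trace (hypothetical helper): bit i ~ Bernoulli(p_i), then delete it w.p. delta."""
    trace = []
    for p in probs:
        bit = 1 if rng.random() < p else 0  # step 1: independent Bernoulli(p_i) bit
        if rng.random() >= delta:           # step 2: keep the bit with probability 1 - delta
            trace.append(bit)
    return trace                            # length is at most len(probs)


def mean_trace(probs, delta):
    """Exact expected value of trace position j (0 if the trace is too short), j = 0..n-1.

    Coordinate i lands at position j iff it survives and exactly j of the i earlier
    coordinates survive, which happens with probability C(i, j) (1-delta)^(j+1) delta^(i-j).
    """
    n = len(probs)
    q = 1.0 - delta
    return [
        sum(probs[i] * math.comb(i, j) * q ** (j + 1) * delta ** (i - j) for i in range(j, n))
        for j in range(n)
    ]


def empirical_mean_trace(probs, delta, num_traces, seed=0):
    """Monte Carlo estimate of the mean trace, treating missing positions as 0."""
    rng = random.Random(seed)
    totals = [0] * len(probs)
    for _ in range(num_traces):
        for j, bit in enumerate(sample_trace(probs, delta, rng)):
            totals[j] += bit
    return [t / num_traces for t in totals]


if __name__ == "__main__":
    S = [0.2, 0.9, 0.5, 0.1]
    print(mean_trace(S, 0.1))
    print(empirical_mean_trace(S, 0.1, 20000))
```

With $\delta = 0$ the mean trace equals $S$ itself, and when all $p_i \in \{0,1\}$ the sampler reduces to the standard deletion channel; the exact formula and the empirical average should agree up to sampling noise.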

翻译：我们引入以下迹重构问题的自然推广，其参数为删除概率 $\delta \in (0,1)$ 和长度 $n$：存在一个长度为 $n$ 的概率字符串 $S=p_1,\ldots,p_n$，每条"迹"的生成方式为：1) 采样一个长度为 $n$ 的二进制字符串，其第 $i$ 位以概率 $p_i$ 独立设置为 1，否则为 0；然后 2) 以概率 $\delta$ 独立删除每个二进制值，并返回相应的长度 $\le n$ 的二进制字符串。目标是从一组独立抽取的迹中恢复 $S$ 的估计。当所有 $p_i \in \{0,1\}$ 时，即为标准迹重构问题。我们展示了两个互补的结果。首先，对于最坏情况下的字符串 $S$ 以及任何删除概率至少为 $1/\sqrt{n}$ 量级时，任何算法都无法使用少于 $2^{\Omega(\sqrt{n})}$ 条迹，以常数 $\ell_\infty$ 距离或 $o(\sqrt n)$ 的 $\ell_1$ 距离近似 $S$。其次——与标准迹重构的情况类似——对于随机的 $S$，重构是容易的：对于任何足够小的常数删除概率以及任意 $\epsilon>0$，若每个 $p_i$ 独立地从 $[0,1]$ 上的均匀分布中抽取，则高概率下可以使用 $\mathrm{poly}(n,1/\epsilon)$ 条迹和计算时间将 $S$ 恢复至 $\ell_1$ 误差 $\epsilon$。我们通过将一个复杂的交错和（比较两个分布）视为某个函数在 $\pm \pi$ 处的傅里叶变换，然后通过分析其矩生成函数证明该傅里叶变换在远离零点处快速衰减，从而证明了我们下界中的不可区分性。