We introduce the Random Subsequence Model, a spin glass model on pairs of random strings $(X,Y) \in \{0,1\}^N \times \{0,1\}^M$ whose partition function counts subsequence embeddings of $Y$ into $X$. We study two variants: the null model, where $X$ and $Y$ are independent and uniform, and the planted model, where $X$ is uniform and $Y$ is a uniformly-random length-$M$ subsequence of $X$. We connect the Random Subsequence Model to longstanding problems in various fields, including the best rate achievable by uniformly-random codes in the deletion channel, the longest common subsequence problem between two random strings, and models of directed polymers in statistical physics. In the regime where $N,M\to\infty$ at a fixed ratio $\alpha = M/N \in (0,1)$, we exhibit strict asymptotic separations between the null annealed free energy and the quenched free energies of the null and planted models at all values of the density parameter $\alpha$. This suggests that these models are in a spin glass phase at zero temperature throughout the entire dense regime. As a consequence, we show that uniformly-random codes achieve a positive rate in the deletion channel for all deletion probabilities $p\in [0,1)$, settling multiple conjectures of the second author, Isik and Weissman (2024) and proving the first such positive rate result for the regime $p \geq 1/2$. We also give an exact analytic formula for the annealed free energy of the planted model for all values of the density parameter. This implies a corresponding analytic upper bound on the best rate achievable by uniformly-random codes in the deletion channel, complementing the lower bound from our first result. Our upper and lower bounds for the capacity of the deletion channel under uniform codes are far closer to each other than the best known upper and lower bounds for the capacity of the deletion channel.
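
To make the central quantity concrete, the following is a minimal Python sketch of the partition function described above, namely the number of ways to embed $Y$ into $X$ as a subsequence, together with samplers for the null and planted models. It is an illustration only, not the method of the paper: the function names (`count_embeddings`, `sample_null`, `sample_planted`) are hypothetical, and the planted sampler assumes that a "uniformly-random length-$M$ subsequence" means keeping a uniformly random set of $M$ positions of $X$.

```python
import random


def count_embeddings(x, y):
    """Count index tuples i_1 < ... < i_M with x[i_k] == y[k] for every k."""
    # ways[j] = number of embeddings of y[:j] into the part of x read so far.
    ways = [1] + [0] * len(y)
    for symbol in x:
        # Iterate j from right to left so each symbol of x is matched
        # to at most one position of y within a single embedding.
        for j in range(len(y), 0, -1):
            if y[j - 1] == symbol:
                ways[j] += ways[j - 1]
    return ways[len(y)]


def sample_null(n, m, rng):
    """Null model: x and y are independent uniform binary strings."""
    x = [rng.randint(0, 1) for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(m)]
    return x, y


def sample_planted(n, m, rng):
    """Planted model (assumed reading): y keeps a uniformly random
    set of m positions of a uniform binary string x."""
    x = [rng.randint(0, 1) for _ in range(n)]
    kept = sorted(rng.sample(range(n), m))
    y = [x[i] for i in kept]
    return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    x, y = sample_planted(20, 10, rng)
    # In the planted model the count is always at least 1.
    print(count_embeddings(x, y))
```

The dynamic program runs in $O(NM)$ time and uses exact integer arithmetic, so the logarithm of the count can be averaged over samples to estimate a quenched quantity at small $N$. In the null model the count may be zero, in which case its logarithm is undefined and needs separate handling.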