This paper studies information rates of noisy duplication channels with memory, motivated by nanopore DNA sequencing. In nanopore sequencing, the measured signal is affected by both inter-symbol interference (ISI), caused by multiple DNA bases residing in the pore at once, and random sample duplications, where variable translocation speed causes each base to generate a random number of samples. Together, these two effects make direct theoretical analysis difficult. To address this, we derive a new decomposition of the information rate into two interpretable terms: one associated with the intrinsic memory of an auxiliary ISI channel, and another that captures the uncertainty in the segment boundaries caused by random duplications. This decomposition separates the dominant channel distortions and replaces the direct analysis of the full channel with the analysis of two simpler components. We then study the second term through a soft alignment functional closely related to Soft-DTW, which enables strong asymptotic equipartition property (AEP) results and an alternative proof of the Markov-constrained coding theorem based on strong information stability. Finally, we develop a lower bound on the information rate that depends on the distribution of jump distances between adjacent nanopore levels. This bound gives a simple geometric explanation of channel synchronisability and provides a tractable framework for computing achievable rates of Oxford Nanopore sequencers.
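To make the two distortions concrete, the following is a minimal simulation sketch of a noisy duplication channel with ISI, not the model analysed in the paper. All specifics are assumptions made for illustration: the pore is taken to hold k consecutive bases, each k-mer maps to a current level through a hypothetical randomly drawn table (`make_levels`), each level is repeated for a geometrically distributed number of samples, and the noise is additive Gaussian. The function and parameter names (`simulate_channel`, `mean_dup`, `noise_std`) are likewise hypothetical.

```python
import itertools
import random

BASES = "ACGT"


def make_levels(k, rng):
    """Hypothetical level table: one uniformly drawn level per k-mer (assumption)."""
    return {"".join(kmer): rng.uniform(0.0, 1.0)
            for kmer in itertools.product(BASES, repeat=k)}


def simulate_channel(bases, levels, k=2, mean_dup=3.0, noise_std=0.1, rng=None):
    """Toy ISI + duplication channel (illustrative assumptions only).

    ISI: each output level depends on the k bases currently in the pore.
    Duplication: each level is emitted for a random number of samples,
    here geometric on {1, 2, ...} with mean `mean_dup`.
    Returns (samples, boundaries), where boundaries[i] is the index of the
    first sample generated by the i-th k-mer.
    """
    rng = rng or random.Random()
    p_stop = 1.0 / mean_dup  # success probability of the geometric duration
    samples, boundaries = [], []
    for i in range(len(bases) - k + 1):
        level = levels[bases[i:i + k]]   # memory: level depends on k bases
        duration = 1
        while rng.random() > p_stop:     # continue with probability 1 - p_stop
            duration += 1
        boundaries.append(len(samples))  # segment boundary, hidden from the decoder
        samples.extend(level + rng.gauss(0.0, noise_std) for _ in range(duration))
    return samples, boundaries


if __name__ == "__main__":
    rng = random.Random(0)
    levels = make_levels(2, rng)
    samples, boundaries = simulate_channel("ACGTACGGTA", levels, rng=rng)
    print(len(samples), boundaries)
```

In this sketch the receiver observes only `samples`. The list `boundaries` corresponds to the segment-boundary uncertainty that the second term of the decomposition is said to capture, and the gaps between entries of the level table play the role of the jump distances between adjacent levels that enter the lower bound.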