Nanopore sequencing is one of the primary sequencing methods gaining prominence in DNA storage. In this work, we consider a simplified model of the sequencer, characterized as a channel. The channel reads an input sequence through a sliding window of length $\ell$ that is shifted by $\delta$ characters at each step. Its output, which we refer to as the read vector, is the vector of the sums of the entries in each window. The capacity of the channel is defined as its maximal information rate. Previous works have established the capacity for certain values of the parameters $\ell$ and $\delta$. In this work, we show that when $\delta < \ell < 2\delta$, the capacity is $\frac{1}{\delta}\log_2 \left(\frac{1}{2}\left(\ell+1+ \sqrt{(\ell+1)^2 - 4(\ell - \delta)(\ell-\delta +1)}\right)\right)$. Additionally, we derive an upper bound on the capacity when $2\delta < \ell$. Finally, we extend the model to the two-dimensional case and present several results on its capacity.
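As an illustration of the channel described above, the following is a minimal sketch rather than the authors' implementation. It assumes a binary input sequence and keeps only windows that fit entirely inside the sequence; the abstract specifies neither the alphabet nor the boundary handling, so both are assumptions. The function names `read_vector` and `capacity_mid_regime` are hypothetical. The second function simply evaluates the closed-form capacity expression stated for $\delta < \ell < 2\delta$.

```python
import math


def read_vector(seq, ell, delta):
    """Return the sums of the entries in each sliding window.

    Assumption: windows start at positions 0, delta, 2*delta, ... and only
    windows that lie fully inside `seq` are kept.
    """
    if ell <= 0 or delta <= 0:
        raise ValueError("ell and delta must be positive integers")
    return [sum(seq[i:i + ell]) for i in range(0, len(seq) - ell + 1, delta)]


def capacity_mid_regime(ell, delta):
    """Evaluate the capacity expression stated for delta < ell < 2*delta."""
    if not (delta < ell < 2 * delta):
        raise ValueError("formula is stated only for delta < ell < 2*delta")
    disc = (ell + 1) ** 2 - 4 * (ell - delta) * (ell - delta + 1)
    return math.log2((ell + 1 + math.sqrt(disc)) / 2) / delta


if __name__ == "__main__":
    # Windows of length 3 shifted by 2: [1,0,1], [1,1,0], [0,1,0]
    print(read_vector([1, 0, 1, 1, 0, 1, 0], ell=3, delta=2))  # [2, 2, 1]
    print(capacity_mid_regime(3, 2))  # approximately 0.886
```

For $\ell = 3$ and $\delta = 2$ the expression reduces to $\frac{1}{2}\log_2(2+\sqrt{2})$, which is below 1 bit per symbol, as expected when overlapping window sums do not determine the input uniquely.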