We study the amount of information that can be reliably stored in a DNA-based storage system with noisy sequencing, where each codeword is composed of short DNA molecules. We analyze a concatenated coding scheme, in which the outer code is designed to handle the random sampling, while the inner code is designed to handle the random sequencing noise. We assume that the sequencing channel is symmetric and choose the inner coding scheme to be composed of a linear block code and a zero-undetected-error decoder. As a byproduct, the resulting optimal maximum-likelihood decoder lends itself to an amenable analysis, and we are able to derive an achievability bound on the scaling of the number of information bits that can be reliably stored. As a result of independent interest, we prove that the average error probability of random linear block codes under zero-undetected-error decoding converges to zero exponentially fast with the block length, as long as the coding rate does not exceed some critical value, which is known to serve as a lower bound on the zero-undetected-error capacity.