DNA is one of the most appealing storage media because of its high density, durability, and replicability. Emerging DNA storage technologies use composite DNA letters, in which information is represented by probability vectors, achieving higher information density and lower synthesis costs than regular DNA letters. However, composite-letter storage remains subject to unavoidable noise and information corruption. This paper studies the channel of composite DNA letters in DNA-based storage systems and introduces block codes that correct limited-magnitude probability errors on probability vectors. First, outer and inner bounds for limited-magnitude probability error-correcting codes are provided. Next, code constructions are proposed for the setting in which the number of errors is bounded by t, the error magnitudes are bounded by l, and the probability resolution is fixed at k. These constructions exploit the properties of limited-magnitude probability errors in DNA-based storage systems, leading to improved performance in terms of complexity and redundancy. In addition, one of the proposed constructions is shown to be asymptotically optimal. Finally, systematic codes based on one of the proposed constructions are presented, enabling efficient information extraction in practical implementations.
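To make the parameters t, l, and k concrete, the following Python sketch models a composite letter and checks whether a received word fits the error model. It rests on assumptions the abstract does not state: a letter is taken to be a tuple of non-negative integer counts over (A, C, G, T) that sum to the resolution k, and the magnitude of an error on a letter is taken to be the total upward shift in its counts. The function names are hypothetical and chosen only for illustration; the paper's formal definitions may differ.

```python
from typing import Sequence, Tuple

# Assumption: a composite letter is a tuple of integer counts over (A, C, G, T)
# whose entries sum to the probability resolution k.
Letter = Tuple[int, ...]


def is_valid_letter(letter: Letter, k: int) -> bool:
    """Return True if the counts are non-negative and sum to k."""
    return all(c >= 0 for c in letter) and sum(letter) == k


def error_magnitude(sent: Letter, received: Letter) -> int:
    """Assumed magnitude: total amount by which counts increased.

    Because both letters sum to k, this equals the total decrease.
    """
    return sum(max(r - s, 0) for s, r in zip(sent, received))


def within_error_model(
    sent: Sequence[Letter],
    received: Sequence[Letter],
    k: int,
    t: int,
    l: int,
) -> bool:
    """Check that at most t letters changed, each by magnitude at most l."""
    if len(sent) != len(received):
        return False
    magnitudes = []
    for s, r in zip(sent, received):
        if not (is_valid_letter(s, k) and is_valid_letter(r, k)):
            return False
        magnitudes.append(error_magnitude(s, r))
    num_errors = sum(1 for m in magnitudes if m > 0)
    return num_errors <= t and max(magnitudes, default=0) <= l


if __name__ == "__main__":
    k = 10
    sent = [(3, 3, 2, 2), (10, 0, 0, 0), (5, 5, 0, 0)]
    received = [(4, 2, 2, 2), (10, 0, 0, 0), (5, 5, 0, 0)]
    # One letter changed, with one unit shifted from C to A.
    print(within_error_model(sent, received, k, t=1, l=1))
```

Under these assumptions, a code for this channel would need to distinguish any two codewords whose sets of reachable received words (those passing `within_error_model`) overlap.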