In DNA-based data storage, DNA codes with biochemical constraints and error correction are designed to ensure data reliability. Single-stranded DNA sequences satisfying secondary structure avoidance (SSA) avoid undesirable secondary structures, which may cause chemical inactivity. Homopolymer run-length limits and GC-balance constraints also help reduce the error probability of DNA sequences during synthesis and sequencing. In this letter, building on a recent work \cite{bib7}, we construct DNA codes that are free of secondary structures of stem length $\geq m$ and have homopolymer run-length $\leq\ell$, for odd $m\leq11$ and $\ell\geq3$, with rate $1+\log_2\rho_m-3/(2^{\ell-1}+\ell+1)$, where $\rho_m$ is given in Table \ref{tm}. In particular, when $m=3$ and $\ell=4$, the rate tends to 1.3206 bits/nt, exceeding that of a previous construction by Benerjee {\it et al.} We also construct DNA codes that satisfy all three of the above constraints together with single error correction. Finally, codes with a GC-locally balanced constraint are presented.
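The rate expression and the two constraints on the constructed codes can be made concrete with a short sketch. The Python below is illustrative only, and every name in it (`rate`, `run_length_penalty`, `max_run_length`, `has_stem`) is hypothetical. It does not compute $\rho_m$, which must be read from Table \ref{tm}. Instead, it back-solves the $\rho_3$ implied by the quoted rate 1.3206 for $m=3$, $\ell=4$. The stem check uses an assumed, simplified definition: a secondary structure of stem length $m$ is taken to mean two non-overlapping length-$m$ substrings that are reverse complements of each other. Loop-length restrictions are ignored, so this is not necessarily the exact definition used in \cite{bib7}.

```python
import math

# Watson-Crick pairing used for reverse complements (A-T, C-G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def run_length_penalty(ell: int) -> float:
    """Penalty term 3 / (2^(ell-1) + ell + 1) from the rate expression."""
    return 3 / (2 ** (ell - 1) + ell + 1)


def rate(rho_m: float, ell: int) -> float:
    """Rate 1 + log2(rho_m) - 3/(2^(ell-1) + ell + 1) in bits/nt.

    rho_m must be supplied by the caller (e.g. from the paper's table);
    it is not computed here.
    """
    return 1 + math.log2(rho_m) - run_length_penalty(ell)


def max_run_length(seq: str) -> int:
    """Length of the longest homopolymer run in seq (0 for the empty string)."""
    longest, run, prev = 0, 0, None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest


def reverse_complement(s: str) -> str:
    """Complement each base, then reverse the string."""
    return s.translate(_COMPLEMENT)[::-1]


def has_stem(seq: str, m: int) -> bool:
    """Simplified stem check (assumption): True if two non-overlapping
    length-m substrings of seq are reverse complements of each other.
    Loop-length restrictions are ignored."""
    n = len(seq)
    for i in range(n - m + 1):
        target = reverse_complement(seq[i:i + m])
        # Only windows starting at or after i + m, so the two do not overlap.
        for j in range(i + m, n - m + 1):
            if seq[j:j + m] == target:
                return True
    return False


# Back-solve the rho_3 implied by the quoted rate 1.3206 for m = 3, ell = 4.
implied_rho_3 = 2 ** (1.3206 - 1 + run_length_penalty(4))
```

For $\ell=4$ the penalty term is $3/13\approx0.2308$, so the quoted rate implies $\log_2\rho_3\approx0.5514$, i.e. $\rho_3\approx1.47$. As a usage example, `max_run_length("AAACCG")` returns 3, and `has_stem("ACGAAAACGT", 3)` returns `True` because `CGT` is the reverse complement of `ACG`. The brute-force stem check is quadratic in the sequence length and is meant only for checking small examples, not as an encoder.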