Encoding digital information into DNA sequences offers an attractive potential solution for storing the rapidly growing volume of data in the information age, a demand amplified by the rise of artificial intelligence. However, practical implementations of DNA storage are constrained by errors introduced during synthesis, preservation, and sequencing, and traditional error-correcting codes remain vulnerable once noise levels exceed their predefined thresholds. Here, we develop a Partitioning-mapping with Jump-rotating (PJ) encoding scheme that exhibits exceptional noise resilience. PJ removes cross-strand information dependencies, so that strand loss manifests as localized gaps rather than catastrophic file failure. It prioritizes file decodability under arbitrary noise conditions and leverages AI-based inference to enable controllable recovery of digital information. For intra-strand encoding, we develop a jump-rotating strategy that relaxes sequence constraints relative to conventional rotating codes and provides tunable information density via an adjustable jump length. With this encoding architecture, the original file information can always be decoded and recovered under any strand loss ratio, with fidelity degrading smoothly as damage increases. We demonstrate that original files can be effectively recovered even with 10% strand loss, and that machine learning datasets stored under these conditions retain their classification performance. Experiments further confirm that PJ successfully decodes image files after extreme environmental disturbance imposed by accelerated aging and high-intensity X-ray irradiation. By eliminating reliance on prior error probabilities, PJ establishes a general framework for robust, archival DNA storage capable of withstanding the rigorous conditions of real-world preservation.
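To make two of the ideas above concrete, the following is a minimal sketch, not the PJ scheme itself. It assumes a conventional ternary rotating code as the intra-strand baseline (each base-3 digit selects one of the three nucleotides that differ from the previous base, which avoids homopolymers) and encodes each chunk of the file into its own strand with no cross-strand dependency. The jump rule, the partition mapping, and the AI-based inference step of PJ are not specified in this abstract and are not reproduced here. All function names, the chunk size, the starting base, and the `?` fill byte are hypothetical choices made for illustration; the strand index is kept as a dictionary key rather than encoded into the sequence.

```python
# Illustrative sketch only: a conventional rotating code with independent strands.
# This is NOT the PJ encoding; names and parameters are assumptions.
BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256, so six base-3 digits cover one byte


def to_trits(data: bytes) -> list[int]:
    """Expand each byte into six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits


def from_trits(trits: list[int]) -> bytes:
    """Inverse of to_trits for trit lists produced from valid bytes."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for t in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def rotate_encode(trits: list[int], prev: str = "A") -> str:
    """Map each trit to one of the three bases that differ from the previous base."""
    seq = []
    for t in trits:
        choices = [b for b in BASES if b != prev]
        prev = choices[t]
        seq.append(prev)
    return "".join(seq)


def rotate_decode(seq: str, prev: str = "A") -> list[int]:
    """Recover trits by replaying the same rotation from the same starting base."""
    trits = []
    for base in seq:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


def encode_file(data: bytes, chunk_size: int = 4) -> dict[int, str]:
    """Encode each chunk into its own strand; no strand depends on another."""
    return {
        i // chunk_size: rotate_encode(to_trits(data[i:i + chunk_size]))
        for i in range(0, len(data), chunk_size)
    }


def decode_file(strands: dict[int, str], n_strands: int,
                chunk_size: int = 4, fill: bytes = b"?") -> bytes:
    """Decode surviving strands; a missing strand leaves a gap of fill bytes."""
    parts = []
    for idx in range(n_strands):
        if idx in strands:
            parts.append(from_trits(rotate_decode(strands[idx])))
        else:
            parts.append(fill * chunk_size)  # localized gap, rest of file intact
    return b"".join(parts)
```

For example, encoding `b"DNA storage!"` with the default chunk size yields three strands; deleting the middle strand and calling `decode_file(strands, 3)` returns the first and last chunks unchanged with a four-byte gap between them. In this toy setting the gap is simply marked, whereas the abstract describes filling such gaps by AI-based inference.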