SPIDER-WEB generates coding algorithms with superior error tolerance and real-time information retrieval capacity

DNA is considered a promising medium for storing digital information. As an essential step in the DNA-based data storage workflow, coding algorithms are responsible for implementing functions such as bit-to-base transcoding and error correction. In previous studies, these functions were normally realized by combining multiple algorithms. Here, we report a graph-based architecture, named SPIDER-WEB, that provides an all-in-one coding solution by generating customized algorithms automatically. SPIDER-WEB can correct up to 4% edit errors in DNA sequences, including substitutions and insertions/deletions (indels), with only 5.5% redundant symbols. Since the correcting and decoding processes require no DNA sequence pretreatment, SPIDER-WEB offers real-time information retrieval, which is 305.08 times faster than single-molecule sequencing techniques. At the megabyte scale, our retrieval process is two orders of magnitude faster than the conventional one, and it can scale to exabyte-level data. Therefore, SPIDER-WEB holds the potential to improve the practicability of large-scale DNA data storage applications.
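For readers unfamiliar with the term, the sketch below illustrates what bit-to-base transcoding means in its simplest form: a fixed mapping of two bits to one nucleotide. It is an illustrative baseline only, not the algorithm SPIDER-WEB generates; the mapping table and the function names `bytes_to_dna` and `dna_to_bytes` are assumptions made for this example, and the sketch performs no error correction.

```python
# Illustrative baseline only: a fixed 2-bits-per-nucleotide mapping.
# This is NOT the SPIDER-WEB method; the table and names are hypothetical.
BIT_PAIR_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BIT_PAIR = {base: pair for pair, base in BIT_PAIR_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Transcode bytes into a DNA string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BIT_PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(dna: str) -> bytes:
    """Invert bytes_to_dna; assumes an error-free sequence of A/C/G/T."""
    bits = "".join(BASE_TO_BIT_PAIR[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    encoded = bytes_to_dna(b"DNA")
    print(encoded)
    print(dna_to_bytes(encoded))
```

A fixed mapping like this cannot recover from a substitution or an indel, which is the kind of gap that dedicated coding algorithms are meant to close.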

