Minerva: Decentralized Collaborative Query Processing over InterPlanetary File System

Data silos create barriers to accessing and using data dispersed across networks. Sharing data directly often suffers from long download times, single points of failure, and untraceable data usage. In this paper, we present Minerva, a peer-to-peer cross-cluster data query system based on the InterPlanetary File System (IPFS). Minerva uses distributed hash table (DHT) lookups to pinpoint the locations that store content chunks. We theoretically model the DHT query delay and introduce a fat Merkle tree structure, together with DHT caching, to reduce it. We design query plans for read and write operations on top of Apache Drill, enabling collaborative queries across decentralized workers. We conduct comprehensive experiments on Minerva, and the results show that Minerva achieves up to a $2.08 \times$ query speedup over the original IPFS data query and can complete data analysis queries in Internet-like environments with an average latency of $0.615$ seconds. With collaborative querying, Minerva achieves up to a $1.39 \times$ speedup over centralized querying with raw data shipment.


Related content

Distributed Hash Table Technology


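A distributed hash table (DHT) is a distributed storage method: a network that, like a tracker, returns torrent information when given a torrent's signature. No server is required. Each client is responsible for a small range of routing and stores a small portion of the data, and together the clients provide addressing and storage for the whole DHT network. Newer versions of BitComet can connect to the DHT network and to a tracker at the same time, so downloads work well even when no tracker server can be reached at all, because the client can find other users downloading the same file through the DHT network. BitComet's DHT protocol is fully compatible with the protocol of the BitTorrent beta released in May of this year, so both clients can join the same DHT network to share data.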
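The idea that each client owns a small slice of the key space can be illustrated with a minimal Python sketch. It assumes a Kademlia-style XOR distance metric (the metric used by the BitTorrent and IPFS DHTs) and, for brevity, gives one object a global view of all nodes. A real DHT has no such global view: each node keeps a partial routing table and lookups proceed hop by hop. The class name `ToyDHT`, the peer names, and the address are hypothetical and exist only for illustration.

```python
import hashlib


def key_id(name: str) -> int:
    """Map a string to a 160-bit integer identifier using SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class ToyDHT:
    """Toy model only: a global view of all nodes, with no routing tables."""

    def __init__(self, node_names):
        # Node ID -> human-readable node name.
        self.nodes = {key_id(n): n for n in node_names}
        # Each node holds only the entries whose keys are closest to it.
        self.storage = {nid: {} for nid in self.nodes}

    def responsible_node(self, key: str) -> int:
        """Return the ID of the node with the smallest XOR distance to the key."""
        k = key_id(key)
        return min(self.nodes, key=lambda nid: nid ^ k)

    def put(self, key: str, value) -> None:
        self.storage[self.responsible_node(key)][key] = value

    def get(self, key: str):
        return self.storage[self.responsible_node(key)].get(key)


dht = ToyDHT(["peer-a", "peer-b", "peer-c", "peer-d"])
dht.put("example-file", ["10.0.0.1:6881"])  # hypothetical peer address
print(dht.get("example-file"))
```

Because every participant derives the responsible node from the key alone, a lookup needs no central server, which is the property the description above relies on.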
