Minerva: Decentralized Collaborative Query Processing over InterPlanetary File System

Data silos create barriers to accessing and using data dispersed across networks. Sharing data directly suffers from long download times, single points of failure, and untraceable data usage. In this paper, we present Minerva, a peer-to-peer, cross-cluster data query system built on the InterPlanetary File System (IPFS). Minerva uses distributed hash table (DHT) lookups to pinpoint the locations that store content chunks. We theoretically model the DHT query delay and introduce a fat Merkle tree structure and DHT caching to reduce it. On top of Apache Drill, we design query plans for read and write operations that enable collaborative queries across decentralized workers. Comprehensive experiments show that Minerva achieves up to $2.08 \times$ query speedup over the original IPFS data query and completes data analysis queries in Internet-like environments with an average latency of $0.615$ seconds. With collaborative querying, Minerva achieves up to $1.39 \times$ speedup over centralized querying with raw data shipment.
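The abstract attributes much of the query delay to DHT lookups and proposes a fat Merkle tree and DHT caching to reduce it, but it does not spell out the model. The following is only a back-of-the-envelope sketch under our own assumptions, not Minerva's published model: (i) a file's chunks are the leaves of a balanced Merkle tree, (ii) a node's children are known only after the node itself is fetched, so tree levels are resolved one after another, (iii) lookups within one level run in parallel, and (iv) each uncached DHT lookup round costs a fixed latency. Under these assumptions the lookup delay is roughly depth × latency, so a wider fanout (a "fatter" tree) cuts the number of sequential rounds, and caching a level's provider records removes that round. The function names, the `cached_levels` parameter, and the sample numbers are hypothetical.

```python
def merkle_depth(num_chunks: int, fanout: int) -> int:
    """Number of levels in a balanced Merkle tree whose leaves are `num_chunks`
    content chunks and whose internal nodes have at most `fanout` children.
    A file that fits in a single chunk is a one-level tree."""
    if num_chunks < 1 or fanout < 2:
        raise ValueError("need num_chunks >= 1 and fanout >= 2")
    depth, capacity = 1, 1
    while capacity < num_chunks:  # add levels until the leaf level can hold every chunk
        capacity *= fanout
        depth += 1
    return depth


def lookup_delay(num_chunks: int, fanout: int, latency_s: float,
                 cached_levels: int = 0) -> float:
    """Hypothetical DHT lookup delay: one sequential lookup round per tree level,
    `latency_s` seconds per round, minus the top `cached_levels` levels whose
    provider records are assumed to be served from a local cache."""
    if cached_levels < 0:
        raise ValueError("cached_levels must be non-negative")
    rounds = max(merkle_depth(num_chunks, fanout) - cached_levels, 0)
    return rounds * latency_s


if __name__ == "__main__":
    chunks = 4096   # hypothetical file split into 4096 chunks
    latency = 0.1   # hypothetical 100 ms per DHT lookup round
    for k in (2, 16, 256):
        print(f"fanout={k:>3}: depth={merkle_depth(chunks, k)}, "
              f"delay={lookup_delay(chunks, k, latency):.2f}s, "
              f"root cached={lookup_delay(chunks, k, latency, cached_levels=1):.2f}s")
```

With 4096 chunks, the depth drops from 13 levels at fanout 2 to 4 at fanout 16 and 3 at fanout 256, and the estimated delay shrinks in proportion. The model deliberately ignores that a fatter internal node is a larger block to fetch, which is part of the real trade-off.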

Related content

Distributed Hash Table Technology

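Distributed Hash Table (DHT) technology is a distributed storage method: like a Tracker, the network returns torrent information for a given torrent identifier (its info hash). No server is required; each client is responsible for routing within a small range and for storing a small portion of the data, which together provide addressing and storage for the entire DHT network. Newer versions of BitComet can connect to the DHT network and to Trackers at the same time, so downloads proceed well even without connecting to any Tracker server, because the client can find other users downloading the same file through the DHT network. BitComet's DHT protocol is fully compatible with the protocol of the BitTorrent beta released in May of this year, which means both clients can join the same DHT network and share data.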
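The paragraph above says each client routes for a small range and stores a small slice of the data. In Kademlia, the XOR-metric DHT design that BitTorrent's Mainline DHT (BEP 5) and the IPFS/libp2p DHT build on, "range" means closeness under XOR distance: a key is stored on the nodes whose IDs are closest to it, and each node's routing table groups peers into buckets by the highest bit in which their IDs differ. The minimal sketch below illustrates only those two ideas. The helper names (`node_id`, `closest_nodes`, `bucket_index`) and the SHA-1-of-a-string IDs are assumptions for illustration, not BitComet's or BitTorrent's code.

```python
import hashlib


def node_id(name: str) -> int:
    """Illustrative only: derive a 160-bit ID by SHA-1 hashing a string.
    Real DHT nodes choose their IDs differently; keys are torrent info hashes."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two IDs: their bitwise XOR, read as an integer."""
    return a ^ b


def closest_nodes(key: int, nodes: list[int], k: int = 3) -> list[int]:
    """The k known nodes closest to `key`; these are the nodes expected to store it."""
    return sorted(nodes, key=lambda n: xor_distance(n, key))[:k]


def bucket_index(self_id: int, other_id: int) -> int:
    """Routing-table bucket for `other_id`: the index of the highest bit in which
    the two IDs differ. Bucket i covers distances in [2**i, 2**(i + 1))."""
    d = xor_distance(self_id, other_id)
    if d == 0:
        raise ValueError("a node does not keep itself in its routing table")
    return d.bit_length() - 1


if __name__ == "__main__":
    peers = [node_id(f"peer-{i}") for i in range(20)]  # hypothetical peer names
    key = node_id("example-file")                       # hypothetical key
    for p in closest_nodes(key, peers):
        print(f"{p:040x}  distance={xor_distance(p, key):040x}")
```

A real lookup does not know every peer in advance: it repeatedly asks the closest peers it knows for peers even closer to the key, which typically converges in O(log n) hops. Here the full peer list is simply sorted to keep the example short.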