Graphs play an increasingly important role in big data applications. However, existing graph data structures cannot simultaneously address the performance bottlenecks caused by the dynamic updates, large scale, and high query complexity of modern graphs. This paper proposes CuckooGraph, a novel data structure for large-scale dynamic graphs. It does not need to know the amount of graph data in advance and adaptively resizes to the most memory-efficient form for the current data scale, which speeds up multiple graph analytic tasks. CuckooGraph relies on two key techniques, TRANSFORMATION and DENYLIST. TRANSFORMATION makes full use of limited memory through data structures that support flexible space transformations, smoothly expanding or shrinking the allocated space according to the number of incoming items. DENYLIST efficiently handles item insertion failures and further improves processing speed. Extensive experiments show that CuckooGraph reduces query time by four orders of magnitude on 1-hop successor and precursor queries compared to the state of the art.
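The abstract does not give the internal layout of CuckooGraph, so the following is only a minimal sketch of the two ideas it names, built on textbook two-table cuckoo hashing rather than on the paper's actual design. The class name `CuckooEdgeTable`, the constants `MAX_KICKS`, `DENYLIST_CAP`, and `MIN_BUCKETS`, and the 50% / 25% resize thresholds are all assumptions made for illustration. Edges that cannot be placed after a bounded number of evictions are parked in a small side list (the role the abstract assigns to DENYLIST), and the table doubles or halves as the number of stored edges changes (a stand-in for TRANSFORMATION).

```python
import random


class CuckooEdgeTable:
    """Toy cuckoo hash set of (u, v) edges. Illustrative only, not the CuckooGraph layout."""

    MAX_KICKS = 32     # assumed eviction limit before an item goes to the denylist
    DENYLIST_CAP = 8   # assumed denylist size that forces an expansion
    MIN_BUCKETS = 4    # assumed lower bound on buckets per table

    def __init__(self, buckets=4, seed=0):
        self._rng = random.Random(seed)
        self._init_tables(buckets)
        self.denylist = []
        self.size = 0

    def _init_tables(self, buckets):
        # Two tables, one slot per bucket, each with its own hash salt.
        self.buckets = buckets
        self.tables = [[None] * buckets, [None] * buckets]
        self.salts = [self._rng.getrandbits(32), self._rng.getrandbits(32)]

    def _slot(self, i, edge):
        return hash((self.salts[i], edge)) % self.buckets

    def contains(self, edge):
        in_tables = any(self.tables[i][self._slot(i, edge)] == edge for i in (0, 1))
        return in_tables or edge in self.denylist

    def _place(self, edge):
        """Cuckoo-insert `edge`; return the item left homeless, or None on success."""
        i = 0
        for _ in range(self.MAX_KICKS):
            s = self._slot(i, edge)
            # Put the current item in its slot and pick up the previous occupant.
            edge, self.tables[i][s] = self.tables[i][s], edge
            if edge is None:
                return None
            i ^= 1  # the evicted item must move to the other table
        return edge

    def _resize(self, buckets):
        items = [e for t in self.tables for e in t if e is not None] + self.denylist
        while True:
            self._init_tables(buckets)
            self.denylist = []
            for e in items:
                leftover = self._place(e)
                if leftover is not None:
                    self.denylist.append(leftover)
            if len(self.denylist) <= self.DENYLIST_CAP:
                return
            buckets *= 2  # too many failures at this size: try a larger table

    def insert(self, edge):
        if self.contains(edge):
            return
        self.size += 1
        leftover = self._place(edge)
        if leftover is not None:
            self.denylist.append(leftover)  # insertion failure is absorbed here
        # Expand when the denylist overflows or the load exceeds 50%.
        if len(self.denylist) > self.DENYLIST_CAP or self.size > self.buckets:
            self._resize(self.buckets * 2)

    def delete(self, edge):
        for i in (0, 1):
            s = self._slot(i, edge)
            if self.tables[i][s] == edge:
                self.tables[i][s] = None
                break
        else:
            if edge not in self.denylist:
                return  # edge was never stored
            self.denylist.remove(edge)
        self.size -= 1
        # Shrink when the table is mostly empty.
        if self.buckets > self.MIN_BUCKETS and self.size < self.buckets // 4:
            self._resize(self.buckets // 2)
```

In this sketch, inserting edges one at a time grows the table without any size hint, and deleting most of them shrinks it again. The denylist keeps a failed insertion from forcing an immediate rebuild, which is one plausible reading of how such a list could improve processing speed.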