We survey graph reachability indexing techniques for efficiently processing reachability queries in two popular graph models: plain graphs and edge-labeled graphs. Reachability queries are fundamental in graph processing, and reachability indexes are specialized data structures designed to speed up such queries. Work on this topic goes back four decades; we include 33 of the proposed techniques. Plain graphs contain only vertices and edges, and a reachability query checks whether a path exists from a source vertex to a target vertex. Edge-labeled graphs augment plain graphs with edge labels. Reachability queries in edge-labeled graphs add path constraints based on edge labels, so they check both that a path exists and that it satisfies the constraints. We categorize the techniques for both plain and edge-labeled graphs and discuss the approaches according to this classification, using existing techniques as exemplars. We discuss the main challenges within each class and how these might be addressed in other approaches. We conclude with a discussion of open research challenges and future research directions, along the lines of integrating reachability indexes into graph data management systems. This survey serves as a comprehensive resource for researchers and practitioners interested in the advancements, techniques, and challenges of reachability indexing in graph analytics.
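To make the two query types concrete, the following is a minimal sketch of the index-free baseline that reachability indexes aim to outperform: a breadth-first search over an adjacency list. It is not one of the surveyed techniques. The function name `reachable`, the `allowed` parameter, and the toy graph are hypothetical, and the sketch assumes one simple form of path constraint, namely that every edge on the path must carry a label from an allowed set.

```python
from collections import deque


def reachable(adj, source, target, allowed=None):
    """Index-free BFS baseline (illustrative only).

    adj maps each vertex to a list of (neighbor, label) pairs.
    With allowed=None the labels are ignored, which corresponds to a
    plain-graph reachability query. With a set of labels, only edges
    whose label is in the set may be traversed (assumed constraint form).
    """
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v, label in adj.get(u, []):
            # Skip edges that violate the label constraint, if one is given.
            if allowed is not None and label not in allowed:
                continue
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


# Hypothetical toy graph: a -follows-> b -likes-> c
graph = {
    "a": [("b", "follows")],
    "b": [("c", "likes")],
    "c": [],
}

plain_result = reachable(graph, "a", "c")
constrained_result = reachable(graph, "a", "c", allowed={"follows"})
```

Here `plain_result` is true because a path from `a` to `c` exists, while `constrained_result` is false because the only such path uses a `likes` edge. Each call traverses the graph from scratch, which is the per-query cost that a reachability index is meant to avoid.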