We study point-to-point distance estimation in hypergraphs, where each query is parameterized by a positive integer s that defines the level of overlap required for two hyperedges to be considered adjacent. To answer s-distance queries, we first explore an oracle based on the line graph of the given hypergraph and discuss its limitations, the main one being that the line graph is typically orders of magnitude larger than the original hypergraph. We then introduce HypED, a landmark-based oracle with a predefined size that is built directly on the hypergraph and thus avoids constructing the line graph. Our framework approximately answers vertex-to-vertex, vertex-to-hyperedge, and hyperedge-to-hyperedge s-distance queries for any value of s. A key observation underlying our framework is that, as s increases, the hypergraph becomes more fragmented. We show how to exploit this to improve the placement of landmarks by identifying the s-connected components of the hypergraph. For this task, we devise an efficient algorithm based on the union-find technique and a dynamic inverted index. We experimentally evaluate HypED on several real-world hypergraphs and demonstrate its versatility in answering s-distance queries for different values of s. Our framework answers such queries in fractions of a millisecond, while allowing fine-grained control of the trade-off between index size and approximation error at creation time. Finally, we demonstrate the usefulness of the s-distance oracle in two applications: hypergraph-based recommendation and the approximation of the s-closeness centrality of vertices and hyperedges in the context of protein-protein interactions.
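To make the fragmentation observation concrete, the following is a minimal sketch of how s-connected components could be found with union-find and an inverted index from vertices to hyperedges. It is not the algorithm used by HypED: the abstract mentions a dynamic inverted index, whereas this sketch uses a plain one and counts pairwise overlaps directly. It also assumes that two hyperedges are s-adjacent when they share at least s vertices. The names `UnionFind` and `s_connected_components` and the toy hypergraph are hypothetical.

```python
from collections import Counter, defaultdict


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            # Path halving: point x at its grandparent while walking up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def s_connected_components(hyperedges, s):
    """Group hyperedge ids into s-connected components.

    Assumption: two hyperedges are s-adjacent if they share >= s vertices.
    """
    edge_sets = [set(e) for e in hyperedges]
    index = defaultdict(list)  # inverted index: vertex -> ids of earlier hyperedges
    uf = UnionFind(len(edge_sets))

    for eid, edge in enumerate(edge_sets):
        overlap = Counter()  # earlier hyperedge id -> number of shared vertices
        for v in edge:
            for other in index[v]:
                overlap[other] += 1
            index[v].append(eid)
        for other, shared in overlap.items():
            if shared >= s:
                uf.union(eid, other)

    groups = defaultdict(list)
    for eid in range(len(edge_sets)):
        groups[uf.find(eid)].append(eid)
    return sorted(groups.values())


# Toy hypergraph (hypothetical data): five hyperedges over vertices 1..8.
toy = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {5, 6, 7}, {6, 7, 8}]
for s in (1, 2, 3):
    print(s, s_connected_components(toy, s))
```

On this toy input the number of components grows with s, which is the fragmentation effect described above. A landmark-based oracle could then, for instance, distribute landmarks per component rather than over the whole hypergraph; how HypED actually does so is not specified here.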