Many real-world information networks are Heterogeneous Information Networks (HINs), whose diverse objects and relations are represented as nodes and edges of a heterogeneous graph. Node similarity quantifies how closely two nodes resemble each other and is defined recursively: two nodes are similar mainly to the extent that the nodes they connect to are similar. Users may want only specific types of connections to count toward similarity; these are expressed as meta-paths, i.e., sequences of node and edge types. Existing Heterogeneous Graph Neural Network (HGNN)-based similarity search methods can accommodate meta-paths but require retraining for each new meta-path. Conversely, existing path-based similarity search methods can switch flexibly between meta-paths but often suffer from lower accuracy because they rely solely on path information. This paper proposes HetFS, a Fast Similarity method for ad-hoc queries with user-given meta-paths on Heterogeneous information networks. HetFS computes similarity from both node content and the path information that satisfies the meta-path restriction. Extensive experiments demonstrate the effectiveness and efficiency of HetFS on ad-hoc queries: it outperforms state-of-the-art HGNNs and path-based approaches and performs strongly in downstream applications, including link prediction, node classification, and clustering.
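To make the notion of a meta-path concrete, the following minimal sketch counts the paths that satisfy a user-given meta-path on a toy HIN and derives a purely path-based similarity score from those counts. It is not HetFS: the score is the PathSim-style normalization, used here only as a familiar example of the path-based family discussed above, and it ignores node content entirely. The toy graph, the `rev_` prefix for reverse edges, and the names `count_paths` and `pathsim` are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy HIN (hypothetical data): every node has a type, every edge has a type.
node_type = {
    "a1": "author", "a2": "author", "a3": "author",
    "p1": "paper", "p2": "paper", "p3": "paper",
}
edges = [
    ("a1", "writes", "p1"), ("a1", "writes", "p2"),
    ("a2", "writes", "p2"), ("a2", "writes", "p3"),
    ("a3", "writes", "p3"),
]

# Adjacency lists keyed by (source node, edge type).
# Reverse edges get a "rev_" prefix so meta-paths can traverse both directions.
adj = defaultdict(list)
for src, etype, dst in edges:
    adj[(src, etype)].append(dst)
    adj[(dst, "rev_" + etype)].append(src)


def count_paths(start, meta_path):
    """Count paths from `start` that follow `meta_path`.

    `meta_path` alternates node and edge types:
    [node_type_0, edge_type_1, node_type_1, ..., edge_type_k, node_type_k].
    Returns a dict mapping each reachable end node to its path count.
    """
    if node_type[start] != meta_path[0]:
        return {}
    frontier = {start: 1}
    for i in range(1, len(meta_path), 2):
        etype, ntype = meta_path[i], meta_path[i + 1]
        nxt = defaultdict(int)
        for node, count in frontier.items():
            for neighbor in adj.get((node, etype), ()):
                if node_type[neighbor] == ntype:
                    nxt[neighbor] += count
        frontier = dict(nxt)
    return frontier


def pathsim(x, y, meta_path):
    """PathSim-style score for a symmetric meta-path (path information only)."""
    from_x = count_paths(x, meta_path)
    from_y = count_paths(y, meta_path)
    denom = from_x.get(x, 0) + from_y.get(y, 0)
    return 2 * from_x.get(y, 0) / denom if denom else 0.0


# Meta-path author -writes-> paper -rev_writes-> author (co-authorship).
APA = ["author", "writes", "paper", "rev_writes", "author"]
print(pathsim("a1", "a2", APA))  # 0.5
print(pathsim("a1", "a3", APA))  # 0.0
```

Because the meta-path is an ordinary argument, a different one can be supplied per query with no retraining, which is the flexibility attributed to path-based methods above; the accuracy limitation mentioned there shows up in that two authors with identical co-authorship counts receive the same score regardless of what their papers contain.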