Heterogeneous graph neural networks (HGNNs) are a popular technique for modeling and analyzing heterogeneous graphs. Most existing HGNN-based approaches are supervised or semi-supervised and therefore require annotated graphs, which are costly and time-consuming to obtain. Self-supervised contrastive learning has been proposed to remove this dependence on annotated data by mining the intrinsic information hidden within the data itself. However, existing contrastive learning methods are inadequate for heterogeneous graphs: they construct contrastive views solely from data perturbation or pre-defined structural properties (e.g., meta-paths), while ignoring the noise that may exist in both node attributes and graph topologies. We develop, for the first time, a novel and robust heterogeneous graph contrastive learning approach, HGCL, which introduces two views guided respectively by node attributes and graph topologies, and integrates and enhances them through a reciprocal contrastive mechanism to better model heterogeneous graphs. In this approach, the two views adopt distinct attribute and topology fusion mechanisms, each chosen to suit its view, which helps mine the relevant information in attributes and topologies separately. We further use both attribute similarity and topological correlation to construct high-quality contrastive samples. Extensive experiments on three large real-world heterogeneous graphs demonstrate the superiority and robustness of HGCL over state-of-the-art methods.
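The idea of contrasting an attribute-guided view against a topology-guided view, with positives chosen by both attribute similarity and topological correlation, can be illustrated with a minimal sketch. This is not the HGCL implementation: the function names (`select_positives`, `cross_view_loss`), the scoring rule (cosine similarity of attributes plus Jaccard overlap of neighbor sets), the top-`k` selection, and the InfoNCE-style loss with temperature `tau` are all assumptions made for illustration, and the embeddings are taken as given rather than produced by an HGNN encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def select_positives(attrs, neighbors, k=1):
    """Hypothetical positive sampling: for each node, keep the node itself
    plus the top-k other nodes ranked by attribute cosine similarity plus
    Jaccard overlap of neighbor sets (a stand-in for topological correlation)."""
    n = len(attrs)
    positives = []
    for i in range(n):
        scores = []
        for j in range(n):
            if j == i:
                continue
            union = neighbors[i] | neighbors[j]
            jaccard = len(neighbors[i] & neighbors[j]) / len(union) if union else 0.0
            scores.append((cosine(attrs[i], attrs[j]) + jaccard, j))
        # Highest score first; ties broken by node index for determinism.
        scores.sort(key=lambda t: (-t[0], t[1]))
        positives.append({i} | {j for _, j in scores[:k]})
    return positives


def cross_view_loss(z_attr, z_topo, positives, tau=0.5):
    """InfoNCE-style loss in which each view's embeddings are contrasted
    against the other view's, averaged over both directions."""
    n = len(z_attr)

    def one_direction(za, zb):
        total = 0.0
        for i in range(n):
            sims = [math.exp(cosine(za[i], zb[j]) / tau) for j in range(n)]
            pos = sum(sims[j] for j in positives[i])
            total += -math.log(pos / sum(sims))
        return total / n

    return 0.5 * (one_direction(z_attr, z_topo) + one_direction(z_topo, z_attr))


# Toy data: four nodes, 2-d attributes, neighbor sets as Python sets.
attrs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
neighbors = [{10, 11}, {10, 11}, {12}, {12, 13}]
positives = select_positives(attrs, neighbors, k=1)

# Stand-in embeddings for the two views (assumed, not learned here).
z_attr = attrs
z_topo_aligned = [[1.0, 0.1], [1.0, 0.0], [0.0, 1.0], [0.1, 1.0]]
z_topo_misaligned = list(reversed(z_topo_aligned))

loss_aligned = cross_view_loss(z_attr, z_topo_aligned, positives)
loss_misaligned = cross_view_loss(z_attr, z_topo_misaligned, positives)
```

In this toy setting, the loss is lower when the two views agree on which nodes are close, which is the signal a cross-view contrastive objective of this kind would reward during training.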