Existing graph contrastive learning (GCL) techniques typically require two forward passes for each instance to construct the contrastive loss, a design that is effective at capturing the low-frequency signals of node features. This dual-pass design has shown empirical success on homophilic graphs, but its effectiveness on heterophilic graphs, where directly connected nodes typically have different labels, is unknown. In addition, existing GCL approaches fail to provide strong performance guarantees. Together with their unpredictable behavior on heterophilic graphs, this limits their applicability in real-world contexts. A natural question then arises: can we design a GCL method that works for both homophilic and heterophilic graphs with a performance guarantee? To answer this question, we theoretically study the concentration property of features obtained by neighborhood aggregation on homophilic and heterophilic graphs, introduce a single-pass, augmentation-free graph contrastive learning loss based on this property, and provide performance guarantees for the minimizer of the loss on downstream tasks. As a direct consequence of our analysis, we implement the Single-Pass Graph Contrastive Learning method (SP-GCL). Empirically, on 14 benchmark datasets with varying degrees of homophily, the features learned by SP-GCL match or outperform existing strong baselines with significantly less computational overhead, which demonstrates the usefulness of our findings in real-world cases.
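To make the two notions the analysis rests on concrete, the following minimal sketch computes the edge homophily of a labeled graph (the fraction of edges whose endpoints share a label) and applies one step of mean neighborhood aggregation. It is illustrative only and is not the SP-GCL loss or implementation; the function names `edge_homophily` and `mean_aggregate`, the edge-list representation, and the choice of mean aggregation without self-loops are assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def edge_homophily(edges: Sequence[Edge], labels: Sequence[int]) -> float:
    """Return the fraction of edges whose two endpoints share a label.

    Values near 1 indicate a homophilic graph, values near 0 a heterophilic one.
    """
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def mean_aggregate(
    edges: Sequence[Edge], features: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Apply one step of mean neighborhood aggregation (assumed: undirected, no self-loops)."""
    n = len(features)
    dim = len(features[0])

    # Build an undirected adjacency list from the edge list.
    neighbors: Dict[int, List[int]] = {i: [] for i in range(n)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    aggregated: List[List[float]] = []
    for i in range(n):
        nbrs = neighbors[i]
        if not nbrs:
            # An isolated node has nothing to average, so it keeps its own feature.
            aggregated.append(list(features[i]))
            continue
        aggregated.append(
            [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        )
    return aggregated
```

For example, on the path graph 0-1-2-3, the labels `[0, 0, 1, 1]` give an edge homophily of 2/3, while the alternating labels `[0, 1, 0, 1]` give 0, which is the fully heterophilic case.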