CAT: A Causally Graph Attention Network for Trimming Heterophilic Graph

The Local Attention-guided Message Passing Mechanism (LAMP) adopted in Graph Attention Networks (GATs) adaptively learns the importance of neighboring nodes for better local aggregation on the graph. It effectively pulls the representations of similar neighbors closer together and thereby yields stronger discrimination ability. However, existing GATs suffer a significant decline in discrimination ability on heterophilic graphs: the high proportion of dissimilar neighbors can weaken the self-attention of the central node, and together these cause the central node to deviate from similar nodes in the representation space. We call this effect of neighboring nodes the Distraction Effect (DE). To estimate and weaken the DE of neighboring nodes, we propose a Causally graph Attention network for Trimming heterophilic graph (CAT). To estimate the DE, we note that it is generated through two paths (grabbing the attention assigned to neighbors and reducing the self-attention of the central node), so we model it with the Total Effect, a causal estimand that can be estimated from intervened data. To weaken the DE, we identify the neighbors with the highest DE, which we call Distraction Neighbors, and remove them. We adopt three representative GATs as base models within the proposed CAT framework and conduct experiments on seven heterophilic datasets of three different sizes. Comparative experiments show that CAT improves the node classification accuracy of all base GAT models. Ablation experiments and visualization further validate the enhancement in discrimination ability brought by CAT. The source code is available at https://github.com/GeoX-Lab/CAT.
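
The estimate-then-trim idea can be illustrated with a toy sketch for a single central node. This is not the authors' implementation, which is available in the repository linked above. Several things here are assumptions made only for illustration: the attention logits are taken as given (as if produced by an already trained base GAT), the outcome function (distance between the attention-aggregated representation and the central node's own feature) is a hypothetical stand-in for whatever outcome the paper measures, and the names `aggregate`, `outcome`, `total_effects`, and `trim` are invented. The sketch computes, for each neighbor, the difference between the outcome with that neighbor present and the outcome after an intervention that removes it, then drops the neighbors with the largest difference.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(logits, feats):
    """Attention-weighted sum of feature vectors.

    Index 0 is the central node itself; indices >= 1 are its neighbors.
    """
    weights = softmax(logits)
    dim = len(feats[0])
    return [sum(w * f[d] for w, f in zip(weights, feats)) for d in range(dim)]


def outcome(logits, feats):
    """Hypothetical outcome: Euclidean distance between the aggregated
    representation and the central node's own feature (an assumption)."""
    h = aggregate(logits, feats)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h, feats[0])))


def total_effects(logits, feats):
    """For each neighbor j, outcome with j present minus outcome after
    intervening to remove j. Attention is renormalized after removal, so
    both the attention j grabbed and the change in self-attention count."""
    base = outcome(logits, feats)
    effects = {}
    for j in range(1, len(feats)):
        keep = [i for i in range(len(feats)) if i != j]
        intervened = outcome([logits[i] for i in keep], [feats[i] for i in keep])
        effects[j] = base - intervened
    return effects


def trim(logits, feats, k=1):
    """Indices of the k neighbors with the highest estimated effect."""
    effects = total_effects(logits, feats)
    return sorted(sorted(effects, key=effects.get, reverse=True)[:k])


# Toy data (made up): neighbor 2 is dissimilar yet receives the largest logit.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
logits = [1.0, 0.5, 1.5, 0.5]
effects = total_effects(logits, feats)
dropped = trim(logits, feats, k=1)
print(effects, dropped)
```

In this toy setting the dissimilar neighbor gets a positive effect and is trimmed, while removing either similar neighbor would push the aggregated representation further from the central node, so their effects are negative. A real pipeline would compute such effects over the whole graph and with the outcome defined in the paper.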
