Graph Attention Retrospective

Graph-based learning is a rapidly growing sub-field of machine learning with applications in social networks, citation networks, and bioinformatics. Among the most popular models are graph attention networks. They were introduced to let a node aggregate information from the features of its neighbors in a non-uniform way, in contrast to simple graph convolution, which does not distinguish between the neighbors of a node. In this paper, we theoretically study the behaviour of graph attention networks. We prove multiple results on the performance of the graph attention mechanism for node classification on data drawn from a contextual stochastic block model. Here, the node features are obtained from a mixture of Gaussians and the edges from a stochastic block model. We show that in an "easy" regime, where the distance between the means of the Gaussians is large enough, graph attention is able to distinguish inter-class from intra-class edges. It thus maintains the weights of important edges and significantly reduces the weights of unimportant edges. We further show that this implies perfect node classification. In the "hard" regime, we show that every attention mechanism fails to distinguish intra-class from inter-class edges. In addition, we show that graph attention convolution cannot (almost) perfectly classify the nodes even if intra-class edges could be separated from inter-class edges. Beyond perfect node classification, we provide a positive result on graph attention's robustness against structural noise in the graph. In particular, our robustness result implies that graph attention can be strictly better than both simple graph convolution and the best linear classifier of node features. We evaluate our theoretical results on synthetic and real-world data.
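
The setting described above can be illustrated with a small simulation. The following NumPy sketch is not the paper's construction: the symmetric class means +mu and -mu, the known projection direction w, and the sign-product attention score with temperature t are all illustrative assumptions, and the helper names (sample_csbm, graph_convolution, attention_convolution) are hypothetical. It samples a two-class contextual stochastic block model on a graph with p = q (pure structural noise, so edges carry no class information) and compares uniform neighborhood averaging, an attention-weighted average, and a linear classifier on the features alone.

```python
import numpy as np

def sample_csbm(n, mu, sigma, p, q, rng):
    """Two-class CSBM sketch. Assumption: class means are +mu and -mu."""
    y = rng.integers(0, 2, size=n)                        # labels in {0, 1}
    sign = np.where(y == 0, 1.0, -1.0)
    # Gaussian mixture features: class 0 ~ N(+mu, sigma^2 I), class 1 ~ N(-mu, sigma^2 I)
    X = sign[:, None] * mu[None, :] + sigma * rng.standard_normal((n, mu.size))
    same = y[:, None] == y[None, :]                       # intra-class pairs
    coin = rng.random((n, n)) < np.where(same, p, q)      # edge w.p. p (intra) or q (inter)
    upper = np.triu(coin, k=1)                            # sample each pair once
    A = (upper | upper.T).astype(float)                   # undirected adjacency
    np.fill_diagonal(A, 1.0)                              # self-loops
    return X, y, A

def graph_convolution(X, A, w):
    """Simple graph convolution: uniform mean of the neighbors' projections X @ w."""
    return (A @ (X @ w)) / A.sum(axis=1)

def attention_convolution(X, A, w, t):
    """Attention-weighted mean of the neighbors' projections.

    Hypothetical score: t * sign(w.x_i) * sign(w.x_j), which is large when
    i and j appear to be in the same class. Softmax runs over neighbors only.
    """
    s = np.sign(X @ w)
    scores = np.where(A > 0, t * np.outer(s, s), -np.inf)
    scores -= scores.max(axis=1, keepdims=True)           # numerical stability
    gamma = np.exp(scores)                                # exp(-inf) = 0 for non-edges
    gamma /= gamma.sum(axis=1, keepdims=True)             # each row sums to 1
    return gamma @ (X @ w)

def accuracy(h, y):
    """Predict class 0 when h > 0 (class 0 has mean +mu)."""
    return float(np.mean((h > 0) == (y == 0)))

# Noisy graph: p == q. Means are far apart relative to sigma (an assumption
# chosen in the spirit of the "easy" feature regime).
rng = np.random.default_rng(0)
mu = np.array([3.0, 0.0])
X, y, A = sample_csbm(n=1000, mu=mu, sigma=1.0, p=0.1, q=0.1, rng=rng)
w = mu / np.linalg.norm(mu)                               # assumes the mean direction is known

acc_linear = accuracy(X @ w, y)
acc_gcn = accuracy(graph_convolution(X, A, w), y)
acc_att = accuracy(attention_convolution(X, A, w, t=5.0), y)
print(acc_linear, acc_gcn, acc_att)
```

With p = q, uniform averaging mixes the two classes and its accuracy should fall toward chance, while the attention sketch concentrates weight on neighbors that appear to share the node's class and roughly matches the linear classifier. This toy score only conveys the intuition behind robustness to structural noise; it does not reproduce the paper's result that attention can be strictly better than both baselines.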