Because most scientific literature data are unlabeled, unsupervised graph-based semantic representation learning is crucial. Therefore, an unsupervised semantic representation learning method for scientific literature based on a graph attention mechanism and maximum mutual information (GAMMI) is proposed. The graph attention mechanism computes each node's representation as a weighted sum of the features of nearby nodes, with the weights depending entirely on the node features. Different weights can thus be applied to each node in the graph depending on the features of its neighbors, so that the correlations between vertex features are better integrated into the model. In addition, an unsupervised graph contrastive learning strategy is proposed to address both the lack of labels and the need to scale to large graphs. By contrasting the mutual information between the positive and negative local node representations in the latent space and the global graph representation, the graph neural network can capture both local and global information. Experimental results demonstrate competitive performance on various node classification benchmarks, in some cases even surpassing that of supervised learning.
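The two ingredients named above, attention-weighted neighbor aggregation and a local-global contrastive objective, can be illustrated with a minimal sketch. It is not the GAMMI implementation: the abstract does not specify the attention scoring function, the readout, the discriminator, or how negative samples are built. The sketch assumes dot-product attention scores with no learned parameters, a sigmoid-of-mean readout, a dot-product discriminator, and negatives obtained by shuffling node features across the graph. All function names and the toy graph are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(features, neighbors):
    """Weighted sum of neighbor features (self included).

    Assumption: weights are a softmax over feature dot products, so they
    depend only on the node features. A learned scoring function would
    normally replace the plain dot product.
    """
    out = []
    for i, h_i in enumerate(features):
        nbrs = [i] + neighbors[i]
        weights = softmax([dot(h_i, features[j]) for j in nbrs])
        out.append([
            sum(w * features[j][d] for w, j in zip(weights, nbrs))
            for d in range(len(h_i))
        ])
    return out


def readout(embeddings):
    """Global graph summary (assumed: sigmoid of the mean node embedding)."""
    n = len(embeddings)
    return [
        sigmoid(sum(e[d] for e in embeddings) / n)
        for d in range(len(embeddings[0]))
    ]


def contrastive_loss(pos, neg, summary):
    """Binary cross-entropy over (node, summary) pairs.

    Nodes from the real graph are labeled 1 and nodes from the corrupted
    graph 0; minimizing this loss is a common surrogate for maximizing
    local-global mutual information.
    """
    eps = 1e-12
    loss = 0.0
    for h in pos:
        loss -= math.log(sigmoid(dot(h, summary)) + eps)
    for h in neg:
        loss -= math.log(1.0 - sigmoid(dot(h, summary)) + eps)
    return loss / (len(pos) + len(neg))


# Toy path graph 0-1-2-3 with two-dimensional node features.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
neighbors = [[1], [0, 2], [1, 3], [2]]

pos = attention_aggregate(features, neighbors)

# Assumed corruption: same graph structure, node features shuffled.
rng = random.Random(0)
shuffled = features[:]
rng.shuffle(shuffled)
neg = attention_aggregate(shuffled, neighbors)

summary = readout(pos)
loss = contrastive_loss(pos, neg, summary)
```

In a trainable model the encoder and the discriminator would carry learned parameters, and `loss` would be minimized by gradient descent. Here it is evaluated once, only to show how the positive and negative node representations are compared against the global summary.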