Graph Markov Neural Networks (GMNNs) were recently proposed to improve on standard graph neural networks (GNNs) by incorporating label dependencies into semi-supervised node classification. GMNNs do this in a theoretically principled way and use three kinds of information to predict labels: like ordinary GNNs, they use the node features and the graph structure, and they additionally leverage the labels of neighboring nodes to improve prediction accuracy. In this paper, we introduce a new dataset named WikiVitals, a graph of 48k mutually referencing Wikipedia articles classified into 32 categories and connected by 2.3M edges. Our aim is to rigorously evaluate the contributions of three distinct sources of information to the prediction accuracy of GMNN on this dataset: the content of the articles, their connections with each other, and the correlations among their labels. For this purpose, we adapt a recently proposed method for fair comparison of GNN performance, which relies on appropriate randomization over data partitions and a clear separation of model selection from model assessment.
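The evaluation idea in the last sentence can be illustrated with a minimal sketch. The abstract does not give the protocol's details, so everything below is an assumption made for illustration: the split sizes, the number of random partitions, the synthetic one-dimensional data, and the toy k-nearest-neighbor classifier that stands in for a GNN or GMNN. All function names are hypothetical. The point is only the structure: each random partition selects a hyperparameter on the validation set, and the test set is used once, for assessment.

```python
import random
import statistics

def random_partition(n_nodes, n_train, n_val, rng):
    """Shuffle node indices and cut them into train / validation / test sets."""
    idx = list(range(n_nodes))
    rng.shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def knn_fit(features, labels, train_idx, k):
    """Toy stand-in model: memorize the labeled training points."""
    return [(features[i], labels[i]) for i in train_idx], k

def knn_predict(model, x):
    """Majority vote among the k training points closest to x."""
    points, k = model
    nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def accuracy(model, features, labels, idx):
    """Fraction of nodes in idx whose label is predicted correctly."""
    hits = sum(knn_predict(model, features[i]) == labels[i] for i in idx)
    return hits / len(idx)

def evaluate(features, labels, configs, n_splits=10, n_train=20, n_val=20, seed=0):
    """Average test accuracy over random partitions.

    Model selection uses only the validation set; the test set is
    touched once per partition, for model assessment.
    """
    rng = random.Random(seed)
    test_scores = []
    for _ in range(n_splits):
        train, val, test = random_partition(len(labels), n_train, n_val, rng)
        # Model selection: pick the configuration with the best validation accuracy.
        best_k = max(
            configs,
            key=lambda k: accuracy(knn_fit(features, labels, train, k),
                                   features, labels, val),
        )
        # Model assessment: score the selected configuration on the test set.
        model = knn_fit(features, labels, train, best_k)
        test_scores.append(accuracy(model, features, labels, test))
    return statistics.mean(test_scores), statistics.stdev(test_scores)

# Synthetic data (an assumption): two classes, one noisy scalar feature per node.
data_rng = random.Random(42)
toy_labels = [i % 2 for i in range(200)]
toy_features = [2.0 * y + data_rng.gauss(0.0, 0.5) for y in toy_labels]

mean_acc, std_acc = evaluate(toy_features, toy_labels, configs=[1, 3, 5])
print(f"test accuracy over random partitions: {mean_acc:.3f} +/- {std_acc:.3f}")
```

Reporting the mean and spread over many random partitions, rather than a single fixed split, is what lets differences between models be judged against the variability caused by the choice of partition.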