Graph neural networks (GNNs) are designed for semi-supervised node classification on graphs where only a subset of nodes have class labels. However, in extreme cases where very few labels are available (e.g., 1 labeled node per class), GNNs suffer from severe performance degradation. Specifically, we observe that existing GNNs suffer from an unstable training process on few-labeled graphs, resulting in inferior node classification performance. Therefore, we propose an effective framework, Stabilized Self-Training (SST), which can be applied to existing GNNs to handle the scarcity of labeled data and, consequently, boost classification accuracy. We conduct thorough empirical and theoretical analysis to support our findings and motivate the algorithmic designs in SST. We apply SST to two popular GNN models, GCN and DAGNN, to obtain the SSTGCN and SSTDA methods, respectively, and evaluate the two methods against 10 competitors on 5 benchmark datasets. Extensive experiments show that the proposed SST framework is highly effective, especially when few labeled data are available. Our methods achieve superior performance under almost all settings on all datasets. For instance, on the Cora dataset with only 1 labeled node per class, the accuracy of SSTGCN is 62.5%, 17.9% higher than GCN, and the accuracy of SSTDA is 66.4%, which outperforms DAGNN by 6.6%.