Graph Neural Networks (GNNs) have proven effective for active learning on graph-structured data. However, a common weakness of these approaches is that they underutilize crucial structural information. To address this problem, we propose the Structural-Clustering PageRank method for improved Active learning (SPA), designed specifically for graph-structured data. SPA combines community detection using the SCAN algorithm with PageRank scoring to select samples that are both efficient to query and informative. In particular, SPA prioritizes nodes that are not only informative but also structurally central. In extensive experiments, SPA achieves higher accuracy and macro-F1 scores than existing methods across different annotation budgets, while significantly reducing query time. In addition, the proposed method introduces only two hyperparameters, $\epsilon$ and $\mu$, which tune the balance between structural learning and node selection. This simplicity is a key advantage in active learning scenarios, where extensive hyperparameter tuning is often impractical.
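To make the combination of SCAN and PageRank concrete, the following is a minimal sketch of how the two could be paired for sample selection. It is not the authors' implementation. It assumes that `eps` and `mu` play their usual SCAN roles (structural-similarity threshold and minimum $\epsilon$-neighborhood size for a core node), that one representative is taken per community, and that communities are ordered by the PageRank of their representative. It also leaves out the GNN and any informativeness term. All function names are hypothetical.

```python
from collections import deque
from math import sqrt


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def structural_similarity(adj, u, v):
    """SCAN similarity: overlap of closed neighborhoods, normalized."""
    gu, gv = adj[u] | {u}, adj[v] | {v}
    return len(gu & gv) / sqrt(len(gu) * len(gv))


def scan_clusters(adj, eps, mu):
    """Simplified SCAN: grow clusters outward from core nodes only."""
    eps_nbrs = {
        v: {w for w in adj[v] | {v} if structural_similarity(adj, v, w) >= eps}
        for v in adj
    }
    cores = {v for v in adj if len(eps_nbrs[v]) >= mu}
    label, clusters = {}, []
    for seed in sorted(cores):
        if seed in label:
            continue
        cid, members, queue = len(clusters), set(), deque([seed])
        label[seed] = cid
        while queue:
            v = queue.popleft()
            members.add(v)
            if v not in cores:
                continue  # border nodes join a cluster but do not expand it
            for w in eps_nbrs[v]:
                if w not in label:
                    label[w] = cid
                    queue.append(w)
        clusters.append(members)
    return clusters  # nodes in no cluster are SCAN's hubs and outliers


def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on an undirected graph."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        dangling = sum(rank[v] for v in adj if not adj[v])
        rank = {
            v: (1 - damping) / n
            + damping * (sum(rank[u] / len(adj[u]) for u in adj[v]) + dangling / n)
            for v in adj
        }
    return rank


def select_nodes(adj, eps, mu, budget):
    """Assumed selection rule: top-PageRank node per cluster, best first."""
    rank = pagerank(adj)
    reps = [max(c, key=lambda v: rank[v]) for c in scan_clusters(adj, eps, mu)]
    reps.sort(key=lambda v: rank[v], reverse=True)
    return reps[:budget]


# Toy graph: two 4-cliques joined by the single bridge edge (3, 4).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]
adj = build_adjacency(edges)
picked = select_nodes(adj, eps=0.7, mu=3, budget=2)
print(sorted(picked))  # [3, 4]
```

On this toy graph the two cliques form separate communities, and the bridge endpoints are chosen because they have the highest PageRank within their respective cliques. Raising `eps` or `mu` makes core nodes rarer, so fewer communities form and more nodes are left unclustered.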