We consider the problem of active learning on graphs, which is important in many real-world networks where labeling node responses is expensive. We propose an offline active learning method that selects nodes to query by explicitly incorporating information from both the network structure and the node covariates. Building on graph signal recovery theory and random spectral sparsification, the proposed method adopts a two-stage biased sampling strategy that accounts for both informativeness and representativeness when querying nodes. Informativeness refers to the complexity of the graph signals that can be learned from the responses of the queried nodes, while representativeness refers to the capacity of the queried nodes to control the generalization error given noisy node-level information. We establish a theoretical relationship between the generalization error and the number of nodes selected by the proposed method. Our theoretical results demonstrate the trade-off between informativeness and representativeness in active learning. Extensive numerical experiments show that the proposed method is competitive with existing graph-based active learning methods, especially when node covariates and responses are noisy. The proposed method also applies to both regression and classification tasks on graphs.