Semi-supervised learning aims to infer class labels from only a small fraction of labeled data. In graph-based semi-supervised learning, this is typically achieved through label propagation, which predicts the labels of unlabeled nodes. In real-world applications, however, data often arrive incrementally in batches. Each time a new batch arrives, reapplying the traditional label propagation algorithm to recompute all labels is redundant, computationally intensive, and inefficient. To address the absence of an efficient label propagation update method, we propose DynLP, a novel GPU-centric Dynamic Batched Parallel Label Propagation algorithm that performs only the necessary updates, propagating changes through the relevant subgraph without requiring full recomputation. By exploiting GPU architectural optimizations, our algorithm achieves an average 13x and up to 102x speedup over state-of-the-art approaches on large-scale datasets.
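The incremental idea behind the abstract can be illustrated with a minimal sketch: when a batch of new edges arrives, only nodes whose labels actually change need to re-propagate to their neighbors, rather than recomputing labels for the whole graph. This is a simplified, sequential CPU sketch using the classic majority-vote propagation rule; the function and variable names are hypothetical and the GPU-parallel machinery of DynLP is not shown.

```python
# Illustrative sketch of incremental (batched) label propagation.
# Hypothetical names; DynLP's actual GPU kernels are not reproduced here.
from collections import defaultdict, deque

def propagate_batch(adj, labels, new_edges):
    """Insert new_edges into adjacency dict `adj`, then re-propagate labels
    only through the subgraph affected by nodes whose label changes."""
    frontier = deque()
    for u, v in new_edges:
        adj[u].add(v)
        adj[v].add(u)
        # only the endpoints of new edges seed the update frontier
        frontier.append(u)
        frontier.append(v)
    while frontier:
        u = frontier.popleft()
        # majority vote over labeled neighbors (classic propagation rule)
        counts = defaultdict(int)
        for v in adj[u]:
            if v in labels:
                counts[labels[v]] += 1
        if not counts:
            continue
        best = max(counts, key=counts.get)
        if labels.get(u) != best:
            labels[u] = best          # label changed, so neighbors may change
            frontier.extend(adj[u])   # push only the affected neighborhood
    return labels
```

A usage example: two labeled seed nodes and one newly arriving node connected to both; only the three touched nodes are ever visited, regardless of how large the rest of the graph is.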