In this paper we introduce SemiGPC, a distribution-aware label refinement strategy based on Gaussian Processes, in which the model's predictions are derived from the posterior distribution of the labels. Unlike other buffer-based semi-supervised methods such as CoMatch and SimMatch, SemiGPC includes a normalization term that addresses imbalances in the global data distribution while maintaining local sensitivity. This explicit control makes SemiGPC more robust to confirmation bias, especially under class imbalance. We show that SemiGPC improves performance when paired with different semi-supervised methods, such as FixMatch, ReMixMatch, SimMatch and FreeMatch, and with different pre-training strategies, including MSN and Dino. We also show that SemiGPC achieves state-of-the-art results under different degrees of class imbalance on the standard CIFAR10-LT and CIFAR100-LT benchmarks, especially in the low-data regime. Using SemiGPC also yields an average accuracy increase of about 2% over a new competitive baseline on the more challenging benchmarks SemiAves, SemiCUB, SemiFungi and Semi-iNat.
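To make the role of the normalization term concrete, the following is a minimal sketch that uses textbook Gaussian Process regression on one-hot labels. It is not the exact SemiGPC formulation, which the abstract does not specify. The function names, the RBF kernel, the length scale, the noise level and the toy one-dimensional buffer are all assumptions chosen for illustration. The sketch compares the GP posterior mean, which multiplies by the inverse of the buffer kernel matrix, with a plain kernel-weighted vote over the same imbalanced buffer.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Pairwise RBF similarities between the rows of a and the rows of b.
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / length_scale**2)

def gp_posterior_mean(buf_x, buf_y, query_x, noise=0.1):
    # Standard GP regression mean: K_* (K + noise * I)^{-1} Y.
    # The inverse acts as a normalization: many near-duplicate buffer
    # points share their influence instead of adding it up.
    K = rbf_kernel(buf_x, buf_x) + noise * np.eye(len(buf_x))
    alpha = np.linalg.solve(K, buf_y)
    return rbf_kernel(query_x, buf_x) @ alpha

def kernel_vote(buf_x, buf_y, query_x):
    # Unnormalized baseline: similarity-weighted sum of buffer labels.
    return rbf_kernel(query_x, buf_x) @ buf_y

# Hypothetical imbalanced buffer: 9 samples of class 0 near x = -1,
# 1 sample of class 1 at x = +1. The query sits halfway between them.
buf_x = np.concatenate([np.linspace(-1.1, -0.9, 9), [1.0]])[:, None]
buf_y = np.eye(2)[[0] * 9 + [1]]  # one-hot labels, shape (10, 2)
query = np.array([[0.0]])

gp = gp_posterior_mean(buf_x, buf_y, query)[0]
vote = kernel_vote(buf_x, buf_y, query)[0]
gp_share = gp / gp.sum()        # relative class scores from the GP mean
vote_share = vote / vote.sum()  # relative class scores from the plain vote
print("GP posterior share:", gp_share)
print("Kernel vote share: ", vote_share)
```

In this toy setting the plain vote is dominated by the majority class simply because it has more buffer entries, while the GP posterior mean gives the minority class a much larger share for a query that is equally close to both clusters. This is only meant to illustrate the kind of global rebalancing the abstract attributes to the normalization term.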