Ensembling has proven to be a powerful technique for boosting model performance, uncertainty estimation, and robustness in supervised learning. Advances in self-supervised learning (SSL) enable leveraging large unlabeled corpora for state-of-the-art few-shot and supervised learning performance. In this paper, we explore how ensemble methods can improve recent SSL techniques by developing a framework that permits data-dependent weighted cross-entropy losses. We refrain from ensembling the representation backbone; this choice yields an efficient ensemble method that incurs a small training cost, requires no architectural changes, and adds no computational overhead at downstream evaluation. We demonstrate the effectiveness of our method with two state-of-the-art SSL methods, DINO (Caron et al., 2021) and MSN (Assran et al., 2022). Our method outperforms both on multiple evaluation metrics on ImageNet-1K, particularly in the few-shot setting. We explore several weighting schemes and find that those which increase the diversity of ensemble heads lead to better downstream evaluation results. Thorough experiments yield improved prior-art baselines, which our method still surpasses; e.g., our overall improvement with MSN ViT-B/16 is 3.9 p.p. for 1-shot learning.
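To make the idea of a data-dependent weighted cross-entropy loss over ensemble heads concrete, the following is a minimal sketch for a single input, not the paper's implementation. It assumes each head emits logits over a shared set of output categories, that teacher head i is paired only with student head i, and that the weights come from a softmax over negative teacher entropies. All function names, the pairing, the `gamma` parameter, and the entropy-based weighting are illustrative assumptions and may differ from the schemes actually evaluated.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def cross_entropy(target, pred, eps=1e-12):
    """Cross-entropy H(target, pred); eps guards against log(0)."""
    return -sum(t * math.log(max(q, eps)) for t, q in zip(target, pred) if t > 0.0)


def entropy_weights(teacher_probs, gamma=1.0):
    """Hypothetical data-dependent weights: one weight per head.

    Heads whose teacher prediction has lower entropy (is more confident)
    get a larger weight. gamma = 0 recovers uniform weighting.
    """
    return softmax([-gamma * entropy(p) for p in teacher_probs])


def weighted_ensemble_loss(teacher_logits, student_logits, gamma=1.0):
    """Weighted sum of per-head cross-entropies for a single input.

    teacher_logits, student_logits: one list of logits per head.
    Only the heads are ensembled; the backbone producing the shared
    representation is assumed to sit upstream of this function.
    """
    teacher_probs = [softmax(l) for l in teacher_logits]
    student_probs = [softmax(l) for l in student_logits]
    weights = entropy_weights(teacher_probs, gamma)
    return sum(
        w * cross_entropy(t, s)
        for w, t, s in zip(weights, teacher_probs, student_probs)
    )


if __name__ == "__main__":
    # Two heads, three output categories (toy numbers, not real data).
    teacher = [[4.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
    student = [[1.0, 0.5, 0.0], [0.2, 0.1, 0.0]]
    print(entropy_weights([softmax(l) for l in teacher]))
    print(weighted_ensemble_loss(teacher, student))
```

Because the weights depend on the teacher's predictions for each input, different inputs emphasize different heads. Setting `gamma=0.0` in this sketch reduces it to a plain uniform average over heads, which gives a natural point of comparison.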