Ensembling has proven to be a powerful technique for boosting model performance, uncertainty estimation, and robustness in supervised learning. Advances in self-supervised learning (SSL) enable leveraging large unlabeled corpora for state-of-the-art few-shot and supervised learning performance. In this paper, we explore how ensemble methods can improve recent SSL techniques by developing a framework that permits data-dependent weighted cross-entropy losses. We refrain from ensembling the representation backbone; this choice yields an efficient ensemble method that incurs a small training cost, requires no architectural changes, and adds no computational overhead to downstream evaluation. The effectiveness of our method is demonstrated with two state-of-the-art SSL methods, DINO (Caron et al., 2021) and MSN (Assran et al., 2022). Our method outperforms both on multiple evaluation metrics on ImageNet-1K, particularly in the few-shot setting. We explore several weighting schemes and find that those which increase the diversity of ensemble heads lead to better downstream evaluation results. Thorough experiments yield improved prior-art baselines, which our method still surpasses; e.g., our overall improvement with MSN ViT-B/16 is 3.9 p.p. for 1-shot learning.
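As an illustration of what a data-dependent weighted cross-entropy loss over ensemble heads might look like, the following is a minimal NumPy sketch, not the implementation used in the paper. It assumes H teacher heads and H student heads share one backbone and each emit logits of shape (H, B, K) for a batch of B examples over K prototypes or classes. The function names, the tensor layout, and the entropy-based weighting are all hypothetical choices made for this example; the abstract does not specify the weighting schemes.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def uniform_weights(teacher_logits):
    """Pair teacher head i with student head i, each weighted 1/H."""
    num_heads, batch, _ = teacher_logits.shape
    w = np.eye(num_heads)[:, :, None] / num_heads        # (H, H, 1)
    return np.broadcast_to(w, (num_heads, num_heads, batch)).copy()


def entropy_weights(teacher_logits, temperature=1.0):
    """Hypothetical data-dependent scheme (an assumption of this sketch):
    per example, up-weight heads whose teacher prediction has lower entropy,
    using a softmax over heads of the negative entropy."""
    num_heads, batch, _ = teacher_logits.shape
    log_p = log_softmax(teacher_logits)
    entropy = -(np.exp(log_p) * log_p).sum(axis=-1)       # (H, B)
    head_w = np.exp(log_softmax(-entropy / temperature, axis=0))  # (H, B)
    w = np.zeros((num_heads, num_heads, batch))
    idx = np.arange(num_heads)
    w[idx, idx, :] = head_w                               # diagonal pairs only
    return w


def weighted_ensemble_loss(teacher_logits, student_logits, weights):
    """Mean over the batch of sum_{i,j} w[i, j, b] * CE(teacher_i, student_j).

    teacher_logits, student_logits: arrays of shape (H, B, K).
    weights: array of shape (H, H, B), non-negative, summing to 1 over
    the head pairs (i, j) for every example b.
    """
    p_teacher = np.exp(log_softmax(teacher_logits))       # (H, B, K)
    log_p_student = log_softmax(student_logits)           # (H, B, K)
    # ce[i, j, b] = -sum_k p_teacher[i, b, k] * log_p_student[j, b, k]
    ce = -np.einsum("ibk,jbk->ijb", p_teacher, log_p_student)
    return float((weights * ce).sum(axis=(0, 1)).mean())
```

In this sketch only the heads are replicated, which mirrors the abstract's point that the backbone is not ensembled: the heads can be discarded after pre-training, so downstream evaluation sees a single backbone. Swapping `uniform_weights` for `entropy_weights` changes only the `weights` argument, which is one way a single loss could accommodate several weighting schemes.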