Self-Supervised Learning (SSL) models rely on a pretext task to learn representations. Because this pretext task differs from the downstream tasks used to evaluate these models, there is an inherent misalignment, or pretraining bias. A commonly used trick in SSL, shown to make deep networks more robust to such bias, is to add a small projector (usually a two- or three-layer multilayer perceptron) on top of the backbone network during training. In contrast to previous work that studied the impact of the projector architecture, we focus here on a simpler, yet overlooked, lever to control the information in the backbone representation. We show that merely changing the dimensionality of this representation, by changing only the size of the backbone's very last block, is a remarkably effective technique to mitigate the pretraining bias. It significantly improves downstream transfer performance for both self-supervised and supervised pretrained models.
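To make the setup concrete, the following is a minimal NumPy sketch of a backbone whose last block has an adjustable output width, followed by a small projector used only during pretraining. It is an illustration under stated assumptions, not the authors' implementation: a toy fully connected network stands in for a real backbone, and all names (`build_model`, `last_block_dim`, `projector_dims`) and layer sizes are hypothetical.

```python
import numpy as np


def init_mlp(rng, sizes):
    """Return a list of (weight, bias) pairs for consecutive layer sizes."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def apply_mlp(params, x, final_activation):
    """Apply linear layers with ReLU between them (and optionally after the last)."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1 or final_activation:
            x = np.maximum(x, 0.0)
    return x


def build_model(rng, input_dim=32, trunk_dim=64, last_block_dim=256,
                projector_dims=(128, 128, 32)):
    # Toy backbone (assumption): only the final layer's width varies,
    # mimicking "changing only the size of the backbone's very last block".
    backbone = init_mlp(rng, [input_dim, trunk_dim, trunk_dim, last_block_dim])
    # Small three-layer projector on top of the backbone, used during pretraining.
    projector = init_mlp(rng, [last_block_dim, *projector_dims])
    return backbone, projector


def forward(model, x):
    backbone, projector = model
    # Backbone output: the representation that is transferred downstream.
    representation = apply_mlp(backbone, x, final_activation=True)
    # Projector output: the embedding a pretext loss would be computed on.
    embedding = apply_mlp(projector, representation, final_activation=False)
    return representation, embedding


rng = np.random.default_rng(0)
batch = rng.standard_normal((8, 32))
for width in (64, 256, 1024):
    rep, emb = forward(build_model(rng, last_block_dim=width), batch)
    print(width, rep.shape, emb.shape)
```

In this sketch the projector's output size stays fixed while the representation width changes, so the pretext objective sees the same embedding dimension and only the backbone representation is resized.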