Prototypical self-supervised learning methods consistently suffer from partial prototype collapse, where multiple prototypes converge to nearly identical representations. This undermines their central purpose -- providing diverse and informative targets that guide encoders toward rich representations -- and has led practitioners to over-parameterize prototype sets or add ad-hoc regularizers, which mitigate the symptoms rather than address the root cause. We empirically trace the collapse to the joint optimization of encoders and prototypes, which encourages a form of shortcut learning: early in training, prototypes drift toward redundant representations that minimize the loss without necessarily enhancing representation diversity. To break this joint optimization, we introduce a fully decoupled training strategy that learns prototypes and encoders under separate objectives. Concretely, we model the prototypes as a Gaussian mixture updated with an online EM-style procedure, independent of the encoder's loss. This simple yet principled decoupling eliminates prototype collapse without explicit regularization and yields consistently diverse prototypes and stronger downstream performance.