What do different contrastive learning (CL) losses actually optimize for? Although multiple CL methods have demonstrated remarkable representation learning capabilities, the differences in their inner workings remain largely opaque. In this work, we analyse several CL families and prove that, under certain conditions, they admit the same minimizers when optimizing either their batch-level objectives or their expectations asymptotically. In both cases, an intimate connection with the hyperspherical energy minimization (HEM) problem resurfaces. Drawing inspiration from this, we introduce a novel CL objective, coined Decoupled Hyperspherical Energy Loss (DHEL). DHEL simplifies the problem by decoupling the target hyperspherical energy from the alignment of positive examples while preserving the same theoretical guarantees. Going one step further, we show the same results hold for another relevant CL family, namely kernel contrastive learning (KCL), with the additional advantage of the expected loss being independent of batch size, thus identifying the minimizers in the non-asymptotic regime. Empirical results demonstrate improved downstream performance, robustness across combinations of batch sizes and hyperparameters, and reduced dimensionality collapse on several computer vision datasets.
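To make the decoupling idea concrete, below is a minimal PyTorch sketch of how such an objective might look under our own reading of the abstract: a positive-alignment term kept separate from a hyperspherical-energy (repulsion) term computed among anchor embeddings only, so positives never enter the denominator. The function name `dhel_style_loss`, the `temperature` parameter, and the exact form of the energy term are illustrative assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def dhel_style_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Illustrative DHEL-style objective (a sketch, not the paper's exact loss).

    z1, z2: (N, d) embeddings of two augmented views of the same N samples.
    """
    # Contrastive losses of this kind operate on the unit hypersphere.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)

    # Alignment term: pull each positive pair (z1[i], z2[i]) together.
    alignment = -(z1 * z2).sum(dim=1) / temperature

    # Hyperspherical-energy term: log-sum-exp repulsion among anchors only,
    # with self-similarity masked out, so it is decoupled from the positives.
    sim = z1 @ z1.t() / temperature                                   # (N, N)
    mask = torch.eye(z1.size(0), dtype=torch.bool, device=z1.device)
    energy = torch.logsumexp(sim.masked_fill(mask, float("-inf")), dim=1)

    return (alignment + energy).mean()
```

In this reading, lowering `energy` spreads the anchors over the sphere (the HEM connection), while `alignment` is free to pull positives together without the two terms competing inside a single softmax denominator.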