Although contrastive learning methods perform strongly on a variety of representation learning tasks, they struggle when the training dataset is long-tailed. Many researchers have combined contrastive learning with logit adjustment to address this problem, but the combinations are ad hoc and lack a theoretical foundation. The goal of this paper is to provide that foundation and further improve performance. First, we show that the fundamental reason contrastive learning methods struggle with long-tailed tasks is that they maximize the mutual information between latent features and input data. Because ground-truth labels are not considered in this maximization, these methods cannot address imbalances between class labels. Instead, we interpret long-tailed recognition as mutual information maximization between latent features and ground-truth labels. This approach integrates contrastive learning and logit adjustment seamlessly and yields a loss function that achieves state-of-the-art performance on long-tailed recognition benchmarks. The loss is also effective on image segmentation tasks, demonstrating its versatility beyond image classification.
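For readers unfamiliar with logit adjustment, the following is a minimal sketch of the generic technique: the log of each class's empirical prior is added to the logits before the softmax cross-entropy is computed. It is not the loss derived in this paper. The function name `logit_adjusted_cross_entropy`, the temperature `tau`, and the toy class counts are assumptions introduced purely for illustration.

```python
import numpy as np

def logit_adjusted_cross_entropy(logits, labels, class_counts, tau=1.0):
    """Generic logit-adjusted cross-entropy (illustrative, not the paper's loss).

    logits:       (N, C) array of raw classifier scores
    labels:       (N,) array of integer class indices
    class_counts: (C,) array of training-set sample counts per class
    tau:          hypothetical scaling factor for the prior term
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    counts = np.asarray(class_counts, dtype=np.float64)

    # Empirical log-prior of each class, estimated from the training counts.
    log_prior = np.log(counts / counts.sum())

    # Shift every logit by the scaled log-prior of its class.
    adjusted = logits + tau * log_prior

    # Numerically stable log-softmax over the class dimension.
    adjusted = adjusted - adjusted.max(axis=1, keepdims=True)
    log_probs = adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))

    # Mean negative log-likelihood of the true labels.
    return -log_probs[np.arange(len(labels)), labels].mean()


# Toy long-tailed setting: class 0 has 90 samples, class 1 has 10.
head_loss = logit_adjusted_cross_entropy([[0.0, 0.0]], [0], [90, 10])
tail_loss = logit_adjusted_cross_entropy([[0.0, 0.0]], [1], [90, 10])
```

With uninformative (all-zero) logits, the adjusted loss is larger for the tail class than for the head class, so training pushes the model to learn a larger margin for rare classes. Setting `tau=0` recovers the ordinary cross-entropy.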