Although contrastive learning methods have shown strong performance on a variety of representation learning tasks, they struggle when the training dataset is long-tailed. Many researchers have combined contrastive learning with a logit adjustment technique to address this problem, but the combinations are made in an ad hoc manner and no theoretical justification has yet been provided. The goal of this paper is to provide that justification and to further improve performance. First, we show that the fundamental reason contrastive learning methods struggle with long-tailed tasks is that they try to maximize the mutual information between latent features and input data. Because ground-truth labels are not considered in this maximization, these methods cannot address imbalances between class labels. Instead, we interpret the long-tailed recognition task as mutual information maximization between latent features and ground-truth labels. This approach integrates contrastive learning and logit adjustment seamlessly and yields a loss function that shows state-of-the-art performance on long-tailed recognition benchmarks. The loss is also effective in image segmentation tasks, verifying its versatility beyond image classification.
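For readers unfamiliar with logit adjustment, the following is a minimal sketch of the generic technique: class log-priors are added to the logits before the softmax cross-entropy is computed, so that rare classes incur a larger loss. It is not the loss derived in this paper. The function names, the scaling parameter `tau`, and the class counts are hypothetical and chosen for illustration only.

```python
import math


def class_log_priors(class_counts):
    """Log of the empirical class frequencies (hypothetical helper)."""
    total = sum(class_counts)
    return [math.log(c / total) for c in class_counts]


def logit_adjusted_cross_entropy(logits, label, log_priors, tau=1.0):
    """Softmax cross-entropy on logits shifted by tau * log prior.

    With tau = 0 this reduces to the ordinary cross-entropy.
    """
    adjusted = [z + tau * lp for z, lp in zip(logits, log_priors)]
    # Numerically stable log-sum-exp over the adjusted logits.
    m = max(adjusted)
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[label]


# Illustrative long-tailed class counts (assumed, not from the paper).
counts = [900, 90, 10]
log_priors = class_log_priors(counts)

# An uninformative model that outputs equal logits for every class.
logits = [0.0, 0.0, 0.0]

plain_tail = logit_adjusted_cross_entropy(logits, 2, log_priors, tau=0.0)
adjusted_tail = logit_adjusted_cross_entropy(logits, 2, log_priors, tau=1.0)
adjusted_head = logit_adjusted_cross_entropy(logits, 0, log_priors, tau=1.0)
```

With equal logits, the adjusted prediction equals the class prior, so the loss on the tail class (index 2) is larger than the unadjusted loss, while the loss on the head class (index 0) is smaller. During training this pushes the model to raise tail-class logits by a margin that compensates for the imbalance.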