In representation learning, regression has traditionally received less attention than classification. Directly applying representation learning techniques designed for classification to regression often results in fragmented representations in the latent space, yielding sub-optimal performance. In this paper, we argue that the potential of contrastive learning for regression has been overshadowed due to the neglect of two crucial aspects: ordinality-awareness and hardness. To address these challenges, we advocate "mixup your own contrastive pairs for supervised contrastive regression", instead of relying solely on real/augmented samples. Specifically, we propose Supervised Contrastive Learning for Regression with Mixup (SupReMix). It takes anchor-inclusive mixtures (mixup of the anchor and a distinct negative sample) as hard negative pairs and anchor-exclusive mixtures (mixup of two distinct negative samples) as hard positive pairs at the embedding level. This strategy formulates harder contrastive pairs by integrating richer ordinal information. Through extensive experiments on six regression datasets including 2D images, volumetric images, text, tabular data, and time-series signals, coupled with theoretical analysis, we demonstrate that SupReMix pre-training fosters continuous ordered representations of regression data, resulting in significant improvement in regression performance. Furthermore, SupReMix is superior to other approaches in a range of regression challenges including transfer learning, imbalanced training data, and scenarios with fewer training samples.
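The two kinds of mixtures described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the toy embeddings, and in particular the rule for choosing the mixing weight of the anchor-exclusive mixture (picked so that the interpolated label equals the anchor's label, which requires two negatives whose labels bracket the anchor's) are assumptions made for illustration only.

```python
import numpy as np


def anchor_inclusive_mixture(z_anchor, z_neg, lam):
    """Hypothetical helper: mix the anchor embedding with one negative.

    The result is used as a *hard negative*: it lies close to the anchor
    in embedding space, but its interpolated label differs from the
    anchor's label whenever lam < 1.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    return lam * z_anchor + (1.0 - lam) * z_neg


def anchor_exclusive_mixture(z_lo, y_lo, z_hi, y_hi, y_anchor):
    """Hypothetical helper: mix two negatives, excluding the anchor.

    Assumption (not stated in the abstract): the two negatives have
    labels on either side of the anchor label, and the weight is chosen
    so that lam * y_lo + (1 - lam) * y_hi == y_anchor. The result is
    used as a *hard positive*.
    """
    if not y_lo < y_anchor < y_hi:
        raise ValueError("negative labels must bracket the anchor label")
    lam = (y_hi - y_anchor) / (y_hi - y_lo)
    return lam * z_lo + (1.0 - lam) * z_hi, lam


# Toy embeddings that are linear in the label, so the effect is easy to see.
def toy_embed(y):
    return np.array([y, 2.0 * y], dtype=float)


y_anchor, y_lo, y_hi = 30.0, 20.0, 50.0
z_anchor, z_lo, z_hi = toy_embed(y_anchor), toy_embed(y_lo), toy_embed(y_hi)

hard_negative = anchor_inclusive_mixture(z_anchor, z_hi, lam=0.8)
hard_positive, lam_pos = anchor_exclusive_mixture(z_lo, y_lo, z_hi, y_hi, y_anchor)
```

In this toy setting the hard positive coincides with the anchor embedding because the embedding is exactly linear in the label; with a learned encoder the two would differ, and the contrastive loss would pull them together while pushing the hard negative away.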