Contrastive Analysis is a sub-field of Representation Learning that aims to separate the common factors of variation shared by two datasets, a background (e.g., healthy subjects) and a target (e.g., diseased subjects), from the salient factors of variation, which are present only in the target dataset. Despite the relevance of this task, current models based on Variational Auto-Encoders perform poorly at learning semantically expressive representations. Contrastive Representation Learning, on the other hand, has brought large performance gains in various applications (classification, clustering, etc.). In this work, we propose to leverage the ability of Contrastive Learning to learn semantically expressive representations and adapt it to Contrastive Analysis. We reformulate Contrastive Analysis under the lens of the InfoMax Principle and identify two Mutual Information terms to maximize and one to minimize. We decompose the first two terms into an Alignment term and a Uniformity term, as is commonly done in Contrastive Learning. We then motivate a novel Mutual Information minimization strategy to prevent information leakage between the common and salient distributions. We validate our method, called SepCLR, on three visual datasets and three medical datasets specifically designed to assess the pattern separation capability in Contrastive Analysis. Code is available at https://github.com/neurospin-projects/2024_rlouiset_sep_clr.
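To make the Alignment and Uniformity decomposition mentioned above more concrete, the following is a minimal sketch of the generic form these two terms usually take in the Contrastive Learning literature: alignment as the mean squared distance between embeddings of positive pairs, and uniformity as the log of the mean Gaussian potential over all pairs of embeddings. It is not the SepCLR objective and is not taken from the linked repository. The function names, the temperature `t=2.0`, and the toy 2-D embeddings are all assumptions chosen for illustration.

```python
import math


def l2_normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def alignment(views_a, views_b):
    """Mean squared distance between positive pairs (lower = better aligned).

    views_a[i] and views_b[i] are assumed to be embeddings of two views
    of the same sample.
    """
    return sum(sq_dist(a, b) for a, b in zip(views_a, views_b)) / len(views_a)


def uniformity(embeddings, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs.

    Lower values mean the embeddings are spread more evenly on the sphere.
    The temperature t is an illustrative choice, not a value from the paper.
    """
    total, count = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            total += math.exp(-t * sq_dist(embeddings[i], embeddings[j]))
            count += 1
    return math.log(total / count)


# Toy example: four points spread on the unit circle vs. four collapsed points.
spread = [l2_normalize(v) for v in [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]]
collapsed = [l2_normalize([1.0, 0.0]) for _ in range(4)]

print("alignment (identical views):", alignment(spread, spread))
print("uniformity (spread):", uniformity(spread))
print("uniformity (collapsed):", uniformity(collapsed))
```

In this toy setting the collapsed embeddings score worse on uniformity than the spread ones, which illustrates why minimizing alignment alone is not enough: a uniformity term is what discourages all samples from mapping to the same point.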