Contrastive learning has become a leading paradigm for self-supervised representation learning, yet the conditions under which it recovers meaningful latent geometry remain incompletely understood. We develop a measure-theoretic framework that formalizes the diversity condition, a support requirement on positive-pair sampling that is necessary for isometric latent recovery. We show that the standard full-support von Mises-Fisher (vMF) setting satisfies the diversity condition, so that global minimizers of the contrastive loss recover the latent geometry up to an orthogonal transformation, whereas restricted conditionals can allow non-orthogonal maps to attain strictly lower asymptotic contrastive loss. As a theoretical fix, we introduce a support-corrected variant of Information Noise-Contrastive Estimation (InfoNCE); this correction makes orthogonal recovery of the latent space attainable but does not uniquely select it. Experiments on synthetic benchmarks validate the identifiability predictions, and CIFAR-10 experiments are consistent with the qualitative prediction that architectural inductive bias matters more when sampling diversity is limited. Together, our results clarify how sampling mechanisms and encoder inductive bias interact in contrastive representation learning.
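The full-support setting described above can be made concrete with a small simulation. The following is a minimal sketch under our own assumptions, not the paper's code. Latents are drawn uniformly on the unit sphere S^{d-1}. Positives come from a von Mises-Fisher (vMF) conditional centred on each anchor, drawn with Wood's (1994) rejection sampler. The encoder acts directly on the latents, and the loss is the standard batch InfoNCE objective with temperature tau. The helper names (`sample_uniform_sphere`, `sample_vmf`, `info_nce`) and the values of n, d, kappa and tau are hypothetical illustrative choices. The sketch checks only the easy direction of the identifiability statement: composing the encoder with an orthogonal map Q leaves every pairwise inner product, and hence the InfoNCE loss, unchanged. The support-corrected variant is not reproduced, since its exact form is not given here.

```python
import numpy as np


def sample_uniform_sphere(n, d, rng):
    """Draw n points uniformly on the unit sphere S^{d-1} in R^d."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def sample_vmf(mu, kappa, rng):
    """Draw one point from vMF(mu, kappa) on S^{d-1} (Wood, 1994, rejection sampler)."""
    d = mu.shape[0]
    b = (-2.0 * kappa + np.sqrt(4.0 * kappa**2 + (d - 1) ** 2)) / (d - 1)
    x0 = (1.0 - b) / (1.0 + b)
    c = kappa * x0 + (d - 1) * np.log(1.0 - x0**2)
    # Rejection step for w = <sample, mu>, the cosine to the mean direction.
    while True:
        z = rng.beta((d - 1) / 2.0, (d - 1) / 2.0)
        w = (1.0 - (1.0 + b) * z) / (1.0 - (1.0 - b) * z)
        if kappa * w + (d - 1) * np.log(1.0 - x0 * w) - c >= np.log(rng.uniform()):
            break
    # Uniform direction in the tangent space orthogonal to mu.
    v = rng.standard_normal(d)
    v -= v.dot(mu) * mu
    v /= np.linalg.norm(v)
    return w * mu + np.sqrt(max(1.0 - w**2, 0.0)) * v


def info_nce(h, h_pos, tau):
    """Batch InfoNCE: row i's positive is h_pos[i]; the other rows act as negatives."""
    logits = h @ h_pos.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


rng = np.random.default_rng(0)
n, d, kappa, tau = 256, 3, 10.0, 0.1  # hypothetical values, not taken from the paper

z = sample_uniform_sphere(n, d, rng)                        # anchors
z_pos = np.stack([sample_vmf(zi, kappa, rng) for zi in z])  # full-support vMF positives

# Random orthogonal matrix from a QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

loss_id = info_nce(z, z_pos, tau)            # encoder = identity on latents
loss_orth = info_nce(z @ Q, z_pos @ Q, tau)  # encoder = orthogonal map
print(f"identity: {loss_id:.6f}  orthogonal: {loss_orth:.6f}")
```

Because this loss depends on the embeddings only through inner products, every orthogonal Q yields the same value. The harder part of the result, that under full support no non-orthogonal map attains a lower asymptotic loss, is what the diversity condition is needed for, and a finite simulation like this one does not establish it.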