While InfoNCE powers modern contrastive learning, its geometric mechanisms remain under-characterized beyond the canonical alignment--uniformity decomposition. We present a measure-theoretic framework that models learning as the evolution of representation measures on a fixed embedding manifold. By establishing value and gradient consistency in the large-batch limit, we bridge the stochastic objective to explicit deterministic energy landscapes, uncovering a fundamental geometric bifurcation between the unimodal and multimodal regimes. In the unimodal setting, the intrinsic landscape is strictly convex with a unique Gibbs equilibrium; here, entropy acts merely as a tie-breaker, clarifying "uniformity" as a constrained expansion within the alignment basin. In contrast, the symmetric multimodal objective contains a persistent negative symmetric divergence term that remains even after kernel sharpening. We show that this term induces barrier-driven co-adaptation, enforcing a population-level modality gap as a structural geometric necessity rather than an initialization artifact. Our results shift the analytical lens from pointwise discrimination to population geometry, offering a principled basis for diagnosing and controlling distributional misalignment.