We study discriminative probabilistic modeling on a continuous domain for the data prediction task of (multimodal) self-supervised representation learning. To address the challenge of computing the integral in the partition function for each anchor data point, we leverage the multiple importance sampling (MIS) technique for robust Monte Carlo integration, which recovers the InfoNCE-based contrastive loss as a special case. Within this probabilistic modeling framework, we conduct a generalization error analysis that reveals the limitations of the current InfoNCE-based contrastive loss for self-supervised representation learning, and we derive insights for developing better approaches by reducing the error of Monte Carlo integration. To this end, we propose a novel non-parametric method that approximates the sum of conditional probability densities required by MIS through convex optimization, yielding a new contrastive objective for self-supervised representation learning. Moreover, we design an efficient algorithm for solving the proposed objective. We empirically compare our algorithm to representative baselines on the contrastive image-language pretraining task. Experimental results on the CC3M and CC12M datasets demonstrate the superior overall performance of our algorithm. Our code is available at https://github.com/bokun-wang/NUCLR.
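To make the connection between Monte Carlo estimation of the partition function and the InfoNCE loss concrete, the following is a minimal illustrative sketch (not the authors' NUCLR implementation): for each anchor, the log partition function is estimated by averaging exponentiated similarities over the batch of paired samples, which plays the role of the Monte Carlo samples; the names `info_nce`, `anchors`, and `positives` are hypothetical.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss where the partition function Z_i of each anchor
    is estimated by a Monte Carlo average over in-batch samples."""
    # L2-normalize so similarities are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # pairwise similarity matrix
    # log Z_i via a numerically stable log-mean-exp over the batch
    m = logits.max(axis=1, keepdims=True)
    log_z = np.log(np.mean(np.exp(logits - m), axis=1)) + m.squeeze(1)
    # loss = estimated log-partition minus the positive-pair similarity
    return float(np.mean(log_z - np.diag(logits)))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_aligned = info_nce(z, z)                          # perfectly aligned pairs
loss_random = info_nce(z, rng.normal(size=(8, 16)))    # unrelated pairs
```

As the abstract notes, refining how these Monte Carlo samples are weighted (e.g., via MIS with better density estimates) is exactly the lever the proposed method exploits to reduce integration error.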