Contrastively trained encoders have recently been proven to invert the data-generating process: they encode each input, e.g., an image, into the true latent vector that generated it (Zimmermann et al., 2021). However, real-world observations often have inherent ambiguities. For instance, images may be blurred or show only a 2D view of a 3D object, so multiple latents could have generated them. This makes the true posterior over the latent vector probabilistic, with heteroscedastic uncertainty. In this setup, we extend the common InfoNCE objective and encoders to predict latent distributions instead of points. We prove that these distributions recover the correct posteriors of the data-generating process, including its level of aleatoric uncertainty, up to a rotation of the latent space. In addition to providing calibrated uncertainty estimates, these posteriors allow the computation of credible intervals in image retrieval. Such intervals comprise the images that share the latent of a given query, subject to the query's uncertainty. Code is available at https://github.com/mkirchhof/Probabilistic_Contrastive_Learning
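The abstract does not spell out the extended objective. As an illustration only, the following NumPy sketch contrasts standard point-based InfoNCE with a Monte Carlo variant in which the encoder outputs a distribution per input instead of a point. Assumptions, not taken from the paper: each input is encoded as a diagonal Gaussian with hypothetical parameters `mu` and `logvar`, similarity is cosine similarity with a fixed temperature, and the loss is simply averaged over sampled latents. The distribution family and the way samples enter the loss in the paper and its repository may differ, and the function names are hypothetical.

```python
import numpy as np


def log_softmax_rows(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def infonce(z_a, z_b, temperature=0.1):
    """Point-based InfoNCE: row i of z_a and row i of z_b form a positive pair;
    all other rows of z_b act as negatives for row i of z_a."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)  # project onto unit sphere
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = (z_a @ z_b.T) / temperature                    # cosine similarity / tau
    return float(-np.mean(np.diag(log_softmax_rows(logits))))


def mc_infonce(mu_a, logvar_a, mu_b, logvar_b, n_samples=16, temperature=0.1, seed=0):
    """Hypothetical probabilistic variant: each input is encoded as a diagonal
    Gaussian N(mu, exp(logvar)); the loss is the InfoNCE value averaged over
    n_samples joint draws (a Monte Carlo estimate of the expected loss)."""
    rng = np.random.default_rng(seed)
    losses = []
    for _ in range(n_samples):
        # Reparameterized draws: z = mu + sigma * eps with eps ~ N(0, I)
        z_a = mu_a + np.exp(0.5 * logvar_a) * rng.standard_normal(mu_a.shape)
        z_b = mu_b + np.exp(0.5 * logvar_b) * rng.standard_normal(mu_b.shape)
        losses.append(infonce(z_a, z_b, temperature))
    return float(np.mean(losses))


if __name__ == "__main__":
    mu = np.eye(4)  # four perfectly separated means
    tiny = np.full((4, 4), -50.0)   # near-zero predicted variance
    unit = np.zeros((4, 4))         # unit predicted variance
    print(infonce(mu, mu))                                # point-based loss
    print(mc_infonce(mu, tiny, mu, tiny))                 # approximately the same value
    print(mc_infonce(mu, unit, mu, unit, n_samples=200))  # larger expected loss
```

With vanishing predicted variance the sampled loss collapses to the point-based one, so the point encoder is recovered as a special case; nonzero variance changes the expected loss, which is the quantity a probabilistic encoder would be trained on in this sketch.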