Contrastively trained encoders have recently been proven to invert the data-generating process: they encode each input, e.g., an image, into the true latent vector that generated the image (Zimmermann et al., 2021). However, real-world observations often have inherent ambiguities. For instance, images may be blurred or only show a 2D view of a 3D object, so multiple latents could have generated them. This makes the true posterior for the latent vector probabilistic with heteroscedastic uncertainty. In this setup, we extend the common InfoNCE objective and encoders to predict latent distributions instead of points. We prove that these distributions recover the correct posteriors of the data-generating process, including its level of aleatoric uncertainty, up to a rotation of the latent space. In addition to providing calibrated uncertainty estimates, these posteriors allow the computation of credible intervals in image retrieval. These intervals comprise the images that have the same latent as a given query, subject to its uncertainty.
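To make the step from point embeddings to latent distributions concrete, the following is a minimal sketch, not the formulation used in this work. `info_nce` is the standard InfoNCE loss for a single anchor. `mc_info_nce` is a hypothetical Monte Carlo variant that averages that loss over samples drawn from each input's predicted distribution. The `(mean, spread)` parameterisation, the sampler (Gaussian noise projected back onto the unit sphere), the temperature, and all function names are assumptions made for illustration only.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE for one anchor: negative log-softmax score of the positive."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


def sample_latent(mean, spread, rng):
    """Assumed sampler: Gaussian noise around the mean, projected onto the sphere."""
    return normalize([m + rng.gauss(0.0, spread) for m in mean])


def mc_info_nce(anchor, positive, negatives, n_samples=16, temperature=0.1, seed=0):
    """Average InfoNCE over latent samples; every input is a (mean, spread) pair."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        a = sample_latent(*anchor, rng)
        p = sample_latent(*positive, rng)
        ns = [sample_latent(*n, rng) for n in negatives]
        total += info_nce(a, p, ns, temperature)
    return total / n_samples


# Usage: an ambiguous anchor (large spread) paired with a sharp positive.
loss = mc_info_nce(([1.0, 0.0], 0.5), ([1.0, 0.0], 0.05), [([-1.0, 0.0], 0.05)])
```

With a spread of zero every sample equals the normalized mean, so the sketch reduces to the point-estimate loss. A per-input spread is what lets such an encoder express that some inputs are more ambiguous than others.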