The semantic similarity between sample expressions measures the distance between their latent 'meanings'. These meanings are themselves typically represented by textual expressions. We propose a novel approach whereby the semantic similarity among textual expressions is based not on other expressions they can be rephrased as, but rather on the imagery they evoke. While this is not possible with humans, generative models allow us to easily visualize and compare the images, or the distribution over images, evoked by a textual prompt. Therefore, we characterize the semantic similarity between two textual expressions simply as the distance between the image distributions they induce, or 'conjure.' We show that when this distance is chosen to be the Jeffreys divergence between the reverse-time diffusion stochastic differential equations (SDEs) induced by each textual expression, it can be directly computed via Monte Carlo sampling. Our method contributes a novel perspective on semantic similarity that not only aligns with human-annotated scores, but also opens up new avenues for the evaluation of text-conditioned generative models while offering better interpretability of their learnt representations.
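As a rough illustration of the quantity involved, the Jeffreys divergence between two distributions p and q is the symmetrized Kullback-Leibler divergence, KL(p || q) + KL(q || p), and each term can be estimated by averaging a log-density ratio over samples. The sketch below is a toy stand-in only: it assumes two one-dimensional Gaussians in place of the image distributions induced by two prompts, and it does not implement the SDE-based estimator referred to above. The function names and parameters are hypothetical and chosen for this example.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def jeffreys_monte_carlo(mean_p, std_p, mean_q, std_q, n_samples=100_000, seed=0):
    """Monte Carlo estimate of KL(p || q) + KL(q || p) for two 1-D Gaussians.

    Toy assumption: p and q are Gaussians standing in for the image
    distributions induced by two text prompts.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Sample from p and accumulate log p(x) - log q(x): estimates KL(p || q).
        x = rng.gauss(mean_p, std_p)
        total += gaussian_logpdf(x, mean_p, std_p) - gaussian_logpdf(x, mean_q, std_q)
        # Sample from q and accumulate log q(y) - log p(y): estimates KL(q || p).
        y = rng.gauss(mean_q, std_q)
        total += gaussian_logpdf(y, mean_q, std_q) - gaussian_logpdf(y, mean_p, std_p)
    return total / n_samples


def jeffreys_closed_form(mean_p, std_p, mean_q, std_q):
    """Exact Jeffreys divergence between two 1-D Gaussians, for comparison."""
    d2 = (mean_p - mean_q) ** 2
    var_p, var_q = std_p ** 2, std_q ** 2
    # The log-variance terms of the two KL divergences cancel in the sum.
    return (var_p + d2) / (2.0 * var_q) + (var_q + d2) / (2.0 * var_p) - 1.0


if __name__ == "__main__":
    estimate = jeffreys_monte_carlo(0.0, 1.0, 1.0, 1.5)
    exact = jeffreys_closed_form(0.0, 1.0, 1.0, 1.5)
    print(f"Monte Carlo estimate: {estimate:.4f}")
    print(f"Closed form:          {exact:.4f}")
```

Because the Gaussian case has a closed form, the sampled estimate can be checked against the exact value. For image distributions no such closed form is available, which is why a sampling-based estimator is needed in the first place.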