With the recent success of generative models in image and text generation, the evaluation of generative models has gained considerable attention. Whereas most generative models are compared using scalar metrics such as the Fréchet Inception Distance (FID) or the Inception Score (IS), Sajjadi et al. (2018) proposed a precision-recall curve to characterize how close two distributions are. Since then, various approaches to precision and recall have emerged (Kynkäänniemi et al., 2019; Naeem et al., 2020; Park & Kim, 2023). They focus on the extreme values of precision and recall, but beyond this shared focus, how they relate to one another remains unclear. In this paper, we unify most of these approaches under a single framework, building on the work of Simon et al. (2019). In doing so, we are able not only to recover entire curves, but also to expose the sources of the reported pitfalls of the metrics concerned. We also provide consistency results that go well beyond those presented in the corresponding literature. Finally, we study the different behaviors of the curves obtained experimentally.