We propose a robust and reliable evaluation metric for generative models that introduces topological and statistical treatments for rigorous support estimation. Existing metrics, such as Inception Score (IS), Fréchet Inception Distance (FID), and the variants of Precision and Recall (P&R), rely heavily on supports that are estimated from sample features. However, the reliability of this estimation has been largely overlooked, even though the quality of the evaluation depends entirely on it. In this paper, we propose Topological Precision and Recall (TopP&R, pronounced 'topper'), which provides a systematic approach to estimating supports, retaining only the topologically and statistically important features with a certain level of confidence. This not only makes TopP&R robust to noisy features, but also provides statistical consistency. Our theoretical and experimental results show that TopP&R is robust to outliers and non-independent and identically distributed (Non-IID) perturbations, while accurately capturing the true trend of change in samples. To the best of our knowledge, this is the first evaluation metric that focuses on robust estimation of the support and provides statistical consistency under noise.