We propose a robust and reliable evaluation metric for generative models that introduces topological and statistical treatments for rigorous support estimation. Existing metrics, such as Inception Score (IS), Fréchet Inception Distance (FID), and the variants of Precision and Recall (P&R), rely heavily on supports that are estimated from sample features. However, the reliability of this estimation has been largely overlooked, even though the quality of the evaluation depends entirely on it. In this paper, we propose Topological Precision and Recall (TopP&R, pronounced 'topper'), which provides a systematic approach to estimating supports, retaining only topologically and statistically important features with a certain level of confidence. This not only makes TopP&R robust to noisy features, but also provides statistical consistency. Our theoretical and experimental results show that TopP&R is robust to outliers and to non-independent and identically distributed (Non-IID) perturbations, while accurately capturing the true trend of change in samples. To the best of our knowledge, this is the first evaluation metric that focuses on robust estimation of the support and provides statistical consistency under noise.