Precision and Recall are two prominent metrics of generative performance, proposed to separately measure the fidelity and diversity of generative models. Given their central role in comparing and improving generative models, understanding their limitations is crucially important. To that end, in this work, we identify a critical flaw in the common approximation of these metrics using k-nearest-neighbors: the very interpretations of fidelity and diversity that are assigned to Precision and Recall can fail in high dimensions, resulting in very misleading conclusions. Specifically, we show empirically and theoretically that, as the number of dimensions grows, two model distributions whose supports are at equal point-wise distance from the support of the real distribution can have vastly different Precision and Recall regardless of their respective distributions, hence an emergent asymmetry in high dimensions. Based on our theoretical insights, we then provide simple yet effective modifications to these metrics that yield symmetric metrics regardless of the number of dimensions. Finally, we provide experiments on real-world datasets to illustrate that the identified flaw is not merely a pathological case, and that our proposed metrics are effective in alleviating its impact.
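For readers unfamiliar with the k-nearest-neighbor approximation mentioned in the abstract, the following is a minimal sketch of one common formulation. It assumes each sample set's support is estimated as the union of balls centered at the samples, with each radius set to the distance to that sample's k-th nearest neighbor within the same set. The function names (`pairwise_dist`, `knn_radii`, `coverage`, `knn_precision_recall`) and the default `k=3` are illustrative choices, not taken from the paper, and the sketch does not implement the modified symmetric metrics proposed here.

```python
import numpy as np


def pairwise_dist(a, b):
    # Euclidean distance between every row of a and every row of b.
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def knn_radii(points, k):
    # Distance from each point to its k-th nearest neighbor in the same set.
    # After sorting, column 0 is the zero distance of a point to itself,
    # so column k holds the k-th neighbor distance (requires k < len(points)).
    d = pairwise_dist(points, points)
    return np.sort(d, axis=1)[:, k]


def coverage(queries, support, k):
    # Fraction of query points lying inside at least one k-NN ball
    # centered at a point of the support set.
    radii = knn_radii(support, k)
    d = pairwise_dist(queries, support)
    return float((d <= radii[None, :]).any(axis=1).mean())


def knn_precision_recall(real, fake, k=3):
    # Precision: share of generated samples covered by the estimated real support.
    # Recall: share of real samples covered by the estimated generated support.
    precision = coverage(fake, real, k)
    recall = coverage(real, fake, k)
    return precision, recall


if __name__ == "__main__":
    # Illustrative data only: two samples drawn from the same Gaussian.
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 16))
    fake = rng.normal(size=(200, 16))
    print(knn_precision_recall(real, fake, k=3))
```

Because the ball radii are computed separately from each sample set, the two estimated supports can differ in volume even when the underlying sets are equally far apart, which is the kind of setting in which the asymmetry described in the abstract can be probed.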