We propose a functional evaluation metric for generative models based on the relative density ratio (RDR), designed to characterize distributional differences between real and generated samples. We show that the RDR, as a functional summary of the goodness-of-fit of the generative model, possesses several desirable theoretical properties: it preserves the $\phi$-divergence between the two distributions, enables sample-level evaluation that facilitates downstream investigation of feature-specific distributional differences, and has a bounded range that affords clear interpretability and numerical stability. Functional estimation of the RDR is achieved efficiently through convex optimization on the variational form of the $\phi$-divergence. We provide theoretical convergence rate guarantees for general estimators based on M-estimator theory, as well as convergence rates for neural network-based estimators when the true ratio lies in an anisotropic Besov space. We demonstrate the power of the proposed RDR-based evaluation through numerical experiments on MNIST, CelebA64, and the American Gut Project microbiome data. We show that the estimated RDR not only allows for an effective comparison of the overall performance of competing generative models, but also offers a convenient means of revealing the nature of the underlying goodness-of-fit. This enables one to assess support overlap, coverage, and fidelity, while pinpointing regions of the sample space where generators concentrate and revealing the features that drive the most salient distributional differences.
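To make the bounded-range property concrete, the following is a minimal sketch and not the paper's estimator. It assumes the commonly used mixture form of the relative density ratio, $r_\alpha(x) = p(x) / (\alpha p(x) + (1-\alpha) q(x))$, which may differ from the exact definition adopted in the paper. The two Gaussian densities, the mixing weight `alpha`, and all function names are hypothetical stand-ins chosen for illustration. The densities are known in closed form here, so nothing is estimated.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return np.exp(-0.5 * z**2) / (std * np.sqrt(2.0 * np.pi))


def relative_density_ratio(p, q, alpha):
    """Assumed mixture form: r_alpha = p / (alpha * p + (1 - alpha) * q)."""
    return p / (alpha * p + (1.0 - alpha) * q)


alpha = 0.5  # hypothetical mixing weight, not taken from the paper
x = np.linspace(-6.0, 10.0, 2001)

# Stand-ins for the "real" and "generated" densities (illustrative only).
p = gaussian_pdf(x, mean=0.0, std=1.0)
q = gaussian_pdf(x, mean=3.0, std=1.0)

plain_ratio = p / q                           # unbounded where q is tiny
rdr = relative_density_ratio(p, q, alpha)     # stays within [0, 1 / alpha]

print("max plain ratio:", plain_ratio.max())
print("max relative ratio:", rdr.max(), "upper bound:", 1.0 / alpha)
```

Under this assumed definition, values of `rdr` near `1 / alpha` mark regions where the first density dominates, and values near zero mark regions where the second dominates. The plain ratio `p / q` grows without bound in the tails.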