Research into machine-learning-based generative modeling has grown rapidly in recent years as a way to tackle the computational challenges of simulation in high-energy physics (HEP). To use such alternative simulators in practice, we need well-defined metrics to compare different generative models and to evaluate their discrepancy from the true distributions. We present the first systematic review and investigation of evaluation metrics, their sensitivity to failure modes of generative models, and their relevance and viability for HEP, using the framework of two-sample goodness-of-fit testing. Inspired by previous work in both physics and computer vision, we propose two new metrics, the Fréchet and kernel physics distances (FPD and KPD, respectively), and perform a variety of experiments measuring their performance on simple Gaussian-distributed datasets and on simulated high-energy jet datasets. We find that FPD, in particular, is the metric most sensitive to all alternative jet distributions tested, and we recommend its adoption, along with the KPD and the Wasserstein distances between individual feature distributions, for evaluating generative models in HEP. Finally, we demonstrate the efficacy of the proposed metrics by evaluating a novel attention-based generative adversarial particle transformer and comparing it to the state-of-the-art message-passing generative adversarial network jet simulation model. The code for our proposed metrics is provided in the open-source JetNet Python library.
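As a rough illustration of the kind of two-sample metrics discussed above, the following minimal sketch computes a Fréchet distance between Gaussians fitted to two samples, together with a one-dimensional Wasserstein distance for a single feature. It is an assumption-laden sketch, not the JetNet implementation: the function names `frechet_gaussian_distance` and `wasserstein_1d` are hypothetical, the inputs are plain feature arrays rather than the physics-motivated features an FPD would use, and no finite-sample correction is applied.

```python
import numpy as np


def frechet_gaussian_distance(x, y):
    """Squared Frechet distance between Gaussians fitted to two samples.

    x, y: arrays of shape (n_samples, n_features).
    Hypothetical helper for illustration; not the JetNet API.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))
    # Tr((cov_x cov_y)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_x @ cov_y, which are real and non-negative in
    # exact arithmetic; clip tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size 1D samples.

    Assumes len(a) == len(b), in which case the distance is the mean
    absolute difference between the sorted samples.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch only handles equal-size samples")
    return float(np.mean(np.abs(a - b)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(2000, 3))               # stand-in for "true" features
    generated = real + np.array([0.5, 0.0, 0.0])    # toy failure mode: shifted mean
    print("Frechet distance:", frechet_gaussian_distance(real, generated))
    print("Wasserstein (feature 0):", wasserstein_1d(real[:, 0], generated[:, 0]))
```

In this toy setup the "generated" sample differs from the "real" one only by a mean shift in the first feature, so both quantities are non-zero while a comparison of a sample with itself gives approximately zero. A practical evaluation would use independent samples and the library's own routines.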