Existing methods for evaluating graph generative models primarily rely on Maximum Mean Discrepancy (MMD) metrics based on graph descriptors. While these metrics can rank generative models, they do not provide an absolute measure of performance. Their values are also highly sensitive to extrinsic parameters, namely the kernel and descriptor parametrization, making them incomparable across different graph descriptors. We introduce PolyGraph Discrepancy (PGD), a new evaluation framework that addresses these limitations. It approximates the Jensen-Shannon (JS) distance between graph distributions by fitting binary classifiers to distinguish real graphs from generated graphs, each featurized by these descriptors. The data log-likelihood of these classifiers yields a variational lower bound on the JS distance between the two distributions. The resulting metrics are constrained to the unit interval [0,1] and are comparable across different graph descriptors. We further derive a theoretically grounded summary metric that combines these individual metrics to provide a maximally tight lower bound on the distance for the given descriptors. Thorough experiments demonstrate that PGD provides a more robust and insightful evaluation compared to MMD metrics. The PolyGraph framework for benchmarking graph generative models is made publicly available at https://github.com/BorgwardtLab/polygraph-benchmark.
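The classifier-based bound described above can be sketched in a few lines. This is a minimal illustration of the idea, not the authors' implementation: it assumes graphs have already been featurized into descriptor vectors, uses logistic regression as the binary classifier, and applies the standard variational identity that the held-out cross-entropy of a balanced real-vs-generated classifier lower-bounds the JS divergence via JSD ≥ log 2 − CE (in nats). The normalization to [0,1] via a square root is likewise an assumption mirroring the JS distance.

```python
# Hypothetical sketch of the PGD idea (not the paper's exact implementation):
# a binary classifier separates descriptor features of real vs. generated
# graphs; its held-out log-likelihood gives a variational lower bound on
# the Jensen-Shannon divergence, JSD >= log(2) - cross_entropy (nats).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

def js_distance_lower_bound(real_feats, gen_feats, seed=0):
    """Estimate a [0, 1] discrepancy from descriptor feature matrices.

    Assumes equally sized i.i.d. samples; the classifier family and
    normalization used in the actual framework may differ.
    """
    X = np.vstack([real_feats, gen_feats])
    y = np.concatenate([np.ones(len(real_feats)), np.zeros(len(gen_feats))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    ce = log_loss(y_te, clf.predict_proba(X_te))   # cross-entropy in nats
    jsd_lb = max(np.log(2) - ce, 0.0)              # variational JSD bound
    return float(np.sqrt(jsd_lb / np.log(2)))      # JS distance in [0, 1]

# Synthetic stand-ins for descriptor features of two graph samples.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 4))
same = rng.normal(0.0, 1.0, size=(500, 4))  # identical distribution -> near 0
far = rng.normal(5.0, 1.0, size=(500, 4))   # well-separated -> near 1
print(js_distance_lower_bound(real, same))
print(js_distance_lower_bound(real, far))
```

Because the bound holds for every descriptor, the summary metric in the abstract corresponds to taking the tightest (largest) of the per-descriptor lower bounds, which makes scores directly comparable across descriptors.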