Existing methods for evaluating graph generative models rely primarily on Maximum Mean Discrepancy (MMD) metrics computed over graph descriptors. While these metrics can rank generative models, they do not provide an absolute measure of performance. Their values are also highly sensitive to extrinsic choices, namely the kernel and descriptor parametrization, making them incomparable across different graph descriptors. We introduce PolyGraph Discrepancy (PGD), a new evaluation framework that addresses these limitations. It approximates the Jensen-Shannon (JS) distance between graph distributions by fitting binary classifiers that distinguish real from generated graphs, each featurized by one of these descriptors. The data log-likelihood of these classifiers yields a variational lower bound on the JS distance between the two distributions. The resulting metrics are confined to the unit interval [0, 1] and are comparable across different graph descriptors. We further derive a theoretically grounded summary metric that combines the individual metrics into a maximally tight lower bound on the distance for the given descriptors. Thorough experiments demonstrate that PGD provides a more robust and insightful evaluation than MMD metrics. The PolyGraph framework for benchmarking graph generative models is publicly available at https://github.com/BorgwardtLab/polygraph-benchmark.
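The classifier-based bound described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pgd_estimate` is hypothetical, logistic regression stands in for whatever classifier the framework uses, and for brevity the model is fitted and evaluated on the same (equally sized) samples rather than on held-out splits. The key idea survives, though: with balanced classes, `log(2)` minus the classifier's binary cross-entropy is a variational lower bound on the JS divergence, and since the JS divergence is at most `log(2)`, the square root of the ratio is a score in [0, 1].

```python
import numpy as np

def pgd_estimate(feats_real, feats_gen, lr=0.1, steps=500):
    """Lower-bound the normalized JS distance between two distributions,
    given descriptor features for real and generated graphs.
    Hypothetical sketch; assumes len(feats_real) == len(feats_gen)."""
    X = np.vstack([feats_real, feats_gen])
    y = np.concatenate([np.ones(len(feats_real)), np.zeros(len(feats_gen))])
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)  # standardize features

    # Logistic regression via plain gradient descent (real = 1, generated = 0).
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()

    # Binary cross-entropy (in nats) of the fitted classifier.
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ w + b))), 1e-7, 1 - 1e-7)
    bce = -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

    # log(2) - BCE lower-bounds the JS divergence; JSD <= log(2), so the
    # square root of the ratio is a distance-like score in [0, 1].
    jsd = max(np.log(2.0) - bce, 0.0)
    return float(np.sqrt(jsd / np.log(2.0)))
```

Because any classifier, however weak, yields a valid lower bound, scores computed from different descriptors live on the same scale; a proper implementation would evaluate the log-likelihood on held-out data to avoid the optimistic bias of in-sample fitting.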