Many recent works in simulation-based inference (SBI) rely on deep generative models to approximate complex, high-dimensional posterior distributions. However, evaluating whether these approximations can be trusted remains a challenge. Most approaches evaluate the posterior estimator only in expectation over the observation space. This limits their interpretability and does not identify the observations for which the approximation can be trusted or should be improved. Building upon the well-known classifier two-sample test (C2ST), we introduce L-C2ST, a new method that allows for a local evaluation of the posterior estimator at any given observation. It offers theoretically grounded and easy-to-interpret (e.g. graphical) diagnostics and, unlike C2ST, does not require access to samples from the true posterior. In the case of normalizing flow-based posterior estimators, L-C2ST can be specialized to offer better statistical power, while being computationally more efficient. On standard SBI benchmarks, L-C2ST provides results comparable to C2ST and outperforms alternative local approaches such as coverage tests based on highest predictive density (HPD). We further highlight the importance of local evaluation and the benefit of interpretability of L-C2ST on a challenging application from computational neuroscience.
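For readers unfamiliar with the classifier two-sample test that L-C2ST builds on, the following is a minimal sketch of the basic idea only, not the authors' implementation and not L-C2ST itself. A classifier is trained to tell two samples apart, and its held-out accuracy is read as the test statistic: close to 0.5 when the samples are indistinguishable, close to 1 when they clearly differ. The 1-D Gaussian data, the k-nearest-neighbour classifier, and the names `knn_predict` and `c2st_accuracy` are all illustrative assumptions. Note that this plain version needs samples from both distributions, which is the requirement the abstract says L-C2ST removes.

```python
import random


def knn_predict(train_x, train_y, x, k=5):
    # Majority vote among the k nearest training points (1-D, absolute distance).
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def c2st_accuracy(sample_p, sample_q, k=5, seed=0):
    # Label the two samples 0 and 1, shuffle, and split into train/test halves.
    rng = random.Random(seed)
    data = [(x, 0) for x in sample_p] + [(x, 1) for x in sample_q]
    rng.shuffle(data)
    split = len(data) // 2
    train, test = data[:split], data[split:]
    train_x = [x for x, _ in train]
    train_y = [y for _, y in train]
    # Held-out accuracy of the classifier is the C2ST statistic.
    correct = sum(knn_predict(train_x, train_y, x, k) == y for x, y in test)
    return correct / len(test)


# Illustrative data (assumption): a reference sample, one matching
# approximation, and one clearly shifted approximation.
rng = random.Random(42)
reference = [rng.gauss(0.0, 1.0) for _ in range(400)]
good_estimate = [rng.gauss(0.0, 1.0) for _ in range(400)]
bad_estimate = [rng.gauss(3.0, 1.0) for _ in range(400)]

acc_good = c2st_accuracy(reference, good_estimate)  # expected near chance level
acc_bad = c2st_accuracy(reference, bad_estimate)    # expected well above chance
print(f"matching samples: {acc_good:.2f}, shifted samples: {acc_bad:.2f}")
```

In practice the classifier would typically be a more flexible model, and the accuracy would be compared against a null distribution to obtain a p-value; both are omitted here to keep the sketch short.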