Many recent works in simulation-based inference (SBI) rely on deep generative models to approximate complex, high-dimensional posterior distributions. However, evaluating whether these approximations can be trusted remains a challenge. Most approaches evaluate the posterior estimator only in expectation over the observation space. This limits their interpretability and is not sufficient to identify the observations for which the approximation can be trusted or should be improved. Building upon the well-known classifier two-sample test (C2ST), we introduce L-C2ST, a new method that allows for a local evaluation of the posterior estimator at any given observation. It offers theoretically grounded and easy-to-interpret (e.g. graphical) diagnostics and, unlike C2ST, does not require access to samples from the true posterior. In the case of normalizing flow-based posterior estimators, L-C2ST can be specialized to offer better statistical power while being computationally more efficient. On standard SBI benchmarks, L-C2ST provides results comparable to C2ST and outperforms alternative local approaches such as coverage tests based on highest predictive density (HPD). We further highlight the importance of local evaluation and the benefit of interpretability of L-C2ST on a challenging application from computational neuroscience.
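For readers unfamiliar with the baseline that L-C2ST builds upon, the following is a minimal sketch of a plain (global) classifier two-sample test. It is not the L-C2ST method itself: it assumes that samples from both distributions are available, which is exactly the requirement L-C2ST removes. The k-nearest-neighbour classifier, the one-dimensional Gaussian samples, and the helper names `knn_predict` and `c2st_accuracy` are illustrative assumptions, not taken from the paper.

```python
import random


def knn_predict(train, x, k=5):
    """Majority vote among the k training points closest to x (1-D data)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes_for_q = sum(label for _, label in nearest)
    return 1 if 2 * votes_for_q > k else 0


def c2st_accuracy(p_samples, q_samples, k=5, seed=0):
    """Held-out accuracy of a classifier trained to tell P from Q.

    An accuracy close to 0.5 suggests the two sample sets are hard to
    distinguish; an accuracy well above 0.5 suggests they differ.
    """
    rng = random.Random(seed)
    # Label samples from P as 0 and samples from Q as 1, then shuffle.
    data = [(x, 0) for x in p_samples] + [(x, 1) for x in q_samples]
    rng.shuffle(data)
    # Train on the first half, evaluate on the second half.
    half = len(data) // 2
    train, test = data[:half], data[half:]
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def gaussian_samples(mean, n, seed):
    """Draw n samples from a unit-variance Gaussian (toy stand-in data)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, 1.0) for _ in range(n)]


if __name__ == "__main__":
    reference = gaussian_samples(0.0, 500, seed=1)
    good_estimate = gaussian_samples(0.0, 500, seed=2)  # same distribution
    bad_estimate = gaussian_samples(3.0, 500, seed=3)   # shifted distribution
    print("accuracy, matching samples:", c2st_accuracy(reference, good_estimate))
    print("accuracy, shifted samples: ", c2st_accuracy(reference, bad_estimate))
```

In this toy setup the first accuracy should land near chance level and the second well above it. A practical test would typically use a stronger classifier with cross-validation, and calibrate the accuracy against its distribution under the null hypothesis.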