To check the accuracy of Bayesian computations, it is common to use rank-based simulation-based calibration (SBC). However, SBC has drawbacks: the test statistic is somewhat ad hoc, interactions are difficult to examine, multiple testing is a challenge, and the resulting p-value is not a divergence metric. We propose to replace the marginal rank test with a flexible classification approach that learns test statistics from data. This approach typically has higher statistical power than the SBC rank test and returns an interpretable divergence measure of miscalibration, computed from classification accuracy. It can be used with different data-generating processes to address both likelihood-free inference and traditional inference methods such as Markov chain Monte Carlo or variational inference. We illustrate an automated implementation using neural networks and statistically inspired features, and validate the method with numerical and real-data experiments.
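The core idea can be illustrated with a minimal sketch. This is a simplified binary variant written under stated assumptions, not the authors' implementation. It assumes a toy normal-normal model, a hypothetical `scale` parameter that controls how over-dispersed the "approximate" posterior is, one approximate draw per simulation, and a hand-chosen quadratic feature map fitted with logistic regression in place of a neural network. Each simulation yields a prior draw θ paired with its data y (label 0) and an approximate-posterior draw θ̃ for the same y (label 1). If the inference is calibrated, the two classes share one joint distribution and the held-out accuracy stays near 0.5. For the divergence, the sketch uses the standard identity that, with balanced classes, the best achievable log loss equals log 2 minus the Jensen-Shannon divergence. Consequently, log 2 minus the held-out log loss estimates a lower bound on that divergence.

```python
import numpy as np


def simulate(n, rng, scale):
    """Toy model (assumption): theta ~ N(0, 1), y | theta ~ N(theta, 1).

    The exact posterior is N(y/2, 1/2). The hypothetical approximate posterior
    is N(y/2, scale**2 / 2), so scale = 1.0 means perfectly calibrated.
    """
    theta = rng.normal(0.0, 1.0, n)
    y = rng.normal(theta, 1.0)
    theta_tilde = rng.normal(y / 2.0, scale * np.sqrt(0.5))
    return theta, y, theta_tilde


def features(theta, y):
    # Hand-picked quadratic features (assumption): in this Gaussian toy model
    # the exact log density ratio between the two classes is quadratic.
    return np.column_stack([np.ones_like(theta), theta, y, theta * y, theta**2, y**2])


def make_dataset(n, rng, scale):
    theta, y, theta_tilde = simulate(n, rng, scale)
    # Label 0: prior draw paired with the data it generated.
    # Label 1: approximate-posterior draw for the same data.
    X = np.vstack([features(theta, y), features(theta_tilde, y)])
    labels = np.concatenate([np.zeros(n), np.ones(n)])
    return X, labels


def predict_proba(X, w):
    # Clip logits so exp() cannot overflow for extreme feature values.
    return 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30.0, 30.0)))


def fit_logistic(X, labels, iters=25, damping=1e-6):
    """Logistic regression fitted by Newton's method (IRLS) with slight damping."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = predict_proba(X, w)
        grad = X.T @ (labels - p)  # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + damping * np.eye(X.shape[1])
        w += np.linalg.solve(hess, grad)
    return w


def classifier_calibration_check(n=4000, scale=1.0, seed=0):
    """Return (JS lower-bound estimate in nats, held-out accuracy)."""
    rng = np.random.default_rng(seed)
    X_train, lab_train = make_dataset(n, rng, scale)
    X_test, lab_test = make_dataset(n, rng, scale)  # independent simulations
    w = fit_logistic(X_train, lab_train)
    p = np.clip(predict_proba(X_test, w), 1e-12, 1.0 - 1e-12)
    log_loss = -np.mean(lab_test * np.log(p) + (1.0 - lab_test) * np.log(1.0 - p))
    accuracy = np.mean((p > 0.5) == (lab_test == 1.0))
    # Balanced classes: min expected log loss = log 2 - JS, so this is an
    # estimate of a lower bound on the Jensen-Shannon divergence.
    divergence = max(0.0, np.log(2.0) - log_loss)
    return divergence, accuracy
```

For example, `classifier_calibration_check(scale=1.0)` should report a divergence near zero and an accuracy near 0.5. In contrast, `classifier_calibration_check(scale=2.0)` (an approximate posterior four times too wide in variance) should report a clearly positive divergence and an accuracy above 0.5. Swapping the fixed feature map for a learned one, such as a neural network, is what makes the test flexible enough to detect miscalibration patterns that were not anticipated in advance.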