Amortized Bayesian model comparison (BMC) enables fast probabilistic ranking of models via simulation-based training of neural surrogates. However, the accuracy of neural surrogates deteriorates when simulation models are misspecified, which is the very case where model comparison is most needed. We evaluate four amortized BMC methods and supplement their traditional simulation-based training with a \emph{self-consistency} (SC) loss on unlabeled real data to improve BMC estimates under distribution shift. Using one artificial and two real-world case studies, we compare amortized BMC estimators with and without SC against analytic or bridge sampling benchmarks. In the \emph{closed-world} case (the data are generated by one of the candidate models), classifier-based BMC estimators work acceptably well even without SC training, but they also benefit the least from it. In the \emph{open-world} case (all models are misspecified), SC training strongly improves BMC estimators, even for severely misspecified models, when analytic likelihoods are available or when surrogate likelihoods are locally accurate near the true parameter posterior. We conclude with practical recommendations for amortized BMC and suggestions for future research.
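To make the two central quantities concrete, the following is a minimal sketch in illustrative notation. The symbols $q_\phi$, $\theta^{(s)}$, and $S$, as well as the variance form of the loss, are assumptions made here for exposition; the abstract itself does not spell out the formulation. For candidate models $M_1, \dots, M_K$ and observed data $y$, BMC ranks models by their posterior model probabilities
\begin{equation}
p(M_k \mid y) = \frac{p(y \mid M_k)\, p(M_k)}{\sum_{j=1}^{K} p(y \mid M_j)\, p(M_j)} .
\end{equation}
By Bayes' rule, the log marginal likelihood can be written, for any parameter value $\theta$ with nonzero posterior density, as
\begin{equation}
\log p(y \mid M_k) = \log p(y \mid \theta, M_k) + \log p(\theta \mid M_k) - \log p(\theta \mid y, M_k) ,
\end{equation}
so the right-hand side is constant in $\theta$ when the exact posterior is used. One way to turn this identity into a training signal, assumed here, is to replace the exact posterior with a neural approximation $q_\phi$ and penalize the variation of the right-hand side across $S$ parameter draws $\theta^{(1)}, \dots, \theta^{(S)}$:
\begin{equation}
\mathcal{L}_{\mathrm{SC}}(\phi) = \mathrm{Var}_{s = 1, \dots, S} \left[ \log p(y \mid \theta^{(s)}, M_k) + \log p(\theta^{(s)} \mid M_k) - \log q_\phi(\theta^{(s)} \mid y, M_k) \right] .
\end{equation}
Such a loss depends only on the observed $y$ and not on ground-truth parameters or model labels, which is why it can be evaluated on unlabeled real data. If the likelihood is itself a learned surrogate, the same expression applies with the surrogate in place of $p(y \mid \theta, M_k)$.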