Amortized Bayesian inference (ABI) offers fast, scalable approximations to posterior densities by training neural surrogates on data simulated from the statistical model. However, ABI methods are highly sensitive to model misspecification: when observed data fall outside the training distribution (the generative scope of the statistical model), neural surrogates can behave unpredictably. This poses a challenge in model comparison settings, where multiple statistical models are considered, at least some of which are misspecified. Recent work on self-consistency (SC) offers a promising remedy, applicable even to empirical data without ground-truth labels. In this work, we investigate how SC can improve amortized model comparison conceptualized in four different ways. Across two synthetic and two real-world case studies, we find that model comparison approaches that estimate marginal likelihoods through approximate parameter posteriors consistently outperform methods that directly approximate model evidence or posterior model probabilities. SC training improves robustness when the likelihood is available, even under severe model misspecification; its benefits for methods without access to analytic likelihoods are more limited and inconsistent. Our results suggest practical guidance for reliable amortized Bayesian model comparison: prefer parameter-posterior-based methods and augment them with SC training on empirical datasets to mitigate extrapolation bias under model misspecification.
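The marginal-likelihood route described above rests on the self-consistency identity log p(y) = log p(θ) + log p(y | θ) − log p(θ | y), which holds for every θ when the posterior is exact; SC training penalizes deviations from this constancy. A minimal sketch on a conjugate Gaussian toy model (illustrative only, not the paper's models or training code) shows the identity in action:

```python
import numpy as np
from scipy.stats import norm

# Toy conjugate model: theta ~ N(0, 1), y | theta ~ N(theta, 1).
# For a single observation y, the exact posterior is N(y/2, 1/2)
# and the analytic evidence is N(y; 0, 2).
rng = np.random.default_rng(0)
y = 1.3

def log_joint(theta, y):
    # log p(theta) + log p(y | theta)
    return norm.logpdf(theta, 0.0, 1.0) + norm.logpdf(y, theta, 1.0)

def log_posterior(theta, y):
    # Exact conjugate posterior N(y/2, 1/2)
    return norm.logpdf(theta, y / 2.0, np.sqrt(0.5))

# Self-consistency: log p(y) = log p(theta, y) - log p(theta | y)
# must give the same value for every theta draw if the posterior is exact.
thetas = rng.normal(size=5)
log_evidence = log_joint(thetas, y) - log_posterior(thetas, y)

# With an approximate posterior, the spread of these estimates across
# theta is exactly what an SC loss would penalize. Here the posterior
# is exact, so all estimates match the analytic evidence.
print(np.allclose(log_evidence, norm.logpdf(y, 0.0, np.sqrt(2.0))))
```

With a neural posterior surrogate in place of `log_posterior`, the variance of `log_evidence` across θ draws becomes a label-free training signal, usable on empirical data as described above.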