Simulation-based inference (SBI) methods such as approximate Bayesian computation (ABC), synthetic likelihood, and neural posterior estimation (NPE) rely on statistics of simulated data to infer the parameters of models with intractable likelihoods. However, such methods are known to yield untrustworthy and misleading inference outcomes under model misspecification, which hinders their widespread applicability. In this work, we propose the first general approach to handling model misspecification that works across different classes of SBI methods. Leveraging the fact that the choice of statistics determines the degree of misspecification in SBI, we introduce a regularized loss function that penalises those statistics that increase the mismatch between the data and the model. Taking NPE and ABC as use cases, we demonstrate the superior performance of our method on high-dimensional time-series models that are artificially misspecified. We also apply our method to real data from the field of radio propagation, where the model is known to be misspecified. We show empirically that the method yields robust inference in misspecified scenarios, whilst still being accurate when the model is well-specified.
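The abstract describes the regularizer only at a high level: it penalises statistics that increase the mismatch between observed and simulated data. The sketch below shows one way such a loss could look. It assumes something not stated above: that the mismatch is measured by a Gaussian-kernel maximum mean discrepancy (MMD) between a batch of simulated statistics and the observed statistics. It also assumes this penalty is added to whatever training loss the SBI method already uses (for NPE, the negative log posterior density). The function names and the parameters `lam` and `bandwidth` are hypothetical and illustrative only.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of x (n, d) and y (m, d)."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


def regularized_loss(inference_loss, sim_stats, obs_stats, lam=1.0, bandwidth=1.0):
    """Hypothetical regularized objective: base SBI loss plus lam * MMD^2 penalty
    between simulated and observed statistics."""
    return inference_loss + lam * mmd_squared(sim_stats, obs_stats, bandwidth)


# Toy check: when the observed statistics lie far from the simulated ones
# (a stand-in for misspecification), the penalty and hence the total loss grow.
rng = np.random.default_rng(0)
sim_stats = rng.normal(0.0, 1.0, size=(200, 2))
obs_close = rng.normal(0.0, 1.0, size=(50, 2))
obs_far = rng.normal(3.0, 1.0, size=(50, 2))
print(regularized_loss(1.0, sim_stats, obs_close, lam=0.5))
print(regularized_loss(1.0, sim_stats, obs_far, lam=0.5))
```

The penalty grows as the distribution of simulated statistics drifts away from the observed ones. Minimising the combined loss therefore favours statistics on which model and data agree, with `lam` trading this off against the inference loss. In a full method, the statistics would typically come from a learned summary network trained by gradient descent through this penalty. This NumPy sketch does not attempt that.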