Simulation-based inference (SBI) methods such as approximate Bayesian computation (ABC), synthetic likelihood, and neural posterior estimation (NPE) rely on simulated summary statistics to infer the parameters of models with intractable likelihoods. However, such methods are known to yield untrustworthy and misleading inferences under model misspecification, which hinders their widespread applicability. In this work, we propose the first general approach to handling model misspecification that works across different classes of SBI methods. Leveraging the fact that the choice of statistics determines the degree of misspecification in SBI, we introduce a regularized loss function that penalizes statistics which increase the mismatch between the data and the model. Taking NPE and ABC as use cases, we demonstrate the superior performance of our method on artificially misspecified high-dimensional time-series models. We also apply our method to real data from the field of radio propagation, where the model is known to be misspecified. We show empirically that the method yields robust inference in misspecified scenarios while remaining accurate when the model is well-specified.
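The abstract does not state the exact form of the penalty. A minimal sketch, assuming the data-model mismatch is measured by a squared maximum mean discrepancy (MMD) with a Gaussian kernel between statistics computed on simulated and on observed data, and that this term is added to a generic base SBI loss (for example an NPE negative log-likelihood) with a weight `lam`. The function names, `lam`, and `lengthscale` are hypothetical and chosen for illustration only; in practice the statistics would come from a learned summary network rather than fixed arrays.

```python
import numpy as np


def gaussian_kernel(a: np.ndarray, b: np.ndarray, lengthscale: float = 1.0) -> np.ndarray:
    """Gaussian (RBF) kernel matrix between the rows of a (n, d) and b (m, d)."""
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b, clipped at 0
    # to guard against tiny negative values from floating-point cancellation.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))


def mmd_squared(x: np.ndarray, y: np.ndarray, lengthscale: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, lengthscale).mean()
    k_yy = gaussian_kernel(y, y, lengthscale).mean()
    k_xy = gaussian_kernel(x, y, lengthscale).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


def regularized_loss(base_loss, sim_stats, obs_stats, lam=1.0, lengthscale=1.0) -> float:
    """Hypothetical regularized objective: base SBI loss + lam * mismatch penalty.

    base_loss: scalar loss of the SBI method (e.g. an NPE negative log-likelihood).
    sim_stats: (n, d) statistics computed on simulated data.
    obs_stats: (m, d) statistics computed on observed data.
    """
    return float(base_loss) + lam * mmd_squared(sim_stats, obs_stats, lengthscale)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sim = rng.normal(size=(200, 2))              # statistics of simulated data
    obs_ok = rng.normal(size=(50, 2))            # statistics the model can reproduce
    obs_bad = rng.normal(loc=3.0, size=(50, 2))  # statistics the model cannot reproduce
    print("well-specified:", regularized_loss(1.0, sim, obs_ok, lam=10.0))
    print("misspecified:  ", regularized_loss(1.0, sim, obs_bad, lam=10.0))
```

With the shifted observations the penalty term is much larger, so minimizing such an objective over the choice of statistics would tend to discourage statistics on which simulated and observed data disagree. The kernel, its lengthscale, and the weight `lam` are illustrative choices, not values taken from the source.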