Simulation-based inference (SBI) enables amortized Bayesian inference for simulators with implicit likelihoods. But when we are primarily interested in the quality of predictive simulations, or when the model cannot exactly reproduce the observed data (i.e., is misspecified), targeting the Bayesian posterior may be overly restrictive. Generalized Bayesian Inference (GBI) aims to robustify inference for (misspecified) simulator models by replacing the likelihood function with a cost function that evaluates the goodness of parameters relative to data. However, GBI methods generally require running multiple simulations to estimate the cost function at each parameter value during inference, which makes the approach computationally infeasible even for moderately complex simulators. Here, we propose amortized cost estimation (ACE) for GBI to address this challenge: we train a neural network to approximate the cost function, which we define as the expected distance between simulations produced by a parameter and observed data. The trained network can then be used with MCMC to infer GBI posteriors for any observation without running additional simulations. We show that, on several benchmark tasks, ACE accurately predicts cost and provides predictive simulations that are closer to synthetic observations than those of other SBI methods, especially for misspecified simulators. Finally, we apply ACE to infer parameters of the Hodgkin-Huxley model given real intracellular recordings from the Allen Cell Types Database. ACE identifies parameters that better match the data while being an order of magnitude more simulation-efficient than a standard SBI method. In summary, ACE combines the strengths of SBI methods and GBI to perform robust and simulation-amortized inference for scientific simulators.
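The ACE workflow described above can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the abstract: a hypothetical 2-D toy simulator (`simulate`), squared Euclidean distance as d, simulations paired with other independent simulations as training targets, a uniform prior, and random-walk Metropolis as the MCMC sampler. As a swapped-in stand-in for the neural network, it uses a least-squares regression on quadratic features. The idea it illustrates is that regressing observed distances d(x, x_o) onto (θ, x_o) with a squared loss estimates their conditional expectation, which is the cost ℓ(θ; x_o) = E_{x∼p(x|θ)}[d(x, x_o)]. The sketch then samples the GBI posterior, commonly written π(θ | x_o) ∝ π(θ) exp(−β ℓ(θ; x_o)), without running any further simulations.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 2                # parameter and data dimension of the toy problem
SIGMA = 0.5            # noise scale of the hypothetical simulator
LOW, HIGH = -3.0, 3.0  # bounds of the uniform prior (assumption)


def simulate(theta, rng):
    """Hypothetical toy simulator: x = theta + Gaussian noise."""
    return theta + SIGMA * rng.standard_normal(theta.shape)


def distance(x, x_target):
    """Squared Euclidean distance (assumed choice of d)."""
    return np.sum((x - x_target) ** 2, axis=-1)


def features(theta, x_target):
    """Quadratic features of (theta, x_target); stand-in for a neural network."""
    z = np.concatenate([theta, x_target], axis=-1)
    n, d = z.shape
    iu = np.triu_indices(d)
    quad = (z[:, :, None] * z[:, None, :])[:, iu[0], iu[1]]
    return np.concatenate([np.ones((n, 1)), z, quad], axis=1)


# 1) Run all simulations once, up front (this is the amortization).
n_train = 20_000
theta_train = rng.uniform(LOW, HIGH, size=(n_train, DIM))
x_train = simulate(theta_train, rng)
# Pair each simulation with a different, independent simulation as target data.
x_target = np.roll(x_train, 1, axis=0)
y = distance(x_train, x_target)

# 2) Least-squares regression: the minimizer approximates
#    E[d(x, x_target) | theta, x_target], i.e. the cost l(theta; x_target).
w, *_ = np.linalg.lstsq(features(theta_train, x_target), y, rcond=None)


def cost_hat(theta, x_o):
    """Estimated cost for each row of theta; needs no new simulations."""
    theta = np.atleast_2d(theta)
    x_o = np.broadcast_to(x_o, theta.shape)
    return features(theta, x_o) @ w


def log_gbi_posterior(theta, x_o, beta):
    """Unnormalized log of pi(theta) * exp(-beta * cost) with a uniform prior."""
    if np.any(theta < LOW) or np.any(theta > HIGH):
        return -np.inf
    return -beta * cost_hat(theta, x_o)[0]


def metropolis(x_o, beta, n_steps=5000, step=0.3, rng=rng):
    """Random-walk Metropolis sampler targeting the GBI posterior."""
    theta = np.zeros(DIM)
    logp = log_gbi_posterior(theta, x_o, beta)
    samples = np.empty((n_steps, DIM))
    for t in range(n_steps):
        proposal = theta + step * rng.standard_normal(DIM)
        logp_prop = log_gbi_posterior(proposal, x_o, beta)
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = proposal, logp_prop
        samples[t] = theta
    return samples


x_o = np.array([1.0, -0.5])  # hypothetical observation
samples = metropolis(x_o, beta=2.0)
print(cost_hat(x_o, x_o))           # should be near DIM * SIGMA**2 here
print(samples[1000:].mean(axis=0))  # should be near x_o here
```

For this toy simulator, the true cost is ‖θ − x_o‖² + DIM·σ², which is quadratic. Quadratic features can therefore represent it exactly. A realistic simulator would need a flexible regressor such as the neural network described in the abstract, and a distance suited to its data. The temperature β controls how sharply the GBI posterior concentrates on low-cost parameters.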