Simulation-based inference (SBI) enables amortized Bayesian inference for simulators with implicit likelihoods. But when we are primarily interested in the quality of predictive simulations, or when the model cannot exactly reproduce the observed data (i.e., is misspecified), targeting the Bayesian posterior may be overly restrictive. Generalized Bayesian Inference (GBI) aims to robustify inference for (misspecified) simulator models by replacing the likelihood function with a cost function that scores how well parameters account for the data. However, GBI methods generally require running multiple simulations to estimate the cost function at each parameter value during inference, which makes the approach computationally infeasible even for moderately complex simulators. Here, we propose amortized cost estimation (ACE) for GBI to address this challenge: we train a neural network to approximate the cost function, which we define as the expected distance between simulations generated with a given parameter and the observed data. The trained network can then be used with MCMC to infer GBI posteriors for any observation without running additional simulations. We show that, on several benchmark tasks, ACE accurately predicts cost and provides predictive simulations that are closer to synthetic observations than those of other SBI methods, especially for misspecified simulators. Finally, we apply ACE to infer parameters of the Hodgkin-Huxley model given real intracellular recordings from the Allen Cell Types Database. ACE identifies parameters that match the data better while being an order of magnitude more simulation-efficient than a standard SBI method. In summary, ACE combines the strengths of SBI methods and GBI to perform robust and simulation-amortized inference for scientific simulators.
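The core idea can be illustrated on a toy problem. The cost is ℓ(θ; x_o) = E_{x ~ p(x|θ)}[d(x, x_o)], and, assuming the common generalized-Bayes form, the GBI posterior is p(θ | x_o) ∝ p(θ) exp(-β ℓ(θ; x_o)) with an inverse temperature β. Because a regression trained with squared loss targets the conditional mean, fitting single-sample distances d(x_i, x_o) against (θ_i, x_o) recovers the expected distance without repeated simulations per parameter. The sketch below is a minimal illustration under several assumptions that are not from the abstract: a hypothetical scalar-parameter simulator, mean squared error as the distance, a uniform prior, random-walk Metropolis as the MCMC sampler, and, most importantly, a least-squares fit on quadratic features swapped in for the neural network. Every function name here is hypothetical.

```python
import numpy as np

SIGMA = 0.5            # hypothetical simulator noise level
N_DIM = 5              # hypothetical number of simulator outputs
LOW, HIGH = -3.0, 3.0  # hypothetical uniform prior bounds on theta


def simulator(theta, rng):
    """Hypothetical toy simulator: every output equals theta plus Gaussian noise."""
    theta = np.asarray(theta, dtype=float).reshape(-1, 1)
    return theta + SIGMA * rng.standard_normal((theta.shape[0], N_DIM))


def distance(x, x_o):
    """Mean squared error between simulation and observation (assumed distance)."""
    return np.mean((x - x_o) ** 2, axis=-1)


def features(theta, x_o):
    """Full quadratic features of z = (theta, x_o); a stand-in for a neural network."""
    z = np.column_stack([theta, x_o])
    iu = np.triu_indices(z.shape[1])
    quad = (z[:, :, None] * z[:, None, :])[:, iu[0], iu[1]]
    return np.column_stack([np.ones(len(z)), z, quad])


def train_cost_estimator(n_sims, rng):
    """Fit the cost estimator; these n_sims simulations are the only ones ever run."""
    theta = rng.uniform(LOW, HIGH, size=n_sims)
    x = simulator(theta, rng)
    # Pair each simulation with another simulated dataset that plays the observation.
    x_o = x[rng.permutation(n_sims)]
    # Squared-loss regression on noisy single-sample distances estimates their
    # conditional mean, i.e. the expected distance given (theta, x_o).
    target = distance(x, x_o)
    weights, *_ = np.linalg.lstsq(features(theta, x_o), target, rcond=None)
    return weights


def estimated_cost(weights, theta, x_o):
    """Amortized cost estimate for one observation x_o at one or more theta values."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    x_rep = np.broadcast_to(x_o, (theta.shape[0], x_o.shape[-1]))
    return features(theta, x_rep) @ weights


def sample_gbi_posterior(weights, x_o, beta, n_steps, rng, step=0.3):
    """Random-walk Metropolis on log p(theta) - beta * cost; runs no new simulations."""
    def log_target(t):
        if not LOW <= t <= HIGH:
            return -np.inf  # outside the uniform prior's support
        return -beta * estimated_cost(weights, t, x_o)[0]

    theta, log_p = 0.0, log_target(0.0)
    samples = np.empty(n_steps)
    for i in range(n_steps):
        proposal = theta + step * rng.standard_normal()
        log_q = log_target(proposal)
        if np.log(rng.uniform()) < log_q - log_p:
            theta, log_p = proposal, log_q
        samples[i] = theta
    return samples


rng = np.random.default_rng(0)
weights = train_cost_estimator(5_000, rng)
theta_true = 1.0
x_obs = simulator(theta_true, rng)[0]
samples = sample_gbi_posterior(weights, x_obs, beta=20.0, n_steps=5_000, rng=rng)
posterior_mean = samples[1_000:].mean()  # discard burn-in
print(f"posterior mean {posterior_mean:.2f} (true theta {theta_true})")
```

The quadratic features are chosen only because, for this toy simulator with an MSE distance, the true expected cost (the mean of (θ - x_o,k)² plus σ²) is itself quadratic in (θ, x_o), so the fit can be checked against a closed form. A real simulator would need a flexible regressor such as the neural network described above, but the amortization is the same: once trained, the estimator serves any new observation.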