Determining which parameters of a non-linear model best describe a set of experimental data is a fundamental problem in science, and it has gained much traction lately with the rise of complex large-scale simulators (a.k.a. black-box simulators). The likelihood of such models is typically intractable, which is why classical MCMC methods cannot be used. Simulation-based inference (SBI) stands out in this context because it only requires a dataset of simulations to train deep generative models capable of approximating the posterior distribution of the input parameters given an observation. In this work, we consider a tall data extension in which multiple observations are available and one wishes to leverage their shared information to better infer the parameters of the model. The method we propose builds on recent developments in the score-based diffusion literature and allows us to estimate the tall data posterior distribution using only information from the score network trained on individual observations. We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
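To illustrate why scores attached to individual observations can carry enough information for the tall data posterior, the following minimal sketch checks a standard identity on a toy problem. It is not the algorithm of this work. It assumes conditionally independent observations x_1, ..., x_n given the parameter, so that p(θ | x_1, ..., x_n) ∝ p(θ)^(1-n) ∏ p(θ | x_i), and it uses a conjugate Gaussian model whose scores are available in closed form instead of a trained score network. All names and numerical values (`SIGMA0`, `SIGMA`, the function names) are hypothetical, and the identity is checked only for the un-noised posterior, not at the intermediate noise levels of a diffusion process.

```python
import numpy as np

SIGMA0 = 2.0  # prior standard deviation (hypothetical value)
SIGMA = 1.0   # likelihood standard deviation (hypothetical value)


def prior_score(theta):
    """Score of the prior N(0, SIGMA0^2): d/dtheta log p(theta)."""
    return -theta / SIGMA0**2


def individual_posterior_score(theta, x_i):
    """Score of p(theta | x_i) for a single observation x_i ~ N(theta, SIGMA^2)."""
    precision = 1.0 / SIGMA0**2 + 1.0 / SIGMA**2
    mean = (x_i / SIGMA**2) / precision
    return -(theta - mean) * precision


def tall_posterior_score(theta, xs):
    """Closed-form score of p(theta | x_1, ..., x_n), used as the reference."""
    n = len(xs)
    precision = 1.0 / SIGMA0**2 + n / SIGMA**2
    mean = (np.sum(xs) / SIGMA**2) / precision
    return -(theta - mean) * precision


def composed_score(theta, xs):
    """Tall posterior score assembled from the prior and individual scores.

    Follows from log p(theta | x_1..n) = (1 - n) log p(theta)
    + sum_i log p(theta | x_i) + const, under conditional independence.
    """
    n = len(xs)
    individual = sum(individual_posterior_score(theta, x) for x in xs)
    return (1 - n) * prior_score(theta) + individual


rng = np.random.default_rng(0)
xs = rng.normal(loc=1.5, scale=SIGMA, size=10)  # synthetic observations
theta = 0.3                                     # arbitrary evaluation point
print(composed_score(theta, xs), tall_posterior_score(theta, xs))
```

The two printed values should agree up to floating-point error. In an SBI setting the closed-form `individual_posterior_score` would be replaced by a learned score network, and handling the noised versions of these distributions is where the actual difficulty lies.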