Determining which parameters of a non-linear model best describe a set of experimental data is a fundamental problem in science, and it has gained much traction lately with the rise of complex large-scale simulators. The likelihood of such models is typically intractable, which is why classical MCMC methods cannot be used. Simulation-based inference (SBI) stands out in this context because it only requires a dataset of simulations to train deep generative models capable of approximating the posterior distribution that relates input parameters to a given observation. In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model. The proposed method builds upon recent developments from the flourishing score-based diffusion literature and makes it possible to estimate the tall data posterior distribution using only information from a score network trained for a single context observation. We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
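To illustrate why single-observation scores carry enough information for the tall data posterior, the following minimal sketch checks a standard identity in the noise-free case. It assumes the n observations are conditionally independent given the parameter, so that the tall data posterior is proportional to the prior raised to the power 1 - n times the product of the single-observation posteriors, and its score is (1 - n) times the prior score plus the sum of the individual posterior scores. The Gaussian toy model, the function names, and the numerical values are all assumptions made for illustration and are not taken from this work. The sketch does not implement the proposed method and makes no claim about how scores are combined at positive diffusion noise levels.

```python
import random


def prior_score(theta, sigma0):
    """Score of the prior N(0, sigma0^2) with respect to theta."""
    return -theta / sigma0**2


def single_posterior_score(theta, x, sigma0, sigma):
    """Score of p(theta | x) for the likelihood x | theta ~ N(theta, sigma^2)."""
    precision = 1.0 / sigma0**2 + 1.0 / sigma**2
    mean = (x / sigma**2) / precision
    return -(theta - mean) * precision


def tall_posterior_score_composed(theta, xs, sigma0, sigma):
    """Tall data score assembled only from prior and single-observation scores."""
    n = len(xs)
    individual = sum(single_posterior_score(theta, x, sigma0, sigma) for x in xs)
    return (1 - n) * prior_score(theta, sigma0) + individual


def tall_posterior_score_exact(theta, xs, sigma0, sigma):
    """Closed-form score of p(theta | x_1, ..., x_n) for the same Gaussian model."""
    n = len(xs)
    precision = 1.0 / sigma0**2 + n / sigma**2
    mean = (sum(xs) / sigma**2) / precision
    return -(theta - mean) * precision


# Hypothetical toy setup: five observations drawn around a true parameter of 1.0.
rng = random.Random(0)
sigma0, sigma = 2.0, 0.5
xs = [rng.gauss(1.0, sigma) for _ in range(5)]

theta = 0.3
composed = tall_posterior_score_composed(theta, xs, sigma0, sigma)
exact = tall_posterior_score_exact(theta, xs, sigma0, sigma)
print(abs(composed - exact))  # difference is at floating-point rounding level
```

In this toy model both scores are available in closed form, which is what makes the check possible; in an SBI setting the single-observation score would instead come from a trained score network.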