Identifying the parameters of a non-linear model that best explain observed data is a core task across scientific fields. When such models rely on complex simulators, evaluating the likelihood is typically intractable, making traditional inference methods such as MCMC inapplicable. Simulation-based inference (SBI) addresses this by training deep generative models on simulated data to approximate the posterior distribution over parameters. In this work, we consider the tall data setting, where multiple independent observations provide additional information, yielding sharper posteriors and improved parameter identifiability. Building on the flourishing score-based diffusion literature, F-NPSE (Geffner et al., 2023) estimates the tall data posterior by composing individual scores from a neural network trained with only a single observation as context. This enables more flexible and simulation-efficient inference than alternative approaches for tall datasets in SBI. However, it relies on costly Langevin dynamics during sampling. We propose a new algorithm that eliminates the need for Langevin steps by explicitly approximating the diffusion process of the tall data posterior. Our method retains the advantages of compositional score-based inference while being significantly faster and more stable than F-NPSE. We demonstrate its improved performance on toy problems and standard SBI benchmarks, and showcase its scalability by applying it to a complex real-world model from computational neuroscience.
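The score composition underlying this family of methods follows from Bayes' rule: for independent observations, the tall posterior factorizes as p(θ|x_{1:n}) ∝ Π_j p(θ|x_j) / p(θ)^{n−1}, so its score is the sum of single-observation posterior scores minus (n−1) times the prior score. The following is a minimal illustrative sketch (not the paper's algorithm) using a conjugate Gaussian toy model with hypothetical numbers, where all scores are analytic and the composition at noise level t = 0 can be checked exactly:

```python
import numpy as np

# Toy Gaussian model (hypothetical values): prior theta ~ N(0, 1),
# likelihood x_j | theta ~ N(theta, sigma2). All scores are closed-form,
# so we can verify the tall-data score composition at noise level t = 0:
#   s_tall(theta) = sum_j s_post(theta | x_j) - (n - 1) * s_prior(theta)

sigma2 = 0.5
xs = np.array([0.3, -0.1, 0.8])  # n = 3 independent observations
n = len(xs)

def prior_score(theta):
    # grad log N(theta; 0, 1)
    return -theta

def single_post_score(theta, x):
    # p(theta | x) is N(m, v) with precision 1 + 1/sigma2 and mean v * x / sigma2
    v = 1.0 / (1.0 + 1.0 / sigma2)
    m = v * x / sigma2
    return -(theta - m) / v

def tall_post_score(theta):
    # direct tall posterior: precision 1 + n/sigma2, mean v_n * sum(x) / sigma2
    v = 1.0 / (1.0 + n / sigma2)
    m = v * xs.sum() / sigma2
    return -(theta - m) / v

theta = 0.7
composed = sum(single_post_score(theta, x) for x in xs) - (n - 1) * prior_score(theta)
print(np.isclose(composed, tall_post_score(theta)))  # True: composition is exact here
```

In the conjugate Gaussian case the identity holds exactly at every θ; in SBI the single-observation scores come from a trained network and the equality only holds approximately, which is where the diffusion-time approximation discussed above enters.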