Many scientific models are composed of multiple discrete components, and scientists often make heuristic decisions about which components to include. Bayesian inference provides a mathematical framework for systematically selecting model components, but defining prior distributions over model components and developing associated inference schemes have been challenging. We approach this problem in a simulation-based inference framework: we define model priors over candidate components and, from model simulations, train neural networks to infer joint probability distributions over both model components and associated parameters. Our method, simulation-based model inference (SBMI), represents distributions over model components as a conditional mixture of multivariate binary distributions in the Grassmann formalism. SBMI can be applied to any compositional stochastic simulator without requiring likelihood evaluations. We evaluate SBMI on a simple time series model and on two scientific models from neuroscience. We show that it can discover multiple data-consistent model configurations and that it reveals non-identifiable model components and parameters. SBMI provides a powerful tool for data-driven scientific inquiry that allows scientists to identify essential model components and make uncertainty-informed modelling decisions.
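To make the "mixture of multivariate binary distributions in the Grassmann formalism" more concrete, the following is a minimal sketch. It assumes the determinant form familiar from determinantal point processes: for a binary vector z of component inclusions, P(z) = det(M), where row i of M is row i of a matrix Σ if z_i = 1 and row i of (I − Σ) otherwise. Because the determinant is linear in each row, these probabilities sum to one and the marginal inclusion probability of component i is Σ_ii. All function names and numbers here are hypothetical and chosen for illustration. In SBMI, the mixture weights and matrices would be outputs of a neural network conditioned on the observed data, whereas here they are fixed random values.

```python
import itertools

import numpy as np


def grassmann_prob(z, sigma):
    """P(z) = det(M): row i of M is sigma[i] if z[i] == 1, else (I - sigma)[i]."""
    z = np.asarray(z, dtype=float)[:, None]  # column vector; scales whole rows
    eye = np.eye(sigma.shape[0])
    m = z * sigma + (1.0 - z) * (eye - sigma)
    return float(np.linalg.det(m))


def mixture_prob(z, weights, sigmas):
    """Probability of z under a weighted mixture of Grassmann-type distributions."""
    return sum(w * grassmann_prob(z, s) for w, s in zip(weights, sigmas))


def random_valid_sigma(d, rng):
    """Symmetric sigma with eigenvalues in (0, 1), a sufficient condition for P(z) >= 0."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    lam = rng.uniform(0.05, 0.95, size=d)
    return q @ np.diag(lam) @ q.T


# Hypothetical setup: 3 candidate model components, 2 mixture components.
rng = np.random.default_rng(0)
d = 3
weights = np.array([0.3, 0.7])
sigmas = [random_valid_sigma(d, rng) for _ in weights]

# Enumerate all 2**d model configurations (only feasible for small d).
configs = list(itertools.product([0, 1], repeat=d))
probs = np.array([mixture_prob(c, weights, sigmas) for c in configs])

# Marginal inclusion probability of each component: weighted diagonal of sigma.
marginals = sum(w * np.diag(s) for w, s in zip(weights, sigmas))

# Draw a few model configurations from the mixture via the enumerated table.
p = np.clip(probs, 0.0, None)
idx = rng.choice(len(configs), size=5, p=p / p.sum())
samples = [configs[i] for i in idx]
```

Restricting Σ to a symmetric matrix with eigenvalues in (0, 1) is a convenient sufficient condition for valid probabilities that was chosen for this sketch, not a requirement stated in the abstract. A useful property of this representation is that per-component inclusion probabilities can be read directly off the diagonal of Σ, with no enumeration needed.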