Many scientific models are composed of multiple discrete components, and scientists often make heuristic decisions about which components to include. Bayesian inference provides a mathematical framework for systematically selecting model components, but defining prior distributions over model components and developing associated inference schemes have been challenging. We approach this problem in a simulation-based inference framework: we define model priors over candidate components and, from model simulations, train neural networks to infer joint probability distributions over both model components and their associated parameters. Our method, simulation-based model inference (SBMI), represents distributions over model components as a conditional mixture of multivariate binary distributions in the Grassmann formalism. SBMI can be applied to any compositional stochastic simulator without requiring likelihood evaluations. We evaluate SBMI on a simple time series model and on two scientific models from neuroscience, and show that it can discover multiple data-consistent model configurations and that it reveals non-identifiable model components and parameters. SBMI provides a tool for data-driven scientific inquiry that allows scientists to identify essential model components and make uncertainty-informed modelling decisions.
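To make the simulation step concrete, here is a minimal sketch of how training data for such an approach might be generated: a binary mask over candidate components is drawn from a model prior, parameters are drawn for the active components, and a compositional simulator produces a time series. Everything specific here is an assumption for illustration and not taken from the abstract: the component library (`linear`, `quadratic`, `sine`), the independent Bernoulli model prior, the standard normal parameter prior, and the function names. The neural network and the Grassmann mixture are not shown.

```python
import numpy as np

# Hypothetical component library: each entry maps (time, parameter) to a series.
COMPONENTS = {
    "linear": lambda t, a: a * t,
    "quadratic": lambda t, a: a * t**2,
    "sine": lambda t, a: a * np.sin(2 * np.pi * t),
}
NAMES = list(COMPONENTS)


def sample_model(rng, p_include=0.5):
    """Draw a binary mask over components (assumed independent Bernoulli prior)."""
    return rng.random(len(NAMES)) < p_include


def sample_parameters(rng, mask):
    """Draw one parameter per active component; inactive slots stay NaN."""
    theta = np.full(len(NAMES), np.nan)
    theta[mask] = rng.normal(0.0, 1.0, size=int(mask.sum()))
    return theta


def simulate(rng, mask, theta, t, noise_std=0.1):
    """Sum the active components and add Gaussian observation noise."""
    x = np.zeros_like(t)
    for i, name in enumerate(NAMES):
        if mask[i]:
            x += COMPONENTS[name](t, theta[i])
    return x + rng.normal(0.0, noise_std, size=t.shape)


def make_training_set(n, seed=0):
    """Generate n (mask, parameters, simulation) triples from the priors."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, 50)
    masks, thetas, xs = [], [], []
    for _ in range(n):
        mask = sample_model(rng)
        theta = sample_parameters(rng, mask)
        masks.append(mask)
        thetas.append(theta)
        xs.append(simulate(rng, mask, theta, t))
    return np.array(masks), np.array(thetas), np.array(xs)


masks, thetas, xs = make_training_set(8)
```

In a setup like this, the triples would serve as training data for a network that maps a simulated series to a joint distribution over the mask and the parameters of the active components. Only forward simulations are needed, so no likelihood is evaluated.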