Simulation plays a central role in scientific discovery. In many applications, the bottleneck is no longer running a simulator; it is choosing among large families of plausible simulators, each corresponding to a different forward model or hypothesis consistent with the observations. Over large model families, classical Bayesian workflows for model selection are impractical. Furthermore, amortized model selection methods typically hard-code a fixed model prior or complexity penalty at training time, requiring users to commit to a particular parsimony assumption before seeing the data. We introduce PRISM, a simulation-based encoder-decoder that infers a joint posterior over both discrete model structures and their associated continuous parameters, while enabling test-time control of model complexity via a tunable model prior on which the network is conditioned. On a synthetic symbolic regression task, we show that PRISM scales to families with combinatorially many (up to billions of) model instantiations. As a scientific application, we evaluate PRISM on biophysical modeling of diffusion MRI data, demonstrating model selection across several multi-compartment models on both synthetic and in vivo neuroimaging data.
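To make the role of a tunable model prior concrete, the following minimal sketch computes a posterior over a small, explicitly enumerated model family under different prior strengths. It is not PRISM's network: PRISM amortizes this inference by conditioning on the prior, whereas the sketch evaluates Bayes' rule directly. The function name `model_posterior`, the exponential prior form p(M | lam) ∝ exp(-lam * complexity), and the log-evidence values are all illustrative assumptions, not taken from the paper.

```python
import math

def model_posterior(log_evidence, complexity, lam):
    """Posterior over models under the assumed prior p(M | lam) ∝ exp(-lam * complexity[M])."""
    # Unnormalized log posterior: log evidence plus log prior (up to a constant).
    scores = [le - lam * k for le, k in zip(log_evidence, complexity)]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log evidences for three models with 1, 2, and 3 compartments.
log_evidence = [-12.0, -10.0, -9.5]
complexity = [1, 2, 3]

# lam = 0 gives a flat prior; a larger lam penalizes complex models more strongly.
flat = model_posterior(log_evidence, complexity, lam=0.0)
sparse = model_posterior(log_evidence, complexity, lam=3.0)
```

With the flat prior the most complex model has the highest posterior mass, while the stronger penalty shifts the mass to the simplest one. Exposing `lam` at inference time is the kind of control the abstract describes, without retraining for each parsimony assumption.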