Scientific modeling applications often require estimating a distribution of parameters consistent with a dataset of observations, an inference task also known as source distribution estimation. This problem can be ill-posed, however, since many different source distributions might produce the same distribution of data-consistent simulations. To make a principled choice among many equally valid sources, we propose an approach that targets the maximum entropy distribution, i.e., one that prioritizes retaining as much uncertainty as possible. Our method is purely sample-based: it uses the Sliced-Wasserstein distance to measure the discrepancy between the dataset and simulations, and is thus suitable for simulators with intractable likelihoods. We benchmark our method on several tasks and show that it can recover source distributions with substantially higher entropy than recent source estimation methods, without sacrificing the fidelity of the simulations. Finally, to demonstrate the utility of our approach, we infer source distributions for parameters of the Hodgkin-Huxley model from experimental datasets with thousands of single-neuron measurements. In summary, we propose a principled method for inferring source distributions of scientific simulator parameters while retaining as much uncertainty as possible.
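To make the sample-based discrepancy concrete, the following is a minimal sketch of a Monte Carlo estimate of the Sliced-Wasserstein distance between a dataset and a set of simulations. It is not the authors' implementation: the function name `sliced_wasserstein`, its parameters (`n_projections`, `p`, `seed`), and the restriction to equally sized sample sets are assumptions made for illustration, and the entropy term of the proposed objective is not shown.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=128, p=2, seed=0):
    """Monte Carlo estimate of the sliced-Wasserstein-p distance.

    x, y: arrays of shape (n, d) holding two equally sized sample sets
    (an assumption of this sketch, so that sorted samples can be paired).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample counts and dimensions")

    rng = np.random.default_rng(seed)
    # Draw random directions uniformly on the unit sphere in d dimensions.
    directions = rng.normal(size=(n_projections, x.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Project both sample sets onto every direction: shape (n, n_projections).
    x_proj = np.sort(x @ directions.T, axis=0)
    y_proj = np.sort(y @ directions.T, axis=0)

    # In one dimension, optimal transport pairs the sorted samples.
    per_slice = np.mean(np.abs(x_proj - y_proj) ** p, axis=0)
    return float(np.mean(per_slice) ** (1.0 / p))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(size=(500, 3))            # stand-in for observations
    sims = rng.normal(loc=2.0, size=(500, 3))   # stand-in for simulator outputs
    print(sliced_wasserstein(data, sims))
```

Because the estimate only needs samples from the simulator, never likelihood evaluations, a distance of this kind can be computed for simulators whose likelihoods are intractable.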