Machine learning assisted Bayesian model comparison: learnt harmonic mean estimator

We resurrect the infamous harmonic mean estimator for computing the marginal likelihood (Bayesian evidence) and solve its problematic large variance. The marginal likelihood is a key component of Bayesian model selection, where it is used to evaluate model posterior probabilities; however, its computation is challenging. The original harmonic mean estimator, first proposed by Newton and Raftery in 1994, involves computing the harmonic mean of the likelihood given samples from the posterior. It was immediately realised that the original estimator can fail catastrophically since its variance can become very large (possibly not finite). A number of variants of the harmonic mean estimator have been proposed to address this issue, although none has proven fully satisfactory. We present the \emph{learnt harmonic mean estimator}, a variant of the original estimator that solves its large variance problem. This is achieved by interpreting the harmonic mean estimator as importance sampling and introducing a new target distribution. The new target distribution is learnt to approximate the optimal but inaccessible target, while minimising the variance of the resulting estimator. Since the estimator requires samples of the posterior only, it is agnostic to the sampling strategy used. We validate the estimator on a variety of numerical experiments, including a number of pathological examples where the original harmonic mean estimator fails catastrophically. We also consider a cosmological application, where our approach leads to $\sim$3 to 6 times more samples than current state-of-the-art techniques in 1/3 of the time. In all cases our learnt harmonic mean estimator is shown to be highly accurate. The estimator is computationally scalable and can be applied to problems of dimension $O(10^3)$ and beyond. Code implementing the learnt harmonic mean estimator is made publicly available.
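To make the importance-sampling interpretation concrete, the following is a minimal sketch rather than the released implementation. It assumes a conjugate Gaussian toy model whose evidence $z$ is known in closed form, and it replaces the variance-minimising fit of the target with a much cruder stand-in: a diagonal Gaussian fitted to half of the posterior samples, with its variance shrunk by a fixed factor so that its tails are narrower than those of the posterior. The function names, the `shrink` parameter and the toy model are all hypothetical choices made for illustration. The original estimator approximates $1/z$ by the posterior-sample mean of $1/\mathcal{L}(\theta_i)$; the re-targeted estimator instead averages $\varphi(\theta_i) / (\mathcal{L}(\theta_i)\,\pi(\theta_i))$, where $\varphi$ is the normalised target and $\pi$ the prior.

```python
import numpy as np


def log_diag_gauss(x, mean, var):
    """Row-wise log density of a Gaussian with diagonal covariance."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=1)


def log_mean_exp(a):
    """Numerically stable log of the mean of exp(a)."""
    m = np.max(a)
    return m + np.log(np.mean(np.exp(a - m)))


def toy_problem(d=4, sigma2=1.0, tau2=25.0, n=20000, seed=0):
    """Conjugate Gaussian toy model (an assumption of this sketch).

    Likelihood y ~ N(theta, sigma2 I), prior theta ~ N(0, tau2 I), so the
    posterior and the evidence are both available in closed form.
    """
    rng = np.random.default_rng(seed)
    y = np.ones(d)
    post_var = 1.0 / (1.0 / sigma2 + 1.0 / tau2)
    post_mean = post_var * y / sigma2
    # Exact posterior draws stand in for the output of any sampler.
    theta = post_mean + np.sqrt(post_var) * rng.standard_normal((n, d))
    log_like = log_diag_gauss(theta, y, sigma2)
    log_prior = log_diag_gauss(theta, 0.0, tau2)
    log_z_true = log_diag_gauss(y[None, :], 0.0, sigma2 + tau2)[0]
    return theta, log_like, log_prior, log_z_true


def original_harmonic_mean(log_like):
    """Original estimator: 1/z is the posterior-sample mean of 1/L."""
    return -log_mean_exp(-log_like)


def learnt_harmonic_mean(theta, log_like, log_prior, shrink=0.5):
    """Re-targeted estimator with a target fitted to the samples.

    The target is a diagonal Gaussian fitted to the first half of the
    samples, with variance multiplied by `shrink` < 1 (hypothetical,
    hand-picked) so that it sits inside the posterior. The estimate is
    then formed on the held-out second half.
    """
    n_train = theta.shape[0] // 2
    train = theta[:n_train]
    mean = train.mean(axis=0)
    var = shrink * train.var(axis=0)
    log_phi = log_diag_gauss(theta[n_train:], mean, var)
    log_ratio = log_phi - log_like[n_train:] - log_prior[n_train:]
    return -log_mean_exp(log_ratio)


if __name__ == "__main__":
    theta, log_like, log_prior, log_z_true = toy_problem()
    print("analytic log-evidence:   ", log_z_true)
    print("original harmonic mean:  ", original_harmonic_mean(log_like))
    print("re-targeted (this sketch):", learnt_harmonic_mean(theta, log_like, log_prior))
```

In this toy setting the prior is much broader than the posterior, so the importance weights of the original estimator have unbounded variance, whereas the ratio of the shrunken target to the posterior is bounded. The sketch needs only the posterior samples and their likelihood and prior values, which is why it does not depend on how the samples were generated.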
