The Metropolis-Hastings (MH) algorithm is one of the most widely used Markov chain Monte Carlo (MCMC) schemes for generating samples from Bayesian posterior distributions. The algorithm is asymptotically exact, flexible and easy to implement. However, in the context of Bayesian inference for large datasets, evaluating the likelihood on the full data at every iteration, for the thousands of iterations needed to reach convergence, can be prohibitively expensive. This paper introduces a new subsample MH algorithm that satisfies detailed balance with respect to the target posterior and utilises control variates to enable exact, efficient Bayesian inference on datasets with large numbers of observations. Through theoretical results, simulation experiments and real-world applications on certain generalised linear models, we demonstrate that our method requires substantially smaller subsamples and is computationally more efficient than the standard MH algorithm and other exact subsample MH algorithms.
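To make the cost argument concrete, the following is a minimal sketch of the standard (full-data) random-walk MH baseline, not the subsample algorithm introduced in the paper. The model (logistic regression, one example of a generalised linear model), the Gaussian prior, the step size, and the names `log_posterior` and `random_walk_mh` are all assumptions chosen for illustration.

```python
import numpy as np


def log_posterior(beta, X, y, prior_scale=10.0):
    """Unnormalised log posterior for logistic regression.

    Assumes an independent N(0, prior_scale^2) prior on each coefficient
    (an illustrative choice, not taken from the paper).
    """
    logits = X @ beta
    # Full-data log-likelihood: touches all n observations.
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    log_prior = -0.5 * np.sum(beta ** 2) / prior_scale ** 2
    return log_lik + log_prior


def random_walk_mh(X, y, n_iter=2000, step=0.05, seed=0):
    """Standard random-walk Metropolis-Hastings with a Gaussian proposal."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    current = log_posterior(beta, X, y)
    samples = np.empty((n_iter, X.shape[1]))
    n_accept = 0
    for t in range(n_iter):
        proposal = beta + step * rng.standard_normal(beta.shape)
        # This call costs O(n) and is repeated at every iteration.
        candidate = log_posterior(proposal, X, y)
        # The proposal is symmetric, so the acceptance ratio reduces
        # to the ratio of posterior densities.
        if np.log(rng.uniform()) < candidate - current:
            beta, current = proposal, candidate
            n_accept += 1
        samples[t] = beta
    return samples, n_accept / n_iter
```

Each iteration calls `log_posterior` once on the proposal, so the total cost grows as the number of iterations times the number of observations. This per-iteration full-data pass is the expense that subsample MH algorithms aim to avoid.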