It is widely known that the performance of Markov chain Monte Carlo (MCMC) algorithms can degrade quickly when targeting computationally expensive posterior distributions, such as when the sample size is large. This has motivated the search for MCMC variants that scale well to large datasets. One popular general approach has been to look at only a subsample of the data at every step. In this note, we point out that well-known MCMC convergence results often imply that these ``subsampling'' MCMC algorithms cannot greatly improve performance. We apply these abstract results to realistic statistical problems and proposed algorithms, and also discuss some design principles suggested by the results. Finally, we develop bounds on the singular values of random matrices that may be of independent interest.