Subsampling-based Markov chain Monte Carlo (MCMC) algorithms aim to accelerate Bayesian inference by evaluating the likelihood on only a subset of the data at each iteration. In many standard tall-data applications, however, individual likelihood contributions are inexpensive to evaluate, and computational overhead often makes the reduction in actual computing time substantially smaller than the nominal reduction in data size. We study a different computational regime, which arises in frequency-domain inference for continuous-time processes observed at equally spaced discrete time points. Discrete observation gives rise to aliasing, whereby each contribution to the Whittle likelihood requires a summation over shifted frequency components; in standard discrete-time spectral settings, spectral evaluations require no such summation. We demonstrate that this structure makes subsampling MCMC, which estimates the log-likelihood using data subsampling and efficient control variates, particularly effective for reducing computational cost. We illustrate the approach for Bayesian frequency-domain inference in discretely observed continuous-time autoregressive moving average models driven by Lévy processes with finite second moments.
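To make the aliasing cost concrete, the following is a minimal sketch, not the paper's implementation. It assumes the simplest continuous-time autoregressive case, an Ornstein-Uhlenbeck process dX = -a X dt + sigma dW, and the spectral density convention gamma(h) = ∫ f(w) exp(iwh) dw. The function names, the truncation level `K`, and the parameter values are all hypothetical choices for illustration. The aliased density at each Fourier frequency is a sum of 2K + 1 shifted continuous-time densities, whereas the equivalent discrete-time AR(1) form needs a single evaluation.

```python
import numpy as np

def ou_spectral_density(omega, a, sigma):
    """Continuous-time OU spectral density (assumed convention:
    gamma(h) = integral of f(w) * exp(i*w*h) dw over the real line)."""
    return sigma**2 / (2.0 * np.pi * (a**2 + omega**2))

def aliased_density_truncated(omega, a, sigma, delta, K):
    """Aliased density: sum of f(omega + 2*pi*k/delta) over k = -K..K.
    Each frequency costs 2K + 1 evaluations of the continuous-time density."""
    k = np.arange(-K, K + 1)
    shifted = omega[:, None] + 2.0 * np.pi * k[None, :] / delta
    return ou_spectral_density(shifted, a, sigma).sum(axis=1)

def aliased_density_closed_form(omega, a, sigma, delta):
    """Closed form available for OU only: the sampled process is an AR(1)
    with coefficient phi = exp(-a * delta), so no summation is needed."""
    phi = np.exp(-a * delta)
    gamma0 = sigma**2 / (2.0 * a)  # stationary variance
    denom = 1.0 - 2.0 * phi * np.cos(omega * delta) + phi**2
    return (delta / (2.0 * np.pi)) * gamma0 * (1.0 - phi**2) / denom

# Hypothetical settings: unit sampling interval, n = 256 observations.
a, sigma, delta, n = 1.0, 1.0, 1.0, 256
omega = 2.0 * np.pi * np.arange(1, n // 2) / (n * delta)  # Fourier frequencies in (0, pi/delta)

f_trunc = aliased_density_truncated(omega, a, sigma, delta, K=2000)
f_exact = aliased_density_closed_form(omega, a, sigma, delta)
max_rel_err = np.max(np.abs(f_trunc - f_exact) / f_exact)
print(f"max relative truncation error: {max_rel_err:.2e}")
```

The summands decay only quadratically in the shift index, so many terms are needed for an accurate truncation. For general models without a closed form, this per-frequency cost is what evaluating only a subsample of frequencies would avoid.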