Inference for models with recursively defined likelihoods is computationally demanding, limiting scalability to large datasets. We propose a stabilised weighted subsampling methodology for accelerated inference based on an unbiased estimator of the log-likelihood. By assigning higher sampling probabilities to early observations, the method reduces the effective depth of recursive likelihood evaluations and hence the expected computational cost. The rate at which these probabilities decay is critical: slow decay leads to frequent inclusion of late observations and hence high computational cost, whereas overly aggressive decay can substantially inflate estimator variance. We develop a stabilisation framework, underpinned by theoretical results, that restricts the decay of the sampling probabilities to avoid both variance and computational pathologies through principled hyperparameter tuning. We further consider an unbiased subsampling estimator of the log-likelihood gradient, enabling gradient-based inference. The proposed estimators are generic building blocks for subsampling-based inference and can be embedded within frameworks including stochastic optimisation, variational Bayes, and Markov chain Monte Carlo (MCMC). Applications to conditional volatility models, including standard and threshold generalised autoregressive conditional heteroskedasticity (GARCH) models, demonstrate substantial computational speed-ups while maintaining inferential accuracy. The proposed approach outperforms uniform subsampling and compares favourably with recent stochastic gradient and divide-and-conquer MCMC methods for dependent data.
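To make the idea concrete, the following is a minimal sketch of a weighted subsampling estimator of the log-likelihood, under several assumptions that are not taken from the abstract: a Gaussian GARCH(1,1) model, indices drawn with replacement, and sampling probabilities formed by mixing a geometric decay with a uniform floor. The floor is only a stand-in for the stabilisation described above, not the authors' scheme, and all function and parameter names (`garch_loglik_terms`, `sampling_probs`, `subsampled_loglik`, `decay`, `floor_weight`) are hypothetical. Because each sampled term is divided by its sampling probability, the estimator is unbiased for the full log-likelihood whenever every probability is positive.

```python
import numpy as np

def garch_loglik_terms(y, omega, alpha, beta, depth):
    """Gaussian GARCH(1,1) log-density terms for t = 0..depth-1 (assumed model)."""
    sigma2 = np.empty(depth)
    sigma2[0] = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    for t in range(1, depth):
        # Recursive variance update: term t needs every earlier term.
        sigma2[t] = omega + alpha * y[t - 1] ** 2 + beta * sigma2[t - 1]
    return -0.5 * (np.log(2.0 * np.pi * sigma2) + y[:depth] ** 2 / sigma2)

def sampling_probs(T, decay, floor_weight):
    """Geometric decay mixed with a uniform floor (hypothetical stabilisation)."""
    geom = decay ** np.arange(T)
    geom /= geom.sum()
    # The floor keeps every probability at least floor_weight / T,
    # which bounds the importance weights 1 / p_t.
    return (1.0 - floor_weight) * geom + floor_weight / T

def subsampled_loglik(y, theta, probs, m, rng):
    """Importance-weighted estimate of the full log-likelihood from m draws."""
    idx = rng.choice(len(y), size=m, replace=True, p=probs)
    depth = int(idx.max()) + 1  # recurse only as deep as the latest sampled index
    terms = garch_loglik_terms(y, *theta, depth)
    return np.mean(terms[idx] / probs[idx]), depth

rng = np.random.default_rng(0)
y = rng.standard_normal(2000)   # placeholder data, not a real return series
theta = (0.1, 0.1, 0.8)         # (omega, alpha, beta), illustrative values only
probs = sampling_probs(len(y), decay=0.999, floor_weight=0.05)
estimate, depth = subsampled_loglik(y, theta, probs, m=200, rng=rng)
exact = garch_loglik_terms(y, *theta, len(y)).sum()
print(f"estimate {estimate:.1f} vs exact {exact:.1f}, depth {depth} of {len(y)}")
```

In this sketch, raising `decay` towards 1 or raising `floor_weight` pushes the returned `depth` towards the full series length, while lowering both shrinks the smallest probabilities and enlarges the weights `1 / p_t`. That is the cost and variance trade-off the abstract describes.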