The residual bootstrap is a classical method for statistical inference in regression settings. As massive data sets become increasingly common, there is growing demand for computationally efficient alternatives to it. We propose the subsampled residual bootstrap (SRB), a simple, versatile, and scalable algorithm for generalized linear models (GLMs), a large class of regression models that includes the classical linear regression model as well as other widely used models such as logistic, Poisson, and probit regression. We prove consistency and distributional results establishing that the SRB has the same theoretical guarantees under the GLM framework as the classical residual bootstrap, while being much faster to compute. We demonstrate the empirical performance of the SRB through simulation studies and a real data analysis of the Forest Covertype data from the UCI Machine Learning Repository.
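For orientation, the classical residual bootstrap that the SRB is compared against can be sketched for ordinary linear regression as follows. This is a minimal illustration of the textbook procedure only, not the SRB algorithm, whose details are not given in this abstract. The function name `residual_bootstrap_ols`, the synthetic data, and the choice of Python with NumPy are all assumptions made for the example.

```python
import numpy as np

def residual_bootstrap_ols(X, y, n_boot=200, seed=0):
    """Classical residual bootstrap for OLS (illustrative sketch, not SRB)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Fit the model once on the full data.
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ beta_hat
    # Center the residuals so the resampled errors have mean zero.
    resid = y - fitted
    resid = resid - resid.mean()
    boot = np.empty((n_boot, p))
    for b in range(n_boot):
        # Resample residuals with replacement and rebuild a response vector.
        y_star = fitted + rng.choice(resid, size=n, replace=True)
        # Refit on the full design matrix: this is the costly step for large n.
        boot[b] = np.linalg.lstsq(X, y_star, rcond=None)[0]
    return beta_hat, boot

# Synthetic data (hypothetical): intercept plus one covariate.
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([1.0, 2.0])
y = X @ true_beta + rng.normal(size=n)

beta_hat, boot = residual_bootstrap_ols(X, y)
# Percentile intervals for each coefficient, shape (2, p).
ci = np.percentile(boot, [2.5, 97.5], axis=0)
```

Each replicate refits the model on all n observations, so the cost grows with both the sample size and the number of replicates. That cost is the motivation for a scalable alternative.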