Subsampling has been used extensively to cope with limited computing resources and to meet the need for expedited data analysis. Various subsampling methods have been developed for data with a large sample size and a small number of parameters. However, applying these methods directly may not be suitable when the dimension is also high and the available computing facilities can only analyze a subsample whose size is similar to, or even smaller than, the dimension. In this case, although the full data pose no high-dimensional problem, the subsample may have a sample size smaller or much smaller than the number of parameters, making it a high-dimensional problem. We call this scenario the high-dimensional subsample from low-dimension full data problem. In this paper, we tackle this problem by proposing a novel subsampling-based approach that combines penalty-based dimension reduction and refitted cross-validation. We establish the asymptotic normality of the refitted cross-validation subsample estimator, which plays a crucial role in statistical inference. The proposed method demonstrates appealing performance in numerical experiments on simulated data and a real data application.