We propose a new ensemble prediction method, Random Subset Averaging (RSA), tailored for settings with many covariates, particularly in the presence of strong correlations. RSA constructs candidate models via a binomial random-subset strategy and aggregates their predictions through a two-round weighting scheme, yielding a structure analogous to a two-layer neural network. All tuning parameters are selected via cross-validation, requiring no prior knowledge of covariate relevance. We establish the asymptotic optimality of RSA under general conditions that allow the first-round weights to be data-dependent, and show that RSA attains a lower finite-sample risk bound under an orthogonal design. Simulation studies show that RSA consistently delivers superior and stable predictive performance across a wide range of sample sizes, dimensional settings, sparsity levels, and correlation structures, outperforming conventional model selection and ensemble learning methods. An empirical application to financial return forecasting further illustrates its practical utility.
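To make the structure described above concrete, the following is a minimal, hypothetical sketch of an RSA-style ensemble in Python. The binomial random-subset draw and the two-round (candidate-level, then group-level) weighting mirror the description in the abstract, but the specific choices here (OLS candidate fits, softmax weights based on validation error, and the names `rsa_predict`, `incl_prob`, `n_groups`, `models_per_group`) are illustrative assumptions, not the paper's exact scheme or tuning procedure.

```python
# Hypothetical sketch of a Random-Subset-Averaging-style ensemble.
# The exact weighting functions and tuning of the actual RSA method
# (selected by cross-validation in the paper) are NOT specified here;
# the ones below are simple placeholders for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def fit_ols(X, y):
    """Least-squares coefficients; pseudo-inverse handles rank deficiency."""
    return np.linalg.pinv(X) @ y

def rsa_predict(X_tr, y_tr, X_val, y_val, X_te,
                n_groups=5, models_per_group=20, incl_prob=0.3):
    """Two-round weighted average of predictions from random-subset models."""
    p = X_tr.shape[1]
    group_preds, group_errs = [], []
    for _ in range(n_groups):                       # second-round layer
        preds, errs = [], []
        for _ in range(models_per_group):           # first-round layer
            # Binomial random subset: each covariate kept with prob incl_prob
            mask = rng.random(p) < incl_prob
            if not mask.any():
                mask[rng.integers(p)] = True        # keep at least one covariate
            beta = fit_ols(X_tr[:, mask], y_tr)
            preds.append(X_te[:, mask] @ beta)
            errs.append(np.mean((X_val[:, mask] @ beta - y_val) ** 2))
        errs = np.asarray(errs)
        w1 = np.exp(-errs / errs.mean())            # data-dependent first-round weights
        w1 /= w1.sum()
        group_preds.append(np.average(preds, axis=0, weights=w1))
        group_errs.append(errs.min())
    w2 = np.exp(-np.asarray(group_errs) / np.mean(group_errs))
    w2 /= w2.sum()                                   # second-round weights
    return np.average(group_preds, axis=0, weights=w2)

# Toy usage on a synthetic sparse linear model (illustrative only)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.standard_normal(n)
yhat = rsa_predict(X[:120], y[:120], X[120:160], y[120:160], X[160:])
print("test MSE:", np.mean((yhat - y[160:]) ** 2))
```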