We introduce a new framework for dimension reduction in the context of high-dimensional regression. Our proposal is to aggregate an ensemble of random projections, which have been carefully chosen based on the empirical regression performance after being applied to the covariates. More precisely, we consider disjoint groups of independent random projections, apply a base regression method after each projection, and retain, in each group, the projection with the best empirical performance. We aggregate the selected projections by taking the singular value decomposition of their empirical average and then output the leading singular vectors. A particularly appealing aspect of our approach is that the singular values provide a measure of the relative importance of the corresponding projection directions, which can be used to select the final projection dimension. We investigate in detail (and provide default recommendations for) various aspects of our general framework, including the projection distribution and the base regression method, as well as the number of random projections used. Additionally, we investigate the possibility of further reducing the dimension by applying our algorithm twice in cases where the projection dimension recommended in the initial application is too large. Our theoretical results show that the error of our algorithm stabilises as the number of groups of projections increases. We demonstrate the excellent empirical performance of our proposal in a large numerical study using simulated and real data.
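The selection-and-aggregation procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Gaussian random projections, ordinary least squares as the base regression method, and in-sample mean squared error as the selection criterion; the function and parameter names are hypothetical, and details such as scaling of the projections are illustrative choices.

```python
import numpy as np

def ensemble_projection_directions(X, y, d=5, n_groups=50, group_size=20, seed=0):
    """Illustrative sketch: within each of `n_groups` disjoint groups of
    `group_size` independent random projections, keep the projection with the
    smallest in-sample regression error; average the kept projections and
    return the singular vectors and singular values of that average."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    selected = []
    for _ in range(n_groups):
        best_err, best_A = np.inf, None
        for _ in range(group_size):
            # Gaussian random projection from dimension p down to d
            A = rng.standard_normal((p, d)) / np.sqrt(d)
            Z = X @ A
            # Base regression method: ordinary least squares on projected covariates
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            err = np.mean((y - Z @ beta) ** 2)  # empirical performance criterion
            if err < best_err:
                best_err, best_A = err, A
        selected.append(best_A)
    # Aggregate: SVD of the empirical average of the selected projections
    A_bar = np.mean(selected, axis=0)
    U, s, _ = np.linalg.svd(A_bar, full_matrices=False)
    # Columns of U are candidate projection directions; the singular values s
    # indicate their relative importance and can guide the final dimension choice
    return U, s
```

A large gap in the singular values `s` would suggest truncating `U` to the columns before the gap, mirroring the abstract's suggestion that singular values can be used to select the final projection dimension.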