Forward regression is a classical and effective tool for variable screening in ultra-high-dimensional linear models, but its standard projection-based implementation can be computationally costly and numerically unstable when predictors are strongly collinear. Motivated by this limitation, we propose an orthogonalized forward regression procedure, implemented recursively through Gram-Schmidt updates, that ranks predictors according to their unique contributions after removing the effects of the variables already selected. This approach preserves the interpretability of forward regression while substantially reducing the cost of repeated projections. We further develop a path-based rule for selecting the model size that uses statistics computed directly from the forward sequence, thereby avoiding cross-validation and extensive tuning. The resulting method is particularly well suited to settings in which the number of predictors far exceeds the sample size and strong collinearity renders conventional forward fitting ineffective. Theoretically, we derive the optimal convergence rate for the proposed Gram-Schmidt forward regression, thereby extending existing results for projection-based forward regression, and further show that it enjoys the sure screening property and variable selection consistency under suitable conditions. Simulation studies and empirical examples demonstrate that the method provides a favorable balance among computational efficiency, numerical stability, screening accuracy, and predictive performance, especially in highly correlated ultra-high-dimensional settings.
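
To make the recursive idea concrete, the following is a minimal sketch of forward selection with Gram-Schmidt deflation, written under our own assumptions rather than taken from the paper. The function name `gram_schmidt_forward`, the arguments `max_steps` and `tol`, the centering step, and the relative-norm test for skipping near-collinear columns are all hypothetical choices. The path-based model size rule is not specified in the abstract and is not reproduced here; the sketch only returns the forward sequence and the residual sum of squares along it.

```python
import numpy as np


def gram_schmidt_forward(X, y, max_steps, tol=1e-10):
    """Greedy forward selection with recursive Gram-Schmidt deflation.

    Illustrative sketch only (names and tolerances are assumptions).
    Returns the selected column indices in order of entry and the
    residual sum of squares after each step.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Centre predictors and response so the intercept is handled implicitly.
    R = X - X.mean(axis=0)   # predictors, residualised on the selected set
    r = y - y.mean()         # response, residualised on the selected set

    norms0 = np.einsum("ij,ij->j", R, R)      # initial squared column norms
    available = np.ones(X.shape[1], dtype=bool)
    selected, rss_path = [], []

    for _ in range(max_steps):
        norms2 = np.einsum("ij,ij->j", R, R)
        # Skip columns that lie (numerically) in the span of selected ones.
        usable = available & (norms2 > tol * norms0)
        if not usable.any():
            break

        # Drop in RSS from adding column j: (R_j' r)^2 / ||R_j||^2.
        scores = np.zeros(X.shape[1])
        scores[usable] = (R[:, usable].T @ r) ** 2 / norms2[usable]
        j = int(np.argmax(scores))

        # New orthonormal direction; deflate the response and all predictors.
        q = R[:, j] / np.sqrt(norms2[j])
        r = r - q * (q @ r)
        R = R - np.outer(q, q @ R)

        available[j] = False
        selected.append(j)
        rss_path.append(float(r @ r))

    return selected, rss_path
```

Each step costs one pass over the residualised predictor matrix, so no projection onto the growing selected set is refit from scratch. Because the remaining columns are deflated against each new direction, a predictor that is almost a linear combination of the selected ones ends up with a near-zero residual norm and is skipped instead of producing an ill-conditioned fit.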