Forward regression is a key method for automatically identifying important predictors from a large pool of candidate covariates. When predictors are only moderately correlated, forward selection can achieve screening consistency. This property progressively fails, however, as predictors become more strongly correlated, a situation that is common in high-dimensional datasets. Other model selection methods in the literature face the same difficulty. To address it, we introduce a novel decorrelated forward (DF) selection framework for generalized mean regression models, which include prevalent models such as linear, logistic, Poisson, and quasi-likelihood regression. The DF selection framework converts generalized mean regression models into linear ones, which gives the forward selection process a clear interpretation. It also offers a closed-form expression for the forward iteration, which improves practical applicability and efficiency. Theoretically, we establish the screening consistency of DF selection and derive an upper bound on the size of the selected submodel. To reduce the computational burden, we develop a thresholding DF algorithm that provides a stopping rule for the forward search. Simulations and two real-data applications show that our method performs well compared with several existing model selection methods.
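For orientation, the classical forward regression that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration for an ordinary linear model only. It is not the DF procedure, whose decorrelation step and closed-form iteration are not specified in the abstract. The function name `forward_selection` and the relative-improvement threshold `tol` are assumptions made for this example, the latter standing in for a generic stopping rule.

```python
import numpy as np


def forward_selection(X, y, max_steps=None, tol=1e-8):
    """Classical greedy forward regression for a linear model (not the DF method).

    At each step, add the candidate column that most reduces the residual
    sum of squares (RSS). Stop when the best reduction is no larger than
    `tol` times the total sum of squares (a hypothetical stopping threshold
    chosen for illustration).
    """
    n, p = X.shape
    max_steps = p if max_steps is None else min(max_steps, p)
    selected = []
    tss = float(np.sum((y - y.mean()) ** 2))  # RSS of the intercept-only model
    rss = tss
    for _ in range(max_steps):
        best_j, best_rss = None, rss
        for j in range(p):
            if j in selected:
                continue
            # Refit by least squares with an intercept plus the candidate set.
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            cand_rss = float(np.sum((y - A @ coef) ** 2))
            if cand_rss < best_rss:
                best_j, best_rss = j, cand_rss
        # Stop if no candidate helps or the gain is below the threshold.
        if best_j is None or (rss - best_rss) <= tol * tss:
            break
        selected.append(best_j)
        rss = best_rss
    return selected
```

Calling `forward_selection(X, y, tol=0.01)` on an `n × p` design matrix returns the column indices in the order they entered the model. With strongly correlated columns this greedy search can admit spurious predictors, which is the failure mode the DF framework is designed to address.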