Efron et al. (2004) introduced least angle regression (LAR) as an algorithm for linear prediction, intended as an alternative to forward selection with connections to penalized regression. However, LAR has remained something of a "black box": some basic behavioral properties of LAR output are not well understood, including an appropriate termination point for the algorithm. We provide a novel framework for inference with LAR, which also allows LAR to be understood from new perspectives through several newly developed mathematical properties. At the data level, the LAR algorithm can be viewed as estimating a population counterpart "path" that organizes the response mean along regressor variables, which are ordered according to a decreasing series of population "correlation" parameters; these parameters are shown to have meaningful interpretations for explaining variable contributions, with zero correlations denoting unimportant variables. In the output of LAR, estimates of all non-zero population correlations turn out to have independent normal distributions for use in inference, while estimates of zero-valued population correlations have a certain non-normal joint distribution. These properties help to provide a formal rule for stopping the LAR algorithm. While the standard bootstrap for regression can fail for LAR, a modified bootstrap provides a practical and formally justified tool for interpreting the entrance of variables and quantifying uncertainty in estimation. The LAR inference method is studied through simulation and illustrated with data examples.
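To make the "decreasing series of correlations" concrete, the following is a minimal sketch of the plain LAR steps as described by Efron et al. (2004), written with NumPy. It records the order in which variables enter and the common absolute correlation between the active regressors and the current residual at each step. The function name `lar_path` and the simulated data are assumptions made for illustration. The recorded values are the usual sample quantities from the LAR algorithm and are not claimed to match the estimators, stopping rule, or modified bootstrap developed in this work.

```python
import numpy as np


def lar_path(X, y):
    """Plain LAR (no lasso modification); illustrative sketch only.

    Returns the entry order of the variables and the maximal absolute
    correlation with the current residual at the start of each step.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Center and scale columns to unit norm; center the response.
    X = X - X.mean(axis=0)
    X = X / np.linalg.norm(X, axis=0)
    y = y - y.mean()

    mu = np.zeros(n)          # current fitted vector
    active = []               # indices in order of entry
    max_corrs = []            # maximal absolute correlation per step

    for _ in range(min(p, n - 1)):
        c = X.T @ (y - mu)    # current correlations with the residual
        inactive = [j for j in range(p) if j not in active]
        j_new = max(inactive, key=lambda j: abs(c[j]))
        C = abs(c[j_new])
        active.append(j_new)
        max_corrs.append(float(C))

        # Equiangular direction among the sign-adjusted active columns.
        signs = np.sign(c[active])
        XA = X[:, active] * signs
        g = np.linalg.solve(XA.T @ XA, np.ones(len(active)))
        AA = 1.0 / np.sqrt(g.sum())
        u = XA @ (AA * g)
        a = X.T @ u

        # Step length: move until an inactive variable ties the active set.
        rest = [j for j in range(p) if j not in active]
        if rest:
            cands = []
            for j in rest:
                for val in ((C - c[j]) / (AA - a[j]), (C + c[j]) / (AA + a[j])):
                    if val > 1e-12:
                        cands.append(val)
            gamma = min(cands)
        else:
            gamma = C / AA    # final step reaches the least squares fit
        mu = mu + gamma * u

    return active, max_corrs


# Hypothetical simulated data: only the first regressor carries signal.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
y_demo = 3.0 * X_demo[:, 0] + 0.1 * rng.normal(size=50)
order, corrs = lar_path(X_demo, y_demo)
print("entry order:", order)
print("maximal absolute correlations:", corrs)
```

In this sketch the recorded correlations shrink from one step to the next, and values near zero late in the path correspond informally to the "unimportant variables" mentioned above. Deciding where such values are indistinguishable from zero is the stopping question that the inference framework addresses.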