Variable selection is an important step in many modern statistical inference procedures. In the regression setting, when estimators cannot shrink irrelevant signals to zero, covariates unrelated to the response often exhibit small but non-zero regression coefficients. In practice, an ad hoc procedure is often employed: variables whose coefficients are smaller than some threshold are discarded. We formally analyze a version of such thresholding procedures and develop a simple thresholding method that consistently estimates the set of relevant variables under mild regularity assumptions. Using this thresholding procedure, we propose a sparse, $\sqrt{n}$-consistent and asymptotically normal estimator whose non-zero elements do not exhibit shrinkage. We examine the performance and applicability of our approach through numerical studies on simulated and real data.
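To make the thresholding idea concrete, the following is a minimal sketch under stated assumptions, not the method analyzed in the paper. It uses ordinary least squares as the initial estimator, discards coefficients whose absolute value falls below a threshold, and refits least squares on the retained variables so that the non-zero estimates are not shrunk. The function name `threshold_and_refit`, the threshold `tau = n ** -0.25`, and the refitting step are all illustrative choices; the abstract does not specify the threshold rule or the form of the final estimator.

```python
import numpy as np


def threshold_and_refit(X, y, tau):
    """Hard-threshold OLS coefficients, then refit OLS on the kept columns.

    Hypothetical helper for illustration; `tau` is a user-supplied threshold.
    """
    n, p = X.shape

    # Initial (non-sparse) estimate: ordinary least squares on all covariates.
    beta_init, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Keep only the variables whose initial coefficient exceeds the threshold.
    selected = np.flatnonzero(np.abs(beta_init) > tau)

    # Refit on the selected columns; all other coefficients stay exactly zero.
    beta_sparse = np.zeros(p)
    if selected.size > 0:
        beta_sel, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        beta_sparse[selected] = beta_sel
    return selected, beta_sparse


# Simulated example: 3 relevant covariates out of 10 (synthetic data).
rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.standard_normal(n)

# Assumed threshold: shrinks to zero with n, but more slowly than the
# n ** -0.5 scale of the estimation noise. This is an illustrative choice.
tau = n ** -0.25

selected, beta_hat = threshold_and_refit(X, y, tau)
print("selected indices:", selected)
print("sparse estimate:", np.round(beta_hat, 3))
```

The threshold in this sketch is chosen to lie between the order of the estimation noise and the size of the true signals, which is the intuition behind consistent selection by thresholding; any practical use would need a threshold justified by the conditions established in the paper itself.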