In the context of high-dimensional Gaussian linear regression with ordered variables, we study variable selection via minimization of a penalized least-squares criterion. We focus on model selection where the penalty function depends on an unknown multiplicative constant that is commonly calibrated for prediction. We propose a new, proper calibration of this hyperparameter that simultaneously controls the predictive risk and the false discovery rate. We derive non-asymptotic bounds on the false discovery rate as a function of the hyperparameter and provide an algorithm to calibrate it. The algorithm relies only on quantities that are typically observable in real data applications. It is validated in an extensive simulation study and compared with several existing variable selection procedures. Finally, we study an extension of our approach to the case in which an ordering of the variables is not available.