In high-dimensional Gaussian linear regression with ordered variables, we study variable selection by minimizing a penalized least-squares criterion. We focus on model selection procedures whose penalty function depends on an unknown multiplicative constant, which is commonly calibrated for prediction. We propose a new calibration of this hyperparameter that simultaneously controls the predictive risk and the false discovery rate (FDR). We obtain non-asymptotic bounds on the FDR as a function of the hyperparameter and provide an algorithm to calibrate it. With applications in mind, the algorithm relies only on fully observable quantities. We validate it in an extensive simulation study and compare it with existing variable selection procedures. Finally, we study how to extend our approach to complete variable selection.
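To make the setting concrete, here is a minimal sketch of penalized least-squares selection over nested models for ordered variables. It is an illustration under stated assumptions, not the calibration algorithm of the paper: the penalty is assumed to take the form `K * sigma2 * m` (with `K` playing the role of the multiplicative constant), the noise variance `sigma2` is assumed known, and the names `select_ordered_model` and `false_discovery_proportion` are hypothetical.

```python
import numpy as np


def select_ordered_model(X, y, K, sigma2):
    """Pick the size m of the nested model {1, ..., m} minimizing
    RSS(m) + K * sigma2 * m.

    Assumptions (not taken from the source): the penalty is linear in m,
    sigma2 is known, and the first min(n, p) columns of X are linearly
    independent.
    """
    n, p = X.shape
    max_size = min(n, p)
    # Reduced QR: the first m columns of Q span the first m columns of X.
    Q, _ = np.linalg.qr(X[:, :max_size])
    coeffs = Q.T @ y
    # RSS(m) = ||y||^2 - sum of the first m squared coordinates, m = 0..max_size.
    rss = np.sum(y ** 2) - np.concatenate(([0.0], np.cumsum(coeffs ** 2)))
    criterion = rss + K * sigma2 * np.arange(max_size + 1)
    return int(np.argmin(criterion))


def false_discovery_proportion(m_hat, m_star):
    """With ordered variables, the true support is {1, ..., m_star}, so the
    false discoveries are the selected indices beyond m_star."""
    return max(m_hat - m_star, 0) / max(m_hat, 1)


# Toy orthonormal design: three strong signals followed by three weak coordinates.
X = np.eye(6)
y = np.array([10.0, 10.0, 10.0, 0.1, 0.1, 0.1])

m_small_K = select_ordered_model(X, y, K=0.0, sigma2=1.0)
m_moderate_K = select_ordered_model(X, y, K=2.0, sigma2=1.0)
```

In this toy example, treating the three weak coordinates as nulls, a zero constant selects all six variables while `K = 2` stops at three. This illustrates why the choice of the constant affects false discoveries as well as prediction.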