In the context of high-dimensional Gaussian linear regression with ordered variables, we study variable selection via minimization of a penalized least-squares criterion. We focus on model selection procedures in which the penalty function depends on an unknown multiplicative constant that is commonly calibrated for prediction. We propose a new calibration of this hyperparameter that simultaneously controls the predictive risk and the false discovery rate (FDR). We derive non-asymptotic bounds on the FDR as a function of the hyperparameter and provide an algorithm to calibrate it. The algorithm relies on quantities that are typically observable in real data applications. It is validated in an extensive simulation study and compared with several existing variable selection procedures. Finally, we study an extension of our approach to the case in which an ordering of the variables is not available.