Despite a large body of recent work on estimating the out-of-sample risk of regularized models in the high-dimensional regime, a theoretical understanding of this problem for non-differentiable penalties such as the generalized LASSO and the nuclear norm is missing. In this paper we resolve this challenge. We study the problem in the proportional high-dimensional regime, where both the sample size n and the number of features p are large, and the ratio n/p and the signal-to-noise ratio (per observation) remain finite. We provide finite-sample upper bounds on the expected squared error of leave-one-out cross-validation (LO) in estimating the out-of-sample risk. The theoretical framework presented here provides a solid foundation for explaining empirical findings that show the accuracy of LO.
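To make the quantity in question concrete, the following is a minimal sketch of LO for the ordinary LASSO (a simple non-differentiable penalty, standing in for the generalized LASSO and nuclear norm named in the abstract). It is not the paper's method or experimental setup. The data model (i.i.d. standard Gaussian features, a sparse signal, Gaussian noise), the ISTA solver, the helper names `lasso_ista` and `loo_risk`, and all numeric settings are assumptions chosen for illustration. Under this assumed Gaussian design, the out-of-sample risk of a fitted coefficient vector has a closed form, so the LO estimate can be compared against it directly.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Approximately minimize 0.5*||y - X b||^2 + lam*||b||_1 with ISTA
    (proximal gradient descent). A fixed iteration count is an assumption;
    the iterate may not be fully converged."""
    p = X.shape[1]
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)  # 1/L, L = squared spectral norm of X
    b = np.zeros(p)
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y))                         # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-threshold
    return b

def loo_risk(X, y, lam):
    """Leave-one-out estimate: refit without observation i, then score on it."""
    n = X.shape[0]
    errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        b_minus_i = lasso_ista(X[mask], y[mask], lam)
        errs[i] = (y[i] - X[i] @ b_minus_i) ** 2
    return errs.mean()

# Hypothetical synthetic problem with n/p fixed (here n/p = 2/3).
rng = np.random.default_rng(0)
n, p, k, sigma, lam = 60, 90, 10, 0.5, 5.0
beta = np.zeros(p)
beta[:k] = 1.0 / np.sqrt(k)              # unit-norm sparse signal
X = rng.standard_normal((n, p))          # i.i.d. N(0, 1) features
y = X @ beta + sigma * rng.standard_normal(n)

lo = loo_risk(X, y, lam)

# For a fresh x ~ N(0, I) and independent noise, the out-of-sample risk of
# the full-data fit is ||b_hat - beta||^2 + sigma^2.
b_hat = lasso_ista(X, y, lam)
true_risk = np.sum((b_hat - beta) ** 2) + sigma ** 2

print(f"LO estimate: {lo:.3f}, out-of-sample risk: {true_risk:.3f}")
```

Refitting n times is the brute-force form of LO and is used here only because the problem is small; the squared difference between the two printed numbers is a single realization of the error whose expectation the abstract's bounds control.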