We study the problem of variable selection in convex nonparametric least squares (CNLS). Although the least absolute shrinkage and selection operator (Lasso) is a popular technique for least squares, its variable selection performance in CNLS problems is unknown. In this work, we investigate the performance of the Lasso CNLS estimator and find that it is usually unable to select variables efficiently. Exploiting the unique structure of the subgradients in CNLS, we develop a structured Lasso that combines the $\ell_1$-norm and the $\ell_{\infty}$-norm. To improve its predictive performance, we propose a relaxed version of the structured Lasso in which an additional tuning parameter controls the two effects, variable selection and model shrinkage. A Monte Carlo study is conducted to verify the finite-sample performance of the proposed approaches. In an application to Swedish electricity distribution networks, where the regression model is assumed to be semi-nonparametric, we extend our methods to doubly penalized CNLS estimators. The simulation and application results confirm that the proposed structured Lasso performs favorably relative to the other variable selection methods in the literature, generally leading to sparser and more accurate predictive models.
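To make the contrast between the two penalties concrete, here is a minimal sketch in plain Python. It rests on an assumption the abstract does not spell out: that the CNLS subgradients are stored as an n-by-d table `beta` (one row per observation, one column per variable), that the ordinary Lasso penalty sums $|\beta_{id}|$ over all entries, and that the structured penalty takes the $\ell_{\infty}$-norm of each variable's column across observations and then sums those maxima ($\ell_1$ across variables). The function names, the tolerance, and the toy numbers are hypothetical and only illustrate the idea; they are not the paper's implementation.

```python
def lasso_penalty(beta):
    """Sum of absolute values over every subgradient entry (plain l1)."""
    return sum(abs(b) for row in beta for b in row)


def structured_penalty(beta):
    """Assumed l1/l_inf form: for each variable (column), take the largest
    absolute subgradient across observations, then sum over variables."""
    return sum(max(abs(b) for b in col) for col in zip(*beta))


def selected_variables(beta, tol=1e-8):
    """Indices of variables whose subgradient is nonzero for some observation."""
    return [d for d, col in enumerate(zip(*beta))
            if max(abs(b) for b in col) > tol]


# Toy subgradient table: 3 observations, 3 variables (made-up numbers).
beta = [
    [0.5, 0.0, 0.2],
    [0.3, 0.0, 0.0],
    [0.0, 0.0, 0.4],
]

print(lasso_penalty(beta))
print(structured_penalty(beta))
print(selected_variables(beta))
```

Under this assumed form, a variable stops contributing to the structured penalty only when its whole column is zero, so the penalty acts on a variable across all observations at once. The entrywise $\ell_1$ penalty can instead zero out scattered entries without removing any variable from the model entirely.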