We study variable selection by cross-validation (CV) for linear models whose errors may be asymmetric. The data are split into a training set and a validation set, with the validation set containing many more observations than the training set. The model coefficients are estimated on the training set by either the expectile estimator or the adaptive LASSO expectile estimator, and these estimates are then used to compute the cross-validation mean score (CVS) on the validation set. We show that selecting the model that minimizes CVS is consistent in two settings: when the number of explanatory variables is fixed and when it depends on the number of observations. Monte Carlo simulations confirm the theoretical results and show that our estimation method outperforms two other methods from the literature. The usefulness of CV expectile model selection is illustrated on real data sets.
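The selection procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation:

- It uses only the plain expectile estimator (no adaptive LASSO) and fits it by iteratively reweighted, asymmetric least squares.
- It always includes an intercept.
- It takes the first `n_train` rows as the training set. The default `n_train` of about n^(3/4) is a hypothetical choice that makes the validation set much larger.
- It searches all variable subsets exhaustively, which is feasible only for a small number of variables.

All function names are illustrative.

```python
"""Best-subset variable selection by expectile cross-validation (CVS).

Minimal sketch with illustrative names. Assumptions not taken from the
source: plain expectile estimator only (no adaptive LASSO), an intercept
is always included, the first n_train rows form the training set, and all
variable subsets are searched exhaustively (small p only).
"""
import random
from itertools import combinations


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def expectile_loss(u, tau):
    """Asymmetric squared loss rho_tau(u) = |tau - 1{u < 0}| * u**2."""
    return (tau if u >= 0 else 1.0 - tau) * u * u


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - dot(m[i][i + 1:n], x[i + 1:])) / m[i][i]
    return x


def expectile_fit(X, y, tau, max_iter=100):
    """Expectile regression by iteratively reweighted least squares:
    weight tau on nonnegative residuals and 1 - tau on negative ones,
    refitting until the weights stop changing."""
    p = len(X[0])
    w = [0.5] * len(y)  # equal weights: the first fit is ordinary least squares
    beta = [0.0] * p
    for _ in range(max_iter):
        xtwx = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, X))
                 for k in range(p)] for j in range(p)]
        xtwy = [sum(wi * xi[j] * yi for wi, xi, yi in zip(w, X, y))
                for j in range(p)]
        beta = solve(xtwx, xtwy)
        new_w = [tau if yi - dot(xi, beta) >= 0 else 1.0 - tau
                 for xi, yi in zip(X, y)]
        if new_w == w:
            break
        w = new_w
    return beta


def cvs(X_tr, y_tr, X_va, y_va, cols, tau):
    """Fit on the training rows using `cols` plus an intercept, then return
    the mean expectile loss of the predictions on the validation rows."""
    def design(X):
        return [[1.0] + [row[j] for j in cols] for row in X]
    beta = expectile_fit(design(X_tr), y_tr, tau)
    return sum(expectile_loss(yi - dot(zi, beta), tau)
               for zi, yi in zip(design(X_va), y_va)) / len(y_va)


def select_model(X, y, tau=0.5, n_train=None):
    """Return (cols, score) for the variable subset with the smallest CVS."""
    n, p = len(y), len(X[0])
    if n_train is None:
        n_train = max(p + 2, int(n ** 0.75))  # hypothetical: validation >> training
    X_tr, y_tr, X_va, y_va = X[:n_train], y[:n_train], X[n_train:], y[n_train:]
    best = None
    for k in range(p + 1):
        for cols in combinations(range(p), k):
            score = cvs(X_tr, y_tr, X_va, y_va, cols, tau)
            if best is None or score < best[1]:
                best = (cols, score)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 400, 4
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # Only x0 and x2 matter; centred exponential errors are right-skewed.
    y = [1.0 + 2.0 * r[0] - 1.5 * r[2] + rng.expovariate(1.0) - 1.0 for r in X]
    cols, score = select_model(X, y, tau=0.7)
    print("selected columns:", cols, "CVS:", round(score, 4))
```

Roughly speaking, the small training set is what penalizes overfitting here. Each unnecessary variable adds estimation noise to the fitted coefficients, and that noise shows up as extra expectile loss on the large validation set.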