Model selection by cross-validation in an expectile linear regression

For linear models that may have asymmetric errors, we study variable selection by cross-validation. The data are split into training and validation sets, with the number of observations in the validation set much larger than in the training set. The expectile or adaptive LASSO expectile estimators of the model coefficients are computed on the training set. These estimators are then used to compute the cross-validation mean score (CVS) on the validation set. We show that the model that minimizes CVS is consistent in two cases: when the number of explanatory variables is fixed and when it depends on the number of observations. Monte Carlo simulations confirm the theoretical results and show that our estimation method outperforms two other methods from the literature. The usefulness of the CV expectile model selection technique is illustrated by applying it to real data sets.
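The selection procedure described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions and is not the authors' implementation.

- It uses only the plain (unpenalized) expectile estimator. The adaptive LASSO expectile variant is not shown.
- It fits linear expectile regression by iteratively reweighted least squares on the asymmetric squared loss |τ − 1{u < 0}|·u².
- It takes the CVS to be the validation expectile loss averaged over several random training/validation splits.

The split count `n_splits`, the training size `n_train`, the exhaustive search over all column subsets (feasible only for small p) and all function names are hypothetical choices made for illustration.

```python
import itertools

import numpy as np


def expectile_loss(residuals, tau):
    """Mean asymmetric squared loss |tau - 1{r < 0}| * r**2."""
    weights = np.where(residuals < 0, 1.0 - tau, tau)
    return float(np.mean(weights * residuals ** 2))


def fit_expectile(X, y, tau, max_iter=100, tol=1e-10):
    """Linear expectile regression by iteratively reweighted least squares.

    X must already contain an intercept column. The loop starts from the OLS
    fit (the tau = 0.5 solution) and re-solves a weighted least-squares
    problem until the coefficients stop changing.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(max_iter):
        r = y - X @ beta
        # Weight tau for non-negative residuals, 1 - tau for negative ones.
        sw = np.sqrt(np.where(r < 0, 1.0 - tau, tau))
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


def cv_select(X, y, tau, n_train, n_splits=20, seed=0):
    """Return the column subset of X with the smallest CV mean score (CVS).

    Each random split fits on n_train rows and scores the expectile loss on
    the remaining (larger) validation part; scores are averaged over splits.
    An intercept is always included; all 2**p column subsets are tried.
    """
    n, p = X.shape
    rng = np.random.default_rng(seed)
    splits = [rng.permutation(n) for _ in range(n_splits)]
    best_subset, best_cvs = None, np.inf
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            Z = np.column_stack([np.ones(n), X[:, list(subset)]])
            scores = []
            for perm in splits:
                train, valid = perm[:n_train], perm[n_train:]
                beta = fit_expectile(Z[train], y[train], tau)
                scores.append(expectile_loss(y[valid] - Z[valid] @ beta, tau))
            cvs = float(np.mean(scores))
            if cvs < best_cvs:
                best_subset, best_cvs = subset, cvs
    return best_subset, best_cvs


# Usage on simulated data: only columns 0 and 2 matter, errors are skewed.
rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 4))
eps = rng.exponential(scale=1.0, size=n) - 1.0  # centred, right-skewed errors
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + eps
subset, cvs = cv_select(X, y, tau=0.7, n_train=100)
print("selected columns:", subset, "CVS:", round(cvs, 4))
```

Keeping `n_train` well below the validation size is meant to mirror the abstract's setup, in which the validation set is much larger than the training set. A spurious extra column can only lower the training loss, but its noisier coefficient estimate is penalized on the large validation part.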


Related content

Cross-validation


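Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. In a prediction problem, a model is usually given two data sets. The first contains known data on which the model is trained (the training data set). The second contains unknown, or first-seen, data against which the model is tested (the validation data set or test set). The goal of cross-validation is to test the model's ability to predict new data that were not used to estimate it. This helps detect problems such as overfitting or selection bias, and gives insight into how the model will generalize to an independent data set (for example, an unknown data set from a real problem). One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (the training set), and validating the analysis on the other subset (the validation set or test set). To reduce variability, most methods perform multiple rounds of cross-validation with different partitions and combine (for example, average) the validation results over the rounds to estimate the model's predictive performance. In summary, cross-validation combines (averages) measures of predictive fit to obtain a more accurate estimate of a model's predictive performance.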
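The rounds described above can be illustrated with a small k-fold cross-validation routine. This is a generic sketch with two illustrative choices that are not part of the text: an ordinary least-squares model and squared prediction error as the validation score. `k_fold_cv_mse` is a hypothetical helper name.

```python
import numpy as np


def k_fold_cv_mse(X, y, k=5, seed=0):
    """Estimate the out-of-sample MSE of an OLS fit by k-fold cross-validation.

    The shuffled indices are split into k complementary folds. In each round,
    one fold is the validation set and the other k - 1 folds form the
    training set; the k validation errors are then averaged.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    errors = []
    for i in range(k):
        valid = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = np.linalg.lstsq(X[train], y[train], rcond=None)[0]
        errors.append(np.mean((y[valid] - X[valid] @ beta) ** 2))
    return float(np.mean(errors))


# Usage: 30 pure-noise columns improve the in-sample fit but not the CV estimate.
rng = np.random.default_rng(2)
n = 60
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([X_small, rng.normal(size=(n, 30))])
print("CV MSE, true model:", round(k_fold_cv_mse(X_small, y), 3))
print("CV MSE, overfitted model:", round(k_fold_cv_mse(X_big, y), 3))
```

The overfitted model has 32 coefficients but only about 48 training rows per round, so its validation error is expected to be clearly larger. This gap is the kind of overfitting that cross-validation is meant to reveal.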

Research on debiasing recommender systems based on causal inference

Zhuanzhi Member Services

November 10, 2024

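How do large models make decisions? The latest paper from Google, Berkeley, MIT and others, "Foundation Models for Decision Making: Problems, Methods, and Opportunities", details the technical intersection of sequential decision-making and large language models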

Zhuanzhi Member Services

March 10, 2023

[ACMMM2021] Model selection via general approximate cross-validation: supervised, semi-supervised and pairwise learning

Zhuanzhi Member Services

October 10, 2021

Complex Sequential Data Analysis: A Systematic Literature Review of Existing Algorithms

Zhuanzhi Member Services

July 24, 2020