Cross validation is widely used for selecting tuning parameters in regularization methods, but it is generally computationally intensive. To lessen its computational burden, approximation schemes such as generalized approximate cross validation (GACV) are often employed. However, such approximations may not work well when non-smooth loss functions are involved. As a case in point, approximate cross validation schemes for penalized quantile regression do not work well for extreme quantiles. In this paper, we propose a new algorithm to compute the leave-one-out cross validation scores exactly for quantile regression with a ridge penalty through a case-weight adjusted solution path. Resorting to the homotopy technique in optimization, we introduce a case weight for each individual data point as a continuous embedding parameter and decrease the weight gradually from one to zero to link the estimator based on the full data with those based on data with a case deleted. This allows us to design a solution path algorithm that computes all leave-one-out estimators very efficiently from the full-data solution. We show that the case-weight adjusted solution path is piecewise linear in the weight parameter, and using the solution path, we examine case influences comprehensively and observe that different modes of case influence emerge, depending on the specified quantile, data dimensions, and penalty parameter. We further illustrate the utility of the proposed algorithm in real-world applications.
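The case-weight embedding described above can be illustrated with a minimal toy sketch (not the paper's path algorithm, which exploits piecewise linearity rather than re-solving). All names below are hypothetical, and an intercept-only model with a crude grid-search solver is assumed purely for illustration: attaching a weight w to one observation's check loss recovers the full-data fit at w = 1 and the leave-one-out fit at w = 0.

```python
def pinball(r, tau):
    """Check (pinball) loss rho_tau(r) used in quantile regression."""
    return r * tau if r >= 0 else r * (tau - 1.0)

def objective(beta, y, tau, lam, i=None, w=1.0):
    """Ridge-penalized quantile objective with case weight w on point i."""
    total = 0.0
    for j, yj in enumerate(y):
        loss = pinball(yj - beta, tau)
        total += w * loss if j == i else loss
    return total + 0.5 * lam * beta * beta

def argmin_grid(y, tau, lam, i=None, w=1.0, lo=-5.0, hi=5.0, steps=20001):
    """Crude grid-search minimizer; adequate for this 1-D toy problem."""
    best_b, best_v = lo, float("inf")
    for k in range(steps):
        b = lo + (hi - lo) * k / (steps - 1)
        v = objective(b, y, tau, lam, i, w)
        if v < best_v:
            best_b, best_v = b, v
    return best_b

y = [0.3, -1.2, 2.5, 0.8, -0.4]
tau, lam, i = 0.5, 0.1, 2                          # delete the 3rd case

full_fit = argmin_grid(y, tau, lam)                # w = 1 on every case
w0_fit = argmin_grid(y, tau, lam, i=i, w=0.0)      # weight on case i driven to 0
loo_fit = argmin_grid([v for j, v in enumerate(y) if j != i], tau, lam)

assert abs(w0_fit - loo_fit) < 1e-3  # w = 0 recovers the leave-one-out fit
```

The proposed algorithm avoids this brute-force re-solving: because the solution path is piecewise linear in w, each leave-one-out estimator can be traced from the full-data solution by following a small number of linear segments.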