Cross-validation is the standard approach for tuning parameter selection in many non-parametric regression problems. However, its use is less common in change-point regression, perhaps because its prediction-error-based criterion may appear to permit small spurious changes and hence be less well suited to estimating the number and locations of change-points. We show that the problems of cross-validation with squared error loss are in fact more severe: it can lead to systematic under- or over-estimation of the number of change-points, and to highly suboptimal estimation of the mean function, even in simple settings where changes are easily detectable. We propose two simple approaches to remedy these issues. The first uses absolute error rather than squared error loss, and the second modifies the holdout sets used. For the latter, we provide conditions that permit consistent estimation of the number of change-points for a general change-point estimation procedure. We show that these conditions are satisfied for optimal partitioning, using new results on its performance when it is supplied with the incorrect number of change-points. Numerical experiments show that the absolute error approach in particular is competitive with common change-point methods using classical tuning parameter choices when error distributions are well specified, and can substantially outperform them in misspecified models. An implementation of our methodology is available in the R package crossvalidationCP on CRAN.
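To make the baseline procedure concrete, here is a rough illustration of plain two-fold cross-validation for the number of change-points. The observations are split into even- and odd-indexed folds. On one fold, a least-squares piecewise-constant fit with exactly k change-points is computed by dynamic programming. Each held-out point is then predicted by the fitted value at its nearest preceding training point. The `loss` argument marks where absolute error replaces squared error. This is a minimal sketch under our own assumptions, and several of its pieces are not taken from the work itself:

- the fold scheme and the prediction rule;
- the hypothetical function names `fit_piecewise_constant`, `cv_criterion` and `select_k`.

It does not implement the modified holdout sets described above, and it is not the interface of the crossvalidationCP package.

```python
import numpy as np


def fit_piecewise_constant(y, k):
    """Least-squares piecewise-constant fit with exactly k change-points.

    Dynamic programming over segment end points, O(k * n^2) time.
    Returns the fitted mean for every observation.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    cs = np.concatenate([[0.0], np.cumsum(y)])
    cs2 = np.concatenate([[0.0], np.cumsum(y ** 2)])

    def cost(a, b):
        # Residual sum of squares of the segment y[a:b] around its mean.
        s = cs[b] - cs[a]
        return (cs2[b] - cs2[a]) - s * s / (b - a)

    # D[m, b]: minimal cost of y[:b] split into m + 1 non-empty segments.
    D = np.full((k + 1, n + 1), np.inf)
    start = np.zeros((k + 1, n + 1), dtype=int)
    for b in range(1, n + 1):
        D[0, b] = cost(0, b)
    for m in range(1, k + 1):
        for b in range(m + 1, n + 1):
            for a in range(m, b):  # a = start of the last segment
                c = D[m - 1, a] + cost(a, b)
                if c < D[m, b]:
                    D[m, b], start[m, b] = c, a

    # Backtrack the segment start points (the estimated change-points).
    cps, b = [], n
    for m in range(k, 0, -1):
        b = start[m, b]
        cps.append(b)
    bounds = [0] + sorted(cps) + [n]
    fitted = np.empty(n)
    for a, b in zip(bounds[:-1], bounds[1:]):
        fitted[a:b] = y[a:b].mean()
    return fitted


def cv_criterion(y, k, loss="absolute"):
    """Two-fold (even/odd index) cross-validation error for k change-points."""
    y = np.asarray(y, dtype=float)
    idx = np.arange(len(y))
    even, odd = idx[0::2], idx[1::2]
    total = 0.0
    for train, test in [(even, odd), (odd, even)]:
        fitted = fit_piecewise_constant(y[train], k)
        # Predict each held-out point by its nearest preceding training point
        # (or the first training point if none precedes it).
        pos = np.searchsorted(train, test, side="right") - 1
        pos = np.clip(pos, 0, len(train) - 1)
        resid = y[test] - fitted[pos]
        if loss == "absolute":
            total += np.sum(np.abs(resid))
        else:
            total += np.sum(resid ** 2)
    return total


def select_k(y, k_max, loss="absolute"):
    """Number of change-points in 0..k_max minimising the CV criterion."""
    scores = [cv_criterion(y, k, loss) for k in range(k_max + 1)]
    return int(np.argmin(scores))
```

For example, `select_k(y, k_max=5)` uses absolute error, while `select_k(y, k_max=5, loss="squared")` gives the squared-error criterion whose behaviour is criticised above. The quadratic-time dynamic program is adequate for illustration but not for long series.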