Cross-Validation (CV) is the default choice for evaluating the performance of machine learning models. Despite its wide usage, its statistical benefits have remained only half-understood, especially in challenging nonparametric regimes. In this paper we fill this gap and show that, in fact, for a wide spectrum of models, CV does not statistically outperform the simple "plug-in" approach where one reuses the training data for testing evaluation. Specifically, in terms of both the asymptotic bias and coverage accuracy of the associated interval for out-of-sample evaluation, $K$-fold CV provably cannot outperform plug-in, regardless of the rate at which the parametric or nonparametric models converge. Leave-one-out CV can have a smaller bias than plug-in; however, this bias improvement is negligible compared to the variability of the evaluation, and in some important cases leave-one-out again does not outperform plug-in once this variability is taken into account. We obtain our theoretical comparisons via a novel higher-order Taylor analysis that allows us to derive necessary conditions for limit theorems of testing evaluations, which applies to model classes that are not amenable to previously known sufficient conditions. Our numerical results demonstrate that plug-in indeed performs no worse than CV across a wide range of examples.
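To make the comparison concrete, the following is a minimal sketch (not the paper's experimental setup) contrasting the two estimators of out-of-sample error discussed above: the plug-in estimate, which reuses the training data, and the $K$-fold CV estimate, which averages held-out losses. The choice of model (`Ridge` from scikit-learn), the data-generating process, and the sample sizes are all illustrative assumptions.

```python
# Illustrative comparison of plug-in vs. K-fold CV evaluation.
# Model and data-generating process are hypothetical choices.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(size=n)

# Plug-in: fit on all the data, then reuse the same data to estimate the loss.
model = Ridge(alpha=1.0).fit(X, y)
plug_in = np.mean((y - model.predict(X)) ** 2)

# K-fold CV: refit on each training split, average the held-out losses.
K = 5
cv_losses = []
for train_idx, test_idx in KFold(n_splits=K, shuffle=True, random_state=0).split(X):
    fold_model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
    cv_losses.append(np.mean((y[test_idx] - fold_model.predict(X[test_idx])) ** 2))
cv = np.mean(cv_losses)

print(f"plug-in estimate: {plug_in:.4f}")
print(f"{K}-fold CV estimate: {cv:.4f}")
```

The paper's claim, in these terms, is that the gap between the two estimates is asymptotically dominated by the sampling variability of the evaluation itself, so CV's extra refitting buys no statistical advantage.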