Cross-validation techniques for risk estimation and model selection are widely used in statistics and machine learning. However, the theoretical properties of learning via model selection with cross-validation risk estimation remain poorly understood despite its widespread use. In this context, this paper presents learning via model selection with cross-validation risk estimation as a general systematic learning framework within classical statistical learning theory and establishes distribution-free deviation bounds in terms of the VC dimension, giving detailed proofs of the results and considering both bounded and unbounded loss functions. We also deduce conditions under which the deviation bounds of learning via model selection are tighter than those of learning via empirical risk minimization over the whole hypothesis space, supporting the better performance of model selection frameworks observed empirically in some instances.
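To make the framework concrete, the following is a minimal Python sketch of learning via model selection with cross-validation risk estimation: each candidate model (a subset of the hypothesis space) gets a k-fold risk estimate, the model with the smallest estimate is selected, and empirical risk minimization is then run on the selected model with the whole sample. The candidate models (piecewise-constant regressors on [0, 1) with a given number of cells), the squared loss, the interleaved fold scheme, and all function names are illustrative assumptions, not taken from the paper.

```python
import random


def fit_regressogram(xs, ys, bins):
    """ERM under squared loss over functions constant on `bins` equal-width cells of [0, 1)."""
    sums = [0.0] * bins
    counts = [0] * bins
    for x, y in zip(xs, ys):
        b = min(int(x * bins), bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)
    # Cells with no training points fall back to the overall mean (an assumption).
    return [sums[b] / counts[b] if counts[b] else overall for b in range(bins)]


def predict(model, x):
    """Evaluate a fitted piecewise-constant hypothesis at x."""
    bins = len(model)
    return model[min(int(x * bins), bins - 1)]


def empirical_risk(model, xs, ys):
    """Mean squared loss of `model` on the sample (xs, ys)."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cv_risk(xs, ys, bins, k=5):
    """k-fold cross-validation estimate of the risk of ERM on the model with `bins` cells."""
    n = len(xs)
    fold_risks = []
    for fold in range(k):
        val = set(range(fold, n, k))  # interleaved validation fold
        train = [i for i in range(n) if i not in val]
        model = fit_regressogram([xs[i] for i in train], [ys[i] for i in train], bins)
        fold_risks.append(
            empirical_risk(model, [xs[i] for i in val], [ys[i] for i in val])
        )
    return sum(fold_risks) / k


def select_and_learn(xs, ys, candidate_bins, k=5):
    """Select the candidate with the smallest CV risk, then run ERM on it with the full sample."""
    risks = {b: cv_risk(xs, ys, b, k) for b in candidate_bins}
    best = min(risks, key=risks.get)  # ties go to the first candidate listed
    return best, fit_regressogram(xs, ys, best), risks


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.random() for _ in range(200)]
    # Synthetic target: a step at 0.5 plus Gaussian noise (illustrative data only).
    ys = [(1.0 if x >= 0.5 else 0.0) + rng.gauss(0.0, 0.1) for x in xs]
    best, model, risks = select_and_learn(xs, ys, [1, 2, 4, 8, 16, 32])
    print("selected number of cells:", best)
    print("CV risk estimates:", risks)
```

Learning via ERM over the whole hypothesis space would correspond here to always fitting the largest candidate; the sketch instead lets the cross-validation estimate decide which model the final ERM step runs on.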