High-complexity models are notorious in machine learning for overfitting, a phenomenon in which a model fits its training data well but fails to generalize to the underlying data-generating process. A typical procedure for circumventing overfitting computes the empirical risk on a holdout set and halts training (or flags the point) once that risk begins to increase. This practice often helps produce a well-generalizing model, but the justification for why it works is primarily heuristic. We discuss the overfitting problem and explain why standard asymptotic and concentration results do not hold for evaluation with training data. We then introduce and argue for a hypothesis test by means of which model performance may be evaluated using training data, and overfitting may be quantitatively defined and detected. We rely on the same concentration bounds, which guarantee that empirical means approximate their true mean with high probability, to conclude that those empirical means should also approximate each other. We stipulate conditions under which this test is valid, describe how the test may be used to identify overfitting, articulate a further nuance according to which distributional shift may be flagged, and highlight an alternative notion of learning that usefully captures generalization in the absence of uniform PAC guarantees.
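The core idea, that two empirical means which each concentrate around the same true mean must also lie close to each other, can be illustrated with a minimal sketch. It is not the test as defined in the paper. It assumes per-example losses bounded in an interval of width `loss_range`, uses Hoeffding's inequality with a union bound as the concentration result, and treats the training losses under the null hypothesis as if they were an independent sample. The function names and the default `alpha` are hypothetical.

```python
import math


def hoeffding_gap_threshold(n_train, n_holdout, alpha=0.05, loss_range=1.0):
    """Largest train/holdout risk gap consistent with the null at level alpha.

    Assumption: losses lie in an interval of width `loss_range`. Hoeffding's
    inequality bounds each empirical mean's deviation from the true mean with
    failure probability alpha / 2; a union bound then covers both samples.
    """
    log_term = math.log(4.0 / alpha)
    t_train = loss_range * math.sqrt(log_term / (2.0 * n_train))
    t_holdout = loss_range * math.sqrt(log_term / (2.0 * n_holdout))
    return t_train + t_holdout


def flags_overfitting(train_losses, holdout_losses, alpha=0.05, loss_range=1.0):
    """Return True when the risk gap exceeds the Hoeffding-based threshold."""
    train_risk = sum(train_losses) / len(train_losses)
    holdout_risk = sum(holdout_losses) / len(holdout_losses)
    threshold = hoeffding_gap_threshold(
        len(train_losses), len(holdout_losses), alpha, loss_range
    )
    return abs(train_risk - holdout_risk) > threshold
```

For example, calling `flags_overfitting(train_losses, holdout_losses)` with per-example 0-1 losses rejects the null when the two empirical risks differ by more than the sum of the two Hoeffding radii. The threshold shrinks at rate $1/\sqrt{n}$, so larger samples allow smaller gaps to be detected.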