Statistical learning theory and the Probably Approximately Correct (PAC) criterion are the standard approach to mathematical learning theory. PAC is widely used to analyze learning problems and algorithms, and has been studied thoroughly. Uniform worst-case bounds on the convergence rate have been well established using, e.g., VC theory or Rademacher complexity. However, in a typical scenario the performance can be much better than these worst-case bounds suggest. In this paper, we consider PAC learning using a somewhat different tradeoff: the error exponent (a well-established analysis method in information theory), which describes the exponential behavior of the probability that the risk will exceed a certain threshold as a function of the sample size. We focus on binary classification and find, under some stability assumptions, an improved distribution-dependent error exponent for a wide range of problems, establishing the exponential behavior of the PAC error probability in agnostic learning. Interestingly, under these assumptions, agnostic learning may have the same error exponent as realizable learning. The error exponent criterion can also be applied to analyze knowledge distillation, a problem that so far lacks a theoretical analysis.
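To make the notion of an error exponent concrete, the sketch below uses the simplest possible case. It is not the exponent derived in this paper. It is the classical Cramér-Chernoff exponent for a single fixed classifier with true risk p, whose empirical risk on n i.i.d. samples is Binomial(n, p)/n. For a threshold t > p, the probability that the empirical risk is at least t behaves like exp(-n E), with E equal to the binary KL divergence D(t || p). The function names (`binary_kl`, `log_tail_prob`, `empirical_exponent`) are illustrative, and the sketch assumes 0 < p < t < 1.

```python
import math


def binary_kl(a: float, b: float) -> float:
    """KL divergence D(Bernoulli(a) || Bernoulli(b)) in nats."""
    def term(x: float, y: float) -> float:
        # Convention 0 * log(0 / y) = 0.
        return 0.0 if x == 0 else x * math.log(x / y)
    return term(a, b) + term(1 - a, 1 - b)


def log_tail_prob(n: int, p: float, t: float) -> float:
    """Exact log P[Binomial(n, p) >= n * t], computed with log-sum-exp for stability."""
    k0 = math.ceil(n * t)
    logs = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
        for k in range(k0, n + 1)
    ]
    m = max(logs)
    return m + math.log(sum(math.exp(x - m) for x in logs))


def empirical_exponent(n: int, p: float, t: float) -> float:
    """Finite-sample exponent -(1/n) log P[empirical risk >= t]."""
    return -log_tail_prob(n, p, t) / n


if __name__ == "__main__":
    p, t = 0.1, 0.2  # hypothetical true risk and risk threshold
    print(f"limit exponent D(t||p) = {binary_kl(t, p):.5f}")
    for n in (100, 500, 2000, 4000):
        print(n, f"{empirical_exponent(n, p, t):.5f}")
```

As n grows, the finite-sample exponent decreases toward D(t || p) from above. By the Chernoff bound it never drops below that limit. The gap shrinks roughly like (log n)/n, which reflects the polynomial prefactor in front of exp(-n E).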