This paper presents a novel information-theoretic perspective on generalization in machine learning by framing the learning problem within the context of lossy compression and applying finite blocklength analysis. In our approach, the sampling of training data formally corresponds to an encoding process, and the model construction to a decoding process. By leveraging finite blocklength analysis, we derive lower bounds on sample complexity and generalization error for a fixed randomized learning algorithm and its associated optimal sampling strategy. Our bounds explicitly characterize the degree of overfitting of the learning algorithm and the mismatch between its inductive bias and the task as distinct terms. This separation provides a significant advantage over existing frameworks. Additionally, we decompose the overfitting term to show its theoretical connection to existing metrics found in information-theoretic bounds and stability theory, unifying these perspectives under our proposed framework.