We formalize the generalization error bound via Rademacher complexity in the Lean 4 theorem prover. The generalization error quantifies the gap between a learning machine's performance on given training data and on unseen test data, and the Rademacher complexity bounds this error in terms of the complexity of the hypothesis class. Unlike traditional approaches such as PAC learning with the VC dimension, Rademacher complexity applies across diverse machine learning settings, including deep learning and kernel methods. We formalize the key concepts and theorems, including the empirical and population Rademacher complexities, and establish generalization error bounds through formal proofs of McDiarmid's inequality, Hoeffding's lemma, and a symmetrization argument.
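The bound referred to above is, in its standard textbook form (stated here as context, not as the paper's exact formalized statement): for a hypothesis class $\mathcal{F}$ of functions into $[0,1]$ and an i.i.d. sample of size $n$, with probability at least $1-\delta$, every $f \in \mathcal{F}$ satisfies

$$
\mathbb{E}[f(Z)] \;\le\; \frac{1}{n}\sum_{i=1}^{n} f(Z_i) \;+\; 2\,\mathfrak{R}_n(\mathcal{F}) \;+\; \sqrt{\frac{\ln(1/\delta)}{2n}},
$$

where $\mathfrak{R}_n(\mathcal{F})$ is the (population) Rademacher complexity. The symmetrization argument supplies the $2\,\mathfrak{R}_n(\mathcal{F})$ term, while McDiarmid's inequality (itself proved via Hoeffding's lemma) supplies the $\sqrt{\ln(1/\delta)/(2n)}$ concentration term.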
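As a rough illustration of what such a formalization involves (a hypothetical sketch, not the paper's actual Mathlib development), the empirical Rademacher complexity of a class `F` on a fixed sample `S` can be written in Lean 4 by averaging over all sign vectors explicitly:

```lean
import Mathlib

-- Hypothetical sketch: empirical Rademacher complexity of a hypothesis
-- class `F : Set (α → ℝ)` on a sample `S : Fin n → α`. The expectation
-- over Rademacher signs is written as a finite average over all
-- `ε : Fin n → Bool`, with `true ↦ +1` and `false ↦ -1`.
noncomputable def empiricalRademacher {α : Type*} (n : ℕ)
    (F : Set (α → ℝ)) (S : Fin n → α) : ℝ :=
  (1 / 2 ^ n : ℝ) * ∑ ε : Fin n → Bool,
    ⨆ f ∈ F, (1 / n : ℝ) * ∑ i, (if ε i then (1 : ℝ) else -1) * f (S i)
```

The `⨆ f ∈ F` supremum requires `F` to be suitably bounded for the definition to be meaningful; a full development would instead phrase the expectation measure-theoretically and prove the symmetrization lemma against that definition.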