We present the first comprehensive Lean 4 formalization of statistical learning theory (SLT) grounded in empirical process theory. Our end-to-end formal infrastructure implements content missing from the latest Lean library, including a complete development of Gaussian Lipschitz concentration, Dudley's entropy integral theorem for sub-Gaussian processes, and an application to least-squares (sparse) regression with a sharp rate. The project was carried out using a human-AI collaborative workflow, in which humans design proof strategies and AI agents execute tactical proof construction, yielding a human-verified Lean 4 toolbox for SLT. Beyond implementation, the formalization process exposes and resolves implicit assumptions and missing details in standard SLT textbooks, enforcing a granular, line-by-line understanding of the theory. This work establishes a reusable formal foundation and opens the door to future developments in machine learning theory. The code is available at https://github.com/YuanheZ/lean-stat-learning-theory.