TorchLean: Formalizing Neural Networks in Lean

Neural networks are increasingly deployed in scientific, safety critical, and mission critical pipelines, yet verification and analysis are often performed outside the programming environment that defines and runs the model. This creates a semantic gap between the executed network and the analyzed artifact: guarantees can depend on implicit conventions about operator semantics, tensor layouts, preprocessing, floating-point behavior, graph transformations, accelerated kernels, and external certificates. We present TorchLean, a unified framework for formalizing, executing, and verifying neural networks in Lean 4. TorchLean treats learned models as executable programs and mathematical objects with a shared semantics for computation, verification, and theorem proving. The framework provides a PyTorch style API for typed tensors, layers, objectives, optimizers, automatic differentiation, and graph programs, with eager and compiled execution paths that lower to a common computation-graph representation. TorchLean supports exact and finite-precision tensor semantics, verified reverse-mode differentiation, interval and affine bound propagation, CROWN/LiRPA style certificate checking, import/export workflows, and CUDA-backed execution through explicit FFI boundaries. It also includes semantic layers for attention and FlashAttention, state-space sequence models, diffusion and sampling processes, probability kernels, reinforcement-learning objectives and Markov decision processes, and self-supervised objectives such as masked autoencoding, JEPA-style predictive views, and variance/correlation-based anti-collapse losses. Together, these components provide a semantic foundation for verified machine learning, where executable neural network artifacts, verification procedures, runtime boundaries, and mathematical claims can be stated and related inside one theorem-proving environment.

翻译：神经网络越来越多地部署在科学、安全关键及任务关键型管线中，但其验证与分析往往在定义和运行模型的编程环境之外进行。这造成了执行网络与被分析构件之间的语义鸿沟：保证可能依赖于关于算子语义、张量布局、预处理、浮点行为、图变换、加速内核及外部证书的隐式约定。我们提出TorchLean——一个在Lean 4中形式化、执行及验证神经网络的统一框架。TorchLean将学习模型视为可执行程序与数学对象，并赋予计算、验证及定理证明共享的语义基础。该框架提供PyTorch风格的API，支持类型化张量、层、目标函数、优化器、自动微分及图程序，具备即时与编译两种执行路径，并降级至统一的计算图表示。TorchLean支持精确与有限精度张量语义、经过验证的反向模式微分、区间与仿射边界传播、CROWN/LiRPA风格的证书检查、导入/导出工作流，以及通过显式FFI边界实现的CUDA后端执行。它还包含注意力机制与FlashAttention的语义层、状态空间序列模型、扩散与采样过程、概率核、强化学习目标与马尔可夫决策过程，以及自监督目标（如掩码自编码、JEPA风格的预测视图及基于方差/相关性的抗坍缩损失）。这些组件共同为经过验证的机器学习提供语义基础，使得可执行神经网络构件、验证过程、运行时边界及数学声明能够在一个定理证明环境中被陈述并相互关联。