The Neural Compiler: Program-to-Network Translation for Hybrid Scientific Machine Learning

Scientific machine learning often requires combining known physics with unknown parameters or correction terms learned from data. Existing approaches either ignore known structure, encode it as a soft penalty, or require hand-written PyTorch code for each equation. We present The Neural Compiler, a system that translates programs written in a first-order Scheme-like expression language into frozen, differentiable PyTorch modules. These modules match the source program to floating-point precision and provide gradients through autograd. In hybrid models, the compiled module encodes known physics exactly while learned components model the unknown remainder. We evaluate the compiler across six experimental domains: Feynman physics equations, Lotka-Volterra dynamics, a damped pendulum, a one-dimensional heat equation, three-dimensional vector mechanics, and compositional generalization. For single equations, compiled modules match hand-coded PyTorch implementations numerically, showing no accuracy loss from compilation. With only 1 to 4 trainable parameters, compiled models recover physical constants to less than 1 percent error in most cases, while standard PINN baselines with more than 8500 parameters show 7 to 93 percent error. Compiled modules also compose with zero error, while neural approximations can accumulate large errors in deep composition chains. The main value of the compiler is not improved accuracy over hand-coded equations, but systematic composability: it generates correct, differentiable modules from symbolic specifications without rewriting each equation by hand. The system supports 51 primitive operations, including vector and matrix algebra, enabling PDE discretizations and hybrid scientific models. This string-in, module-out interface also provides a natural target for large language models that translate scientific descriptions into executable differentiable modules.
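
To make the string-in, module-out idea concrete, the following is a minimal sketch and not the system's actual implementation. It assumes a tiny S-expression parser, a six-entry primitive table, and a hypothetical `CompiledExpr` class; none of these names come from the paper. The decay-law example, its constants, and the Adam settings are likewise illustrative assumptions. The expression structure is fixed at construction time, and only the symbols listed in `trainable` become parameters.

```python
import torch
import torch.nn as nn

# Hypothetical primitive table for illustration only; the paper reports 51
# primitives, and their actual names and signatures are not shown here.
PRIMITIVES = {"+": torch.add, "-": torch.sub, "*": torch.mul,
              "/": torch.div, "exp": torch.exp, "sin": torch.sin}


def parse(tokens):
    """Build a nested-list AST from a token list (consumes the list)."""
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return node
    try:
        return float(tok)  # numeric literal
    except ValueError:
        return tok         # operator or variable name


class CompiledExpr(nn.Module):
    """Fixed expression tree; only symbols named in `trainable` are learned."""

    def __init__(self, source, trainable=None):
        super().__init__()
        tokens = source.replace("(", " ( ").replace(")", " ) ").split()
        self.ast = parse(tokens)
        self.params = nn.ParameterDict(
            {name: nn.Parameter(torch.tensor(float(value)))
             for name, value in (trainable or {}).items()})

    def _eval(self, node, env):
        if isinstance(node, float):
            return torch.tensor(node)
        if isinstance(node, str):
            # Trainable symbols shadow runtime inputs of the same name.
            return self.params[node] if node in self.params else env[node]
        op, *args = node
        return PRIMITIVES[op](*[self._eval(arg, env) for arg in args])

    def forward(self, **inputs):
        return self._eval(self.ast, inputs)


def recover_decay_rate(true_k=0.7, steps=500):
    """Fit the single constant k in y = y0 * exp(-k * t) to noise-free data."""
    t = torch.linspace(0.0, 5.0, 50)
    y0 = torch.tensor(2.0)
    target = y0 * torch.exp(-true_k * t)
    model = CompiledExpr("(* y0 (exp (* (- 0.0 k) t)))", trainable={"k": 0.1})
    optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = torch.mean((model(t=t, y0=y0) - target) ** 2)
        loss.backward()  # gradients flow through the compiled expression
        optimizer.step()
    return model.params["k"].item()
```

Calling `recover_decay_rate()` fits a one-parameter model whose functional form is fixed by the source string, which is the hybrid pattern the abstract describes: known structure stays exact, and only the unknown constant is learned. A fuller compiler would presumably lower the AST into module calls once instead of re-walking the tree on every forward pass; the interpreter form is used here only to keep the sketch short.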
