Bias Inheritance in Neural-Symbolic Discovery of Constitutive Closures Under Function-Class Mismatch

We investigate the data-driven discovery of constitutive closures in nonlinear reaction-diffusion systems whose governing PDE structure is known. Our objective is to robustly recover diffusion and reaction laws from spatiotemporal observations while avoiding a common pitfall: treating low residuals or accurate short-horizon predictions as evidence of physical recovery. We propose a three-stage neural-symbolic framework: (1) learning numerical surrogates under physical constraints with a noise-robust weak-form objective; (2) compressing these surrogates into restricted, interpretable symbolic families (e.g., polynomial, rational, and saturation forms); and (3) validating the symbolic closures through explicit forward re-simulation on unseen initial conditions. Numerical experiments reveal two distinct regimes. Under matched-library settings, weak-form polynomial baselines behave as correctly specified reference estimators, showing that neural surrogates do not uniformly outperform classical bases. Conversely, under function-class mismatch, neural surrogates provide the necessary flexibility and can be compressed into compact symbolic laws with minimal rollout degradation. However, we identify a critical "bias inheritance" mechanism: symbolic compression does not automatically repair constitutive bias. Across the observation regimes tested, the true error of the symbolic closure closely tracks that of the neural surrogate, yielding a bias inheritance ratio near one. These findings indicate that the primary bottleneck in neural-symbolic modeling lies in the initial numerical inverse problem rather than in the subsequent symbolic compression. We conclude that constitutive claims must be supported by forward validation, not by residual minimization alone.
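As a toy illustration of the bias inheritance idea, the sketch below compresses a deliberately biased stand-in for a surrogate into a quadratic and compares errors against a known closure. Everything in it is an assumption made for illustration: the closure `d_true`, the injected bias in `d_surrogate`, the quadratic family, and in particular the definition of the ratio as symbolic-closure error divided by surrogate error, which the abstract implies but does not state. It involves no PDE solve, no neural network, and no forward re-simulation.

```python
import numpy as np


def rel_l2(a, b):
    """Relative L2 error of a with respect to the reference b."""
    return np.linalg.norm(a - b) / np.linalg.norm(b)


# State values at which the closure is evaluated.
u = np.linspace(0.0, 1.0, 201)

# Hypothetical ground-truth diffusivity (an assumption, not from the paper).
d_true = 0.5 + u**2

# Stand-in for a trained surrogate: the truth plus a smooth systematic bias
# and a small high-frequency wiggle (both injected by hand for illustration).
d_surrogate = d_true + 0.1 * u + 0.01 * np.sin(20.0 * u)

# Stage-2-style compression: least-squares fit of a quadratic to the surrogate.
coeffs = np.polyfit(u, d_surrogate, deg=2)
d_symbolic = np.polyval(coeffs, u)

# How well the symbolic form reproduces the surrogate (the compression residual).
compression_residual = rel_l2(d_symbolic, d_surrogate)

# How well each model reproduces the true closure.
err_surrogate = rel_l2(d_surrogate, d_true)
err_symbolic = rel_l2(d_symbolic, d_true)

# Assumed definition: symbolic-closure error divided by surrogate error.
bias_inheritance_ratio = err_symbolic / err_surrogate

print(f"compression residual:   {compression_residual:.4f}")
print(f"surrogate true error:   {err_surrogate:.4f}")
print(f"symbolic true error:    {err_symbolic:.4f}")
print(f"bias inheritance ratio: {bias_inheritance_ratio:.3f}")
```

In this toy setting the quadratic fit smooths out the wiggle but keeps the smooth bias, so the compression residual is small while the symbolic closure's error against the truth stays close to the surrogate's. A small fitting residual says nothing here about whether the underlying closure was recovered.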
