Recent advances in machine learning theory have shown that interpolating noisy samples with over-parameterized machine learning algorithms always leads to inconsistency. This work shows, perhaps surprisingly, that interpolated machine learning can exhibit benign overfitting and consistency when physics-informed learning is used for supervised tasks governed by partial differential equations (PDEs) describing laws of physics. The analysis provides an asymptotic Sobolev norm learning curve for kernel ridge(less) regression applied to linear inverse problems involving elliptic PDEs. The results reveal that the PDE operators can stabilize the variance and lead to benign overfitting for fixed-dimensional problems, in contrast to standard regression settings. The impact of the various inductive biases introduced by minimizing different Sobolev norms as a form of implicit regularization is also examined. Notably, the convergence rate is independent of the specific (smooth) inductive bias for both ridge and ridgeless regression. For regularized least squares estimators, all (smooth enough) inductive biases can achieve the optimal convergence rate when the regularization parameter is properly chosen. The smoothness requirement recovers a condition previously found in the Bayesian setting and extends the conclusions to minimum norm interpolation estimators.
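To make the estimators named in the abstract concrete, the following is a minimal sketch of kernel ridge and ridgeless regression with a Sobolev-norm penalty for a toy linear inverse problem. Everything specific here is an assumption chosen for illustration and is not taken from the paper: the operator A u = u - u'' on the periodic interval [0, 2π), the truncated Fourier basis, the smoothness index `s = 3`, the noise level, and the names `fourier_features` and `fit_sobolev_krr`. The data are noisy samples of A u, and the estimator minimizes (1/n) Σ (y_i - (A u)(x_i))² + λ ‖u‖²_{H^s}, with the minimum-norm interpolant taken at λ = 0.

```python
import numpy as np


def fourier_features(x, max_freq):
    """Return the basis 1, cos(kx), sin(kx) (k = 1..max_freq) at x, plus each column's frequency."""
    cols = [np.ones_like(x)]
    freqs = [0]
    for k in range(1, max_freq + 1):
        cols += [np.cos(k * x), np.sin(k * x)]
        freqs += [k, k]
    return np.stack(cols, axis=1), np.array(freqs, dtype=float)


def fit_sobolev_krr(x, y, max_freq, s, lam):
    """Fit Fourier coefficients of u from noisy samples y_i ~ (A u)(x_i).

    Assumed toy operator: A u = u - u'', so A acts on frequency k as (1 + k^2).
    The penalty is lam * sum_k (1 + k^2)^s c_k^2; lam = 0 gives the
    minimum-H^s-norm interpolant (ridgeless regression).
    """
    phi, freqs = fourier_features(x, max_freq)
    symbol = 1.0 + freqs**2           # Fourier symbol of the assumed operator A
    weights = (1.0 + freqs**2) ** s   # weights defining the H^s norm
    psi = phi * symbol                # features of A u at the sample points
    gram = (psi / weights) @ psi.T    # kernel matrix induced on A u
    n = len(x)
    if lam > 0:
        alpha = np.linalg.solve(gram + n * lam * np.eye(n), y)
    else:
        alpha = np.linalg.pinv(gram) @ y
    return (psi / weights).T @ alpha  # coefficients of the estimate of u


# Toy data: u(t) = sin(t) + 0.5 cos(2t), hence A u = 2 sin(t) + 2.5 cos(2t).
rng = np.random.default_rng(0)
n, max_freq, s = 20, 30, 3.0
x = rng.uniform(0.0, 2.0 * np.pi, size=n)
y = 2.0 * np.sin(x) + 2.5 * np.cos(2.0 * x) + 0.1 * rng.standard_normal(n)

coef_ridge = fit_sobolev_krr(x, y, max_freq, s, lam=1e-3)
coef_ridgeless = fit_sobolev_krr(x, y, max_freq, s, lam=0.0)

# Compare both estimates of u against the truth on a fine grid.
grid = np.linspace(0.0, 2.0 * np.pi, 200)
phi_grid, _ = fourier_features(grid, max_freq)
u_grid = np.sin(grid) + 0.5 * np.cos(2.0 * grid)
print("ridge     max error:", np.max(np.abs(phi_grid @ coef_ridge - u_grid)))
print("ridgeless max error:", np.max(np.abs(phi_grid @ coef_ridgeless - u_grid)))
```

Changing `s` swaps the Sobolev norm being minimized, which is the kind of inductive bias the abstract refers to. The sketch only illustrates the form of the estimators; it is not a reproduction of the paper's experiments or rates.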