Operator learning, the approximation of mappings between infinite-dimensional function spaces using ideas from machine learning, has gained increasing research attention in recent years. Approximate operators, learned from data, hold promise to serve as efficient surrogate models for problems in computational science and engineering, complementing traditional numerical methods. However, despite their empirical success, the underpinning mathematical theory remains largely incomplete. In this paper, we study the approximation of Lipschitz operators in expectation with respect to Gaussian measures. We prove higher Gaussian Sobolev regularity of Lipschitz operators and establish lower and upper bounds on the Hermite polynomial approximation error. We further consider the reconstruction of Lipschitz operators from $m$ arbitrary (adaptive) linear samples. A key finding is a tight characterization, in terms of $m$, of the smallest error achievable over all possible (adaptive) sampling and reconstruction maps. It is shown that Hermite polynomial approximation is an optimal recovery strategy, but it is subject to the following curse of sample complexity: no method for approximating Lipschitz operators from finitely many samples can achieve algebraic convergence rates in $m$. On the positive side, we prove that a sufficiently fast spectral decay of the covariance operator of the Gaussian measure guarantees convergence rates arbitrarily close to any algebraic rate in the large data limit $m \to \infty$. Finally, we focus on the recovery of Lipschitz operators from finitely many point samples. We consider Christoffel sampling and weighted least-squares approximation, and present an algorithm which provably achieves near-optimal sample complexity.
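The final ingredients named above, Hermite polynomial approximation, Christoffel sampling, and weighted least squares, can be illustrated in a drastically simplified setting. The sketch below is a toy scalar analogue under stated assumptions (a Lipschitz function of a single standard Gaussian variable, not an operator between function spaces; the target `f`, basis size `n`, sample size `m`, and the importance-resampling scheme are illustrative choices, not the paper's algorithm): it draws samples with density proportional to a Christoffel-type function $K_n$, weights each sample by $n/K_n$, and fits the orthonormal Hermite coefficients by weighted least squares.

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermevander

rng = np.random.default_rng(0)

n = 10  # number of Hermite basis functions (illustrative choice)

def basis(x):
    """Probabilists' Hermite polynomials, orthonormalized w.r.t. N(0,1)."""
    V = hermevander(x, n - 1)  # columns He_0(x), ..., He_{n-1}(x)
    norms = np.array([np.sqrt(float(factorial(k))) for k in range(n)])
    return V / norms

f = lambda x: np.abs(x)  # a simple Lipschitz target (illustrative)

# Christoffel-type sampling, approximated by importance resampling:
# draw a large Gaussian pool, then resample proportionally to
# K_n(x) = sum_k phi_k(x)^2, so the sample density is ~ (K_n / n) dN(0,1).
pool = rng.standard_normal(20000)
K_pool = (basis(pool) ** 2).sum(axis=1)
m = 200
idx = rng.choice(pool.size, size=m, p=K_pool / K_pool.sum())
x = pool[idx]
w = n / (basis(x) ** 2).sum(axis=1)  # weights w(x) = n / K_n(x)

# Weighted least squares: minimize sum_i w_i * (Phi(x_i) c - f(x_i))^2.
Phi = basis(x)
sw = np.sqrt(w)
c, *_ = np.linalg.lstsq(sw[:, None] * Phi, sw * f(x), rcond=None)

# Monte Carlo estimate of the L^2 error w.r.t. the Gaussian measure.
xt = rng.standard_normal(100000)
err = np.sqrt(np.mean((basis(xt) @ c - f(xt)) ** 2))
```

The importance-resampling step is only a convenient stand-in for exact sampling from the Christoffel-weighted measure; the weights `n / K_n` then restore unbiasedness of the weighted least-squares objective with respect to the Gaussian measure.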