Approximating functions on infinite-dimensional spaces from finite samples is widely regarded as a formidable problem. In this study, we address the numerical approximation of Sobolev-smooth functions defined on probability spaces, focusing on the Wasserstein distance function as a relevant example. In contrast to the existing literature, which focuses on efficiently approximating pointwise evaluations, we define functional approximants through three machine learning-based approaches: (1) solving a finite number of optimal transport problems and computing the corresponding Wasserstein potentials; (2) employing empirical risk minimization with Tikhonov regularization in Wasserstein Sobolev spaces; (3) solving the saddle point formulation that characterizes the weak form of the Euler-Lagrange equation of the Tikhonov functional. As a theoretical contribution, we provide explicit and quantitative bounds on the generalization error of each of these solutions. The proofs combine the theory of metric Sobolev spaces with techniques from optimal transport, variational calculus, and large deviation bounds. In our numerical implementation, appropriately designed neural networks serve as basis functions and are trained using several different methodologies. This yields approximating functions that can be evaluated rapidly after training. Consequently, at equal accuracy, our constructive solutions evaluate several orders of magnitude faster than state-of-the-art methods.
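To make the second approach concrete, the following is a minimal sketch of regularized empirical risk minimization for a Wasserstein distance function, under strong simplifying assumptions that are not taken from the paper: the measures are one-dimensional empirical measures with equally many atoms, the target is the squared 2-Wasserstein distance to a fixed reference sample, a hypothetical quantile feature map (`quantile_features`) stands in for the neural-network basis functions, and a plain ridge penalty on the coefficients replaces the Wasserstein Sobolev regularizer. All function names and parameter values are illustrative.

```python
import numpy as np


def w2_squared_1d(x, y):
    """Squared 2-Wasserstein distance between two equal-size 1D empirical measures.

    In one dimension the optimal coupling matches sorted samples.
    """
    return float(np.mean((np.sort(x) - np.sort(y)) ** 2))


def quantile_features(x, n_quantiles=16):
    """Hypothetical feature map: a bias term, sample quantiles, and their squares."""
    q = np.quantile(x, np.linspace(0.05, 0.95, n_quantiles))
    return np.concatenate(([1.0], q, q ** 2))


def fit_ridge(Phi, y, lam=1e-3):
    """Minimize (1/n) * ||Phi w - y||^2 + lam * ||w||^2 in closed form."""
    n, d = Phi.shape
    A = Phi.T @ Phi / n + lam * np.eye(d)
    b = Phi.T @ y / n
    return np.linalg.solve(A, b)


def sample_measure(rng, n_points=200):
    """Draw an empirical measure from a Gaussian with random mean and scale (assumed data model)."""
    mean = rng.uniform(-2.0, 2.0)
    scale = rng.uniform(0.5, 2.0)
    return mean + scale * rng.standard_normal(n_points)


rng = np.random.default_rng(0)
reference = rng.standard_normal(200)  # fixed reference measure

train = [sample_measure(rng) for _ in range(400)]
test = [sample_measure(rng) for _ in range(100)]

# Training labels come from exact 1D optimal transport solves.
Phi_train = np.stack([quantile_features(x) for x in train])
y_train = np.array([w2_squared_1d(x, reference) for x in train])
weights = fit_ridge(Phi_train, y_train)

# After training, evaluating the approximant is a single dot product per measure.
Phi_test = np.stack([quantile_features(x) for x in test])
y_test = np.array([w2_squared_1d(x, reference) for x in test])
pred = Phi_test @ weights

rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
baseline_rmse = float(np.sqrt(np.mean((y_train.mean() - y_test) ** 2)))
print(f"test RMSE: {rmse:.4f} (constant-predictor baseline: {baseline_rmse:.4f})")
```

The sketch only illustrates the workflow the abstract describes: label a finite set of measures by solving optimal transport problems, fit a regularized approximant, then evaluate it cheaply on new measures. It says nothing about the generalization bounds or the speedups reported in the study.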