The challenge of approximating functions in infinite-dimensional spaces from finite samples is widely regarded as formidable. In this study, we address the numerical approximation of Sobolev-smooth functions defined on probability spaces, focusing on the Wasserstein distance function as a relevant example. In contrast to the existing literature, which concentrates on efficiently approximating pointwise evaluations, we chart a new course and define functional approximants via three machine learning-based approaches:
1. solving a finite number of optimal transport problems and computing the corresponding Wasserstein potentials;
2. employing empirical risk minimization with Tikhonov regularization in Wasserstein Sobolev spaces;
3. addressing the problem through the saddle-point formulation that characterizes the weak form of the Euler-Lagrange equation of the Tikhonov functional.
As a theoretical contribution, we furnish explicit and quantitative generalization-error bounds for each of these solutions. In the proofs, we leverage the theory of metric Sobolev spaces and combine it with techniques from optimal transport, the calculus of variations, and large-deviation bounds. In our numerical implementation, we harness appropriately designed neural networks as basis functions and train them with diverse methodologies, obtaining approximating functions that can be evaluated rapidly after training. Consequently, at equal accuracy, our constructive solutions achieve evaluation speeds several orders of magnitude faster than state-of-the-art methods.
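To make the contrast with pointwise evaluation concrete: each query of the Wasserstein distance requires solving an optimal transport problem, which is what a trained functional approximant amortizes into a single forward pass. A minimal one-dimensional sketch of the per-query cost, using SciPy's `wasserstein_distance` (chosen here for illustration; it is not the paper's implementation):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=500)  # empirical samples from mu = N(0, 1)
b = rng.normal(2.0, 1.0, size=500)  # empirical samples from nu = N(2, 1)

# Each pointwise evaluation solves a (1-D) optimal transport problem.
# For these two Gaussians the true W1 distance is 2 (the shift in mean),
# so the empirical value should be close to 2.
d = wasserstein_distance(a, b)
print(d)
```

In higher dimensions the per-query solve becomes a linear program over the coupling, which is why replacing repeated optimal transport solves with a pre-trained approximant yields the evaluation speedups discussed above.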