This manuscript addresses the problem of approximating an unknown function from point evaluations. When these point evaluations are costly to obtain, minimising the required sample size becomes crucial, and it is unreasonable to reserve a sufficiently large test sample for estimating the approximation accuracy. An approximation with a certified quasi-optimality factor is therefore required. This manuscript shows that such an approximation can be obtained when the sought function lies in a \emph{reproducing kernel Hilbert space} (RKHS) and is to be approximated in a finite-dimensional linear subspace $\mcal{V}_d$. However, selecting the sample points to minimise the quasi-optimality factor requires optimising over an infinite set of points and computing exact inner products in the RKHS, which is often infeasible in practice. Extending results from optimal sampling for $L^2$ approximation, the manuscript proves that random points, drawn independently from the Christoffel sampling distribution associated with $\mcal{V}_d$, can yield a controllable quasi-optimality factor with high probability. Inspired by this result, a novel sampling scheme, coined subspace-informed volume sampling, is introduced and evaluated in numerical experiments, where it outperforms classical i.i.d.\ Christoffel sampling and continuous volume sampling. To reduce the size of such a random sample, an additional greedy subsampling scheme with provable suboptimality bounds is introduced. The presentation is also of independent interest to the community researching the \emph{parametrised background data weak} (PBDW) method, as it offers a simpler interpretation of the method.
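To make the notion of i.i.d. Christoffel sampling concrete, the following is a minimal sketch of the classical $L^2$ setting that the abstract says it extends, not the RKHS construction of the manuscript itself. It assumes, purely for illustration, that the subspace is spanned by the first $d$ Legendre polynomials on $[-1, 1]$ with the uniform reference measure $\mathrm{d}x/2$; in that case the Christoffel density with respect to the reference measure is $\frac{1}{d}\sum_{j=0}^{d-1} b_j(x)^2$ for an orthonormal basis $b_0, \dots, b_{d-1}$. The function names `orthonormal_legendre`, `christoffel_density` and `sample_christoffel` are hypothetical, and the rejection sampler is chosen for simplicity rather than efficiency.

```python
import numpy as np
from numpy.polynomial import legendre


def orthonormal_legendre(x, d):
    """Evaluate b_0, ..., b_{d-1}: Legendre polynomials orthonormal w.r.t. dx/2 on [-1, 1]."""
    x = np.asarray(x, dtype=float)
    V = legendre.legvander(x, d - 1)          # columns P_0(x), ..., P_{d-1}(x)
    return V * np.sqrt(2 * np.arange(d) + 1)  # rescale so each b_j has unit norm under dx/2


def christoffel_density(x, d):
    """Christoffel density (1/d) * sum_j b_j(x)^2, taken w.r.t. the uniform measure dx/2."""
    B = orthonormal_legendre(x, d)
    return np.sum(B**2, axis=1) / d


def sample_christoffel(n, d, rng):
    """Draw n i.i.d. points from the Christoffel distribution by rejection sampling."""
    # Since |P_j| <= 1 on [-1, 1], the density is bounded by (1/d) * sum_j (2j + 1) = d.
    bound = d
    samples = []
    while len(samples) < n:
        x = rng.uniform(-1.0, 1.0, size=4 * n * d)  # proposals from the reference measure
        u = rng.uniform(0.0, bound, size=x.size)
        samples.extend(x[u < christoffel_density(x, d)])  # keep accepted proposals
    return np.array(samples[:n])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = sample_christoffel(n=50, d=5, rng=rng)
    # Weights commonly used in weighted least squares: the reciprocal of the density.
    weights = 1.0 / christoffel_density(points, 5)
    print(points.shape, weights.shape)
```

The uniform proposal accepts roughly one draw in $d$, which is acceptable for small $d$; for larger subspaces a mixture or inverse-transform sampler would be the more usual choice.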