We consider the problem of approximating a function from $L^2$ by an element of a given $m$-dimensional space $V_m$, associated with some feature map $\varphi$, using evaluations of the function at random points $x_1,\dots,x_n$. After recalling some results on optimal weighted least-squares using independent and identically distributed points, we consider weighted least-squares using projection determinantal point processes (DPP) or volume sampling. These distributions introduce dependence between the points that promotes diversity in the selected features $\varphi(x_i)$. We first provide a generalized version of volume-rescaled sampling yielding quasi-optimality results in expectation with a number of samples $n = O(m\log(m))$, that means that the expected $L^2$ error is bounded by a constant times the best approximation error in $L^2$. Also, further assuming that the function is in some normed vector space $H$ continuously embedded in $L^2$, we further prove that the approximation is almost surely bounded by the best approximation error measured in the $H$-norm. This includes the cases of functions from $L^\infty$ or reproducing kernel Hilbert spaces. Finally, we present an alternative strategy consisting in using independent repetitions of projection DPP (or volume sampling), yielding similar error bounds as with i.i.d. or volume sampling, but in practice with a much lower number of samples. Numerical experiments illustrate the performance of the different strategies.
翻译:我们考虑利用函数在随机点$x_1,\dots,x_n$处的取值,从$L^2$空间中逼近一个函数的问题,该逼近由给定$m$维空间$V_m$(关联特征映射$\varphi$)中的元素实现。在回顾基于独立同分布点的最优加权最小二乘结果后,我们研究了使用投影行列式点过程(DPP)或体积采样的加权最小二乘方法。这些分布引入点之间的依赖性,促进所选特征$\varphi(x_i)$的多样性。首先,我们给出体积重缩放采样的广义版本,该版本在样本数$n = O(m\log(m))$条件下具有期望准最优性,即期望$L^2$误差以常数倍于$L^2$最佳逼近误差为界。此外,若进一步假设函数属于连续嵌入$L^2$的赋范向量空间$H$,我们证明逼近结果几乎必然以$H$范数度量的最佳逼近误差为界。这涵盖了$L^\infty$函数或再生核希尔伯特空间等情形。最后,我们提出替代策略——使用投影DPP(或体积采样)的独立重复,该方法可获得与独立同分布采样或体积采样相似的误差界,但实际所需样本数大幅降低。数值实验验证了不同策略的性能。