We consider the problem of approximating a function from $L^2$ by an element of a given $m$-dimensional space $V_m$, associated with some feature map $\varphi$, using evaluations of the function at random points $x_1,\dots,x_n$. After recalling some results on optimal weighted least-squares using independent and identically distributed points, we consider weighted least-squares using projection determinantal point processes (DPPs) or volume sampling. These distributions introduce dependence between the points that promotes diversity in the selected features $\varphi(x_i)$. We first provide a generalized version of volume-rescaled sampling yielding quasi-optimality results in expectation with a number of samples $n = O(m\log(m))$, meaning that the expected $L^2$ error is bounded by a constant times the best approximation error in $L^2$. Further assuming that the function belongs to some normed vector space $H$ continuously embedded in $L^2$, we prove that the approximation error is almost surely bounded by the best approximation error measured in the $H$-norm. This includes the cases of functions from $L^\infty$ or from reproducing kernel Hilbert spaces. Finally, we present an alternative strategy consisting in using independent repetitions of a projection DPP (or volume sampling), yielding error bounds similar to those for i.i.d. or volume sampling, but in practice with a much smaller number of samples. Numerical experiments illustrate the performance of the different strategies.
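To make the i.i.d. baseline concrete, the following is a minimal sketch (our own illustration, not the paper's code) of optimal weighted least-squares. The setting is hypothetical: $V_m$ is spanned by Legendre polynomials of degree $< m$ on $[-1,1]$ with the uniform reference measure, points are drawn i.i.d. from the optimal density proportional to the inverse Christoffel function $k_m(x) = \sum_j \varphi_j(x)^2$, and the regression uses weights $w(x_i) = m/k_m(x_i)$. The rejection bound $k_m \le m^2$ is specific to this basis.

```python
# Minimal sketch of optimal weighted least-squares with i.i.d. points.
# Assumptions (ours, for illustration): V_m = Legendre polynomials of
# degree < m on [-1, 1], uniform reference measure mu.
import numpy as np

m, n = 10, 60                       # dim(V_m) and sample size, n = O(m log m)
rng = np.random.default_rng(0)

def features(x):
    """Legendre features, orthonormal in L^2 of the uniform measure on [-1, 1]."""
    P = np.polynomial.legendre.legvander(x, m - 1)        # shape (len(x), m)
    return P * np.sqrt(2.0 * np.arange(m) + 1.0)

def sample_optimal(n):
    """Rejection sampling from the optimal density (k_m / m) d(mu),
    using the bound k_m <= m^2, valid for this Legendre basis."""
    xs = np.empty(0)
    while xs.size < n:
        x = rng.uniform(-1.0, 1.0, size=4 * n)            # proposals from mu
        k = (features(x) ** 2).sum(axis=1)                # k_m(x), inverse Christoffel
        xs = np.concatenate([xs, x[rng.uniform(size=x.size) * m**2 < k]])
    return xs[:n]

f = lambda x: np.exp(x)                                   # toy target function
x = sample_optimal(n)
Phi = features(x)
w = m / (Phi ** 2).sum(axis=1)                            # weights w_i = m / k_m(x_i)
coef, *_ = np.linalg.lstsq(Phi * np.sqrt(w)[:, None],
                           np.sqrt(w) * f(x), rcond=None)
```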
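For the DPP-based strategies, the sketch below (again ours, reusing `features`, `m`, and `rng` from the previous snippet) draws exactly $m$ points from a projection DPP associated with $V_m$, after discretizing the domain on a grid so that the kernel is an exact rank-$m$ projection; the standard sequential chain-rule (Schur-complement) sampler is then exact for such kernels. Concatenating independent draws gives the repetition strategy discussed above.

```python
# Sketch of sampling the projection DPP associated with V_m on a grid
# discretization of [-1, 1].  Orthonormalizing the feature matrix makes
# K = Q Q^T an exact rank-m projection kernel.
N = 2000
grid = np.linspace(-1.0, 1.0, N)
Q, _ = np.linalg.qr(features(grid))        # N x m with orthonormal columns
K = Q @ Q.T                                # projection kernel of rank m

def sample_projection_dpp(K):
    K = K.copy()
    idx = []
    for _ in range(m):
        p = np.clip(np.diag(K), 0.0, None)          # diag sums to remaining rank
        i = rng.choice(K.shape[0], p=p / p.sum())
        idx.append(i)
        K = K - np.outer(K[:, i], K[i, :]) / K[i, i]  # condition on selecting i
    return np.sort(grid[idx])

x_dpp = sample_projection_dpp(K)           # m dependent, "diverse" points
# Independent repetitions (c draws, n = c*m points in total):
# np.concatenate([sample_projection_dpp(K) for _ in range(c)])
```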