We investigate the convergence of discrete determinantal point processes (DPPs) to continuous DPPs as the size of the set to sample from goes to infinity. We propose a non-asymptotic characterization of this limit in terms of the concentration of statistics associated with these processes, which we refer to as "weak coherency". This characterization allows statistical guarantees to be transferred from the limiting process to the original, discrete one. Our main result gives sufficient conditions for weak coherency to hold. In particular, our study covers settings where both the kernel of the continuous process and its underlying space are inaccessible, as well as settings where the discrete marginal kernel is a noisy version of its continuous counterpart. We illustrate our theory on several examples. First, we prove that a discrete multivariate orthogonal polynomial ensemble can be used to produce coresets strictly smaller than those obtained by independent sampling for the same error. Second, we propose a process achieving repulsive sampling on an unknown manifold from a set of points sampled from an unknown density. Finally, we show that continuous DPPs can be obtained as limits on random graphs with Bernoulli edges, even when only the graph structure is observed. Along the way, we obtain byproduct results that may be of independent interest.
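To make the notion of a discrete marginal kernel concrete, the following is a minimal sketch and not the construction used in this work. It assumes a univariate simplification: N points drawn uniformly on [0, 1], and a projection kernel onto polynomials of degree below m, orthonormalized with respect to the empirical measure of the points. The function names (`orthonormal_poly_features`, `marginal_kernel`, `pair_inclusion`) are hypothetical. It relies only on the standard DPP identity that the probability of a subset S being included in the sample equals det(K_S), where K is the marginal kernel.

```python
import random


def orthonormal_poly_features(xs, m):
    """Modified Gram-Schmidt on 1, x, ..., x^(m-1) under the empirical
    inner product <f, g> = (1/n) * sum_i f(x_i) g(x_i)."""
    n = len(xs)
    basis = []
    for k in range(m):
        v = [x ** k for x in xs]
        for q in basis:
            c = sum(a * b for a, b in zip(v, q)) / n
            v = [a - c * b for a, b in zip(v, q)]
        norm = (sum(a * a for a in v) / n) ** 0.5
        basis.append([a / norm for a in v])
    return basis  # m lists, each of length n


def marginal_kernel(xs, m):
    """Discrete marginal kernel K[i][j] = (1/n) * sum_k phi_k(x_i) phi_k(x_j).
    By construction K is a rank-m orthogonal projection matrix."""
    n = len(xs)
    phi = orthonormal_poly_features(xs, m)
    return [[sum(p[i] * p[j] for p in phi) / n for j in range(n)]
            for i in range(n)]


def pair_inclusion(K, i, j):
    """P(i and j both sampled) = det of the 2x2 submatrix of K."""
    return K[i][i] * K[j][j] - K[i][j] ** 2


random.seed(0)
xs = sorted(random.random() for _ in range(50))  # assumed: uniform points
m = 5                                            # assumed: number of polynomials
K = marginal_kernel(xs, m)

# The expected sample size of a DPP is the trace of its marginal kernel.
expected_size = sum(K[i][i] for i in range(len(xs)))

# Repulsion: the joint inclusion probability of two points never exceeds
# the product of their individual inclusion probabilities.
joint = pair_inclusion(K, 0, 1)
independent = K[0][0] * K[1][1]
```

Here `expected_size` equals m up to rounding, since K is a projection of rank m, and `joint <= independent` holds for every pair because the off-diagonal term `K[i][j] ** 2` is subtracted. This negative correlation is the repulsiveness that distinguishes DPP sampling from independent sampling.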