Quadratic Speedups in Parallel Sampling from Determinantal Distributions

We study the problem of parallelizing sampling from distributions related to determinants: symmetric, nonsymmetric, and partition-constrained determinantal point processes, as well as planar perfect matchings. For these distributions, the partition function, a.k.a. the count, can be obtained via matrix determinants, a highly parallelizable computation; Csanky proved it is in NC. However, parallel counting does not automatically translate to parallel sampling, as classic reductions between the two are inherently sequential. We show that a nearly quadratic parallel speedup over sequential sampling can be achieved for all the aforementioned distributions. If the distribution is supported on subsets of size $k$ of a ground set, we show how to approximately produce a sample in $\widetilde{O}(k^{\frac{1}{2} + c})$ time with polynomially many processors for any constant $c>0$. In the two special cases of symmetric determinantal point processes and planar perfect matchings, our bound improves to $\widetilde{O}(\sqrt k)$ and we show how to sample exactly in these cases. As our main technical contribution, we fully characterize the limits of batching for the steps of sampling-to-counting reductions. We observe that only $O(1)$ steps can be batched together if we strive for exact sampling, even in the case of nonsymmetric determinantal point processes. However, we show that for approximate sampling, $\widetilde{\Omega}(k^{\frac{1}{2}-c})$ steps can be batched together, for any entropically independent distribution, which includes all mentioned classes of determinantal point processes. Entropic independence and related notions have been the source of breakthroughs in Markov chain analysis in recent years, so we expect our framework to prove useful for distributions beyond those studied in this work.
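For orientation, the classic sequential sampling-to-counting reduction mentioned above can be sketched for the simplest case. This is a minimal illustration, not the algorithm of this work: it assumes an unconstrained symmetric determinantal point process given by a positive semidefinite kernel $L$, with $\Pr[S] \propto \det(L_S)$, rather than the size-$k$ distributions treated here. The function names `count` and `sample_dpp_sequential` are hypothetical. The counting oracle relies on the standard identity $\sum_{A \subseteq S \subseteq R} \det(L_S) = \det(L_R + I_{R \setminus A})$, so each call is a single determinant. The loop decides one element per step, and every step depends on all earlier decisions, which is the sequential bottleneck that batching aims to reduce.

```python
import numpy as np


def count(L, included, excluded):
    """Counting oracle: sum of det(L_S) over all S that contain `included`
    and avoid `excluded`, computed as a single determinant."""
    n = L.shape[0]
    keep = [i for i in range(n) if i not in excluded]
    sub = L[np.ix_(keep, keep)].copy()
    # Add 1 on the diagonal for every element that is still undecided.
    for pos, i in enumerate(keep):
        if i not in included:
            sub[pos, pos] += 1.0
    return np.linalg.det(sub)


def sample_dpp_sequential(L, rng):
    """Sample S with probability proportional to det(L_S), one element at a time.
    Assumes L is symmetric positive semidefinite (illustrative setting only)."""
    n = L.shape[0]
    included, excluded = set(), set()
    for i in range(n):
        # Conditional marginal of element i given all earlier decisions.
        total = count(L, included, excluded)
        with_i = count(L, included | {i}, excluded)
        if rng.random() < with_i / total:
            included.add(i)
        else:
            excluded.add(i)
    return sorted(included)


if __name__ == "__main__":
    L = np.array([[2.0, 1.0], [1.0, 2.0]])
    rng = np.random.default_rng(0)
    print(sample_dpp_sequential(L, rng))
```

Each of the $n$ iterations makes two determinant calls that are individually parallelizable, but the iterations themselves run one after another. Deciding several elements per round from one set of oracle calls is the kind of batching whose limits are studied here.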
