Given a set of $n$ vectors in $\mathbb{R}^d$, the goal of the \emph{determinant maximization} problem is to pick $k$ vectors with the maximum volume. Determinant maximization is the MAP-inference task for determinantal point processes (DPP) and has recently received considerable attention for modeling diversity. Since most applications of the problem involve large amounts of data, it has been studied in the \textit{composable coreset} setting. In particular, [Indyk-Mahabadi-OveisGharan-Rezaei--SODA'20, ICML'19] showed that one can obtain composable coresets with the optimal approximation factor of $\tilde O(k)^k$ for the problem, and that a local search algorithm achieves an almost optimal approximation guarantee of $O(k)^{2k}$. In this work, we show that the widely used Greedy algorithm also provides composable coresets with an almost optimal approximation factor of $O(k)^{3k}$. This improves over the previously known guarantee of $C^{k^2}$ and supports prior experimental results showing the practicality of the greedy algorithm as a coreset. Our main result follows from a local optimality property of Greedy: swapping a single point of the greedy solution with a vector that the greedy algorithm did not pick can increase the volume by a factor of at most $(1+\sqrt{k})$. This is tight up to the additive constant $1$. Finally, our experiments show that on real data sets the local optimality ratio of the greedy algorithm is even lower than this theoretical bound.
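To make the local optimality quantity concrete, here is a minimal pure-Python sketch. It assumes the standard greedy rule for volume maximization (repeatedly add the unpicked vector farthest from the span of the vectors picked so far). It also takes the volume of a set of vectors $V$ to be $\sqrt{\det(VV^\top)}$. The helper names `volume`, `greedy`, and `max_swap_ratio`, as well as the numerical tolerance `TOL`, are hypothetical and not taken from the paper. `max_swap_ratio` brute-forces the quantity that the abstract bounds by $1+\sqrt{k}$.

```python
import math
import random

TOL = 1e-12  # hypothetical tolerance: residual norms below this count as zero


def residual(v, basis):
    """Component of v orthogonal to span(basis); `basis` must be orthonormal.
    Projections are removed one at a time (modified Gram-Schmidt)."""
    r = list(v)
    for b in basis:
        c = sum(x * y for x, y in zip(r, b))
        r = [x - c * y for x, y in zip(r, b)]
    return r


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def volume(vectors):
    """Volume of the parallelepiped spanned by `vectors`, i.e. sqrt(det(V V^T)),
    computed as the product of successive Gram-Schmidt residual norms."""
    basis, vol = [], 1.0
    for v in vectors:
        r = residual(v, basis)
        n = norm(r)
        if n <= TOL:
            return 0.0  # linearly dependent set: zero volume
        basis.append([x / n for x in r])
        vol *= n
    return vol


def greedy(vectors, k):
    """Assumed greedy rule: repeatedly pick the unpicked vector farthest from
    the span of the vectors picked so far (ties go to the smallest index)."""
    chosen, basis = [], []
    for _ in range(k):
        best, best_norm = None, -1.0
        for i, v in enumerate(vectors):
            if i in chosen:
                continue
            n = norm(residual(v, basis))
            if n > best_norm:
                best, best_norm = i, n
        chosen.append(best)
        if best_norm > TOL:
            r = residual(vectors[best], basis)
            basis.append([x / best_norm for x in r])
    return chosen


def max_swap_ratio(vectors, chosen):
    """Largest factor by which swapping one chosen vector for one unchosen
    vector changes the volume (assumes the chosen set has nonzero volume)."""
    base = volume([vectors[i] for i in chosen])
    best = 0.0
    for pos in range(len(chosen)):
        for j in range(len(vectors)):
            if j in chosen:
                continue
            swapped = chosen[:pos] + [j] + chosen[pos + 1:]
            best = max(best, volume([vectors[i] for i in swapped]) / base)
    return best


if __name__ == "__main__":
    vecs = [(1.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
    S = greedy(vecs, 2)
    print(S, max_swap_ratio(vecs, S))  # [1, 0] 1.0

    # Random instance: compare the observed swap ratio with the 1 + sqrt(k) bound.
    random.seed(0)
    k = 4
    pts = [[random.gauss(0, 1) for _ in range(6)] for _ in range(30)]
    S = greedy(pts, k)
    print(max_swap_ratio(pts, S), 1 + math.sqrt(k))
```

Computing the volume as a product of Gram-Schmidt residual norms avoids forming $VV^\top$ explicitly. The brute-force swap check needs $k(n-k)$ volume evaluations. That cost is fine for illustration but is not meant as an efficient implementation.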