The Johnson-Lindenstrauss (JL) Lemma introduced the concept of dimension reduction via a random linear map, which has become a fundamental technique in many computational settings. For a set of $n$ points in $\mathbb{R}^d$ and any fixed $\epsilon>0$, it reduces the dimension $d$ to $O(\log n)$ while preserving, with high probability, all the pairwise Euclidean distances within factor $1+\epsilon$. Perhaps surprisingly, the target dimension can be lower if one only wishes to preserve the optimal value of a certain problem on the pointset, e.g., Euclidean max-cut or $k$-means. However, for some notorious problems, like diameter (a.k.a. furthest pair), dimension reduction via the JL map to below $O(\log n)$ does not preserve the optimal value within factor $1+\epsilon$. We propose to focus on another regime, of \emph{moderate dimension reduction}, where a problem's value is preserved within factor $\alpha>1$ using target dimension $\tfrac{\log n}{poly(\alpha)}$. We establish the viability of this approach and show that the famous $k$-center problem is $\alpha$-approximated when reducing to dimension $O(\tfrac{\log n}{\alpha^2}+\log k)$. Along the way, we address the diameter problem via the special case $k=1$. Our result extends to several important variants of $k$-center (with outliers, capacities, or fairness constraints), and the bound improves further with the input's doubling dimension. While our $poly(\alpha)$-factor improvement in the dimension may seem small, it actually has significant implications for streaming algorithms, and easily yields an algorithm for $k$-center in dynamic geometric streams that achieves $O(\alpha)$-approximation using space $poly(kdn^{1/\alpha^2})$. This is the first algorithm to beat $O(n)$ space in high dimension $d$, as all previous algorithms require space at least $\exp(d)$. Furthermore, it extends to the $k$-center variants mentioned above.
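To make the dimension-reduction mechanism concrete, here is a minimal sketch of a JL-style random projection: each point is multiplied by a shared Gaussian matrix scaled by $1/\sqrt{k}$, which preserves pairwise distances up to a $1\pm\epsilon$ factor with high probability when $k = O(\epsilon^{-2}\log n)$. The parameters ($d=500$, $k=100$) and function names are illustrative, not the paper's construction.

```python
# Sketch of Johnson-Lindenstrauss dimension reduction via a random Gaussian
# projection. Parameters and names here are illustrative assumptions.
import math
import random

def jl_project(points, k, seed=0):
    """Map d-dimensional points to k dimensions using one shared random
    Gaussian matrix with entries N(0, 1/k), so that squared Euclidean
    norms (and hence pairwise distances) are preserved in expectation."""
    rng = random.Random(seed)
    d = len(points[0])
    # Shared projection matrix G of shape k x d, scaled by 1/sqrt(k).
    G = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
         for _ in range(k)]
    return [[sum(row[j] * p[j] for j in range(d)) for row in G]
            for p in points]

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Example: project a few random points from R^500 down to 100 dimensions
# and compare pairwise distances before and after.
rng = random.Random(1)
pts = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
proj = jl_project(pts, k=100)
ratios = [dist(proj[i], proj[j]) / dist(pts[i], pts[j])
          for i in range(4) for j in range(i + 1, 4)]
```

With these (generous) settings, each distance ratio concentrates around 1 with standard deviation roughly $1/\sqrt{2k}$; the paper's "moderate" regime instead takes $k \approx \tfrac{\log n}{\alpha^2}$, trading a larger distortion factor $\alpha$ for an even smaller target dimension.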