Moderate Dimension Reduction for $k$-Center Clustering

The Johnson-Lindenstrauss (JL) Lemma introduced the concept of dimension reduction via a random linear map, which has become a fundamental technique in many computational settings. For a set of $n$ points in $\mathbb{R}^d$ and any fixed $\epsilon>0$, it reduces the dimension $d$ to $O(\log n)$ while preserving, with high probability, all pairwise Euclidean distances within a factor of $1+\epsilon$. Perhaps surprisingly, the target dimension can be lower if one only wishes to preserve the optimal value of a certain problem on the pointset, e.g., Euclidean max-cut or $k$-means. However, for some notorious problems, like diameter (aka furthest pair), dimension reduction via the JL map to below $O(\log n)$ does not preserve the optimal value within a factor of $1+\epsilon$. We propose to focus on another regime, of \emph{moderate dimension reduction}, where a problem's value is preserved within a factor of $\alpha>1$ using target dimension $\tfrac{\log n}{\mathrm{poly}(\alpha)}$. We establish the viability of this approach and show that the famous $k$-center problem is $\alpha$-approximated when reducing to dimension $O(\tfrac{\log n}{\alpha^2}+\log k)$. Along the way, we address the diameter problem via the special case $k=1$. Our result extends to several important variants of $k$-center (with outliers, capacities, or fairness constraints), and the bound improves further with the input's doubling dimension. While our $\mathrm{poly}(\alpha)$-factor improvement in the dimension may seem small, it has significant implications for streaming algorithms: it easily yields an algorithm for $k$-center in dynamic geometric streams that achieves $O(\alpha)$-approximation using space $\mathrm{poly}(kdn^{1/\alpha^2})$. This is the first algorithm to beat $O(n)$ space in high dimension $d$, as all previous algorithms require space at least $\exp(d)$. Furthermore, it extends to the $k$-center variants mentioned above.
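
To make the regime concrete, the following Python sketch projects a pointset with a Gaussian JL map to dimension roughly $\tfrac{\log n}{\alpha^2}+\log k$ and compares $k$-center costs before and after. It is an illustration under stated assumptions, not the paper's construction: the names `jl_project`, `moderate_target_dim`, and `greedy_k_center` are hypothetical, the constant `c` hidden in the $O(\cdot)$ is chosen arbitrarily, and the greedy farthest-first heuristic (a standard 2-approximation for $k$-center) stands in for the optimal value.

```python
import math
import numpy as np


def moderate_target_dim(n, k, alpha, c=4.0):
    """Target dimension c * (log n / alpha^2 + log k).

    The constant c is an assumption made for illustration only;
    the abstract specifies the bound only up to O(.).
    """
    return max(1, math.ceil(c * (math.log(n) / alpha**2 + math.log(k))))


def jl_project(points, target_dim, rng):
    """Apply a random Gaussian linear map R^d -> R^target_dim."""
    d = points.shape[1]
    # Entries are N(0, 1/target_dim), so squared norms are preserved in expectation.
    G = rng.standard_normal((d, target_dim)) / math.sqrt(target_dim)
    return points @ G


def greedy_k_center(points, k):
    """Farthest-first traversal: returns (center indices, clustering radius).

    This is a 2-approximation heuristic, used here in place of the
    optimal k-center value, which is NP-hard to compute in general.
    """
    centers = [0]
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))  # point farthest from the current centers
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return centers, float(dist.max())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k, alpha = 500, 1000, 5, 3.0
    X = rng.standard_normal((n, d))
    m = moderate_target_dim(n, k, alpha)
    Y = jl_project(X, m, rng)
    _, cost_orig = greedy_k_center(X, k)
    _, cost_proj = greedy_k_center(Y, k)
    print(f"target dim {m}: cost {cost_orig:.3f} -> {cost_proj:.3f}")
```

Increasing `alpha` shrinks the target dimension quadratically in the $\log n$ term, which is the trade-off the abstract describes; the ratio of the two printed costs gives only a rough empirical indication, since both are heuristic values.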
