Moderate Dimension Reduction for $k$-Center Clustering

The Johnson-Lindenstrauss (JL) Lemma introduced the concept of dimension reduction via a random linear map, which has become a fundamental technique in many computational settings. For a set of $n$ points in $\mathbb{R}^d$ and any fixed $\epsilon>0$, it reduces the dimension $d$ to $O(\log n)$ while preserving, with high probability, all the pairwise Euclidean distances within factor $1+\epsilon$. Perhaps surprisingly, the target dimension can be lower if one only wishes to preserve the optimal value of a certain problem, e.g., max-cut or $k$-means. However, for some notorious problems, like diameter (aka furthest pair), dimension reduction via the JL map to below $O(\log n)$ does not preserve the optimal value within factor $1+\epsilon$. We propose to focus on another regime, of \emph{moderate dimension reduction}, where a problem's value is preserved within factor $\alpha=O(1)$ (or even larger) using target dimension $\log n / \mathrm{poly}(\alpha)$. We establish the viability of this approach and show that the famous $k$-center problem is $\alpha$-approximated when reducing to dimension $O(\tfrac{\log n}{\alpha^2}+\log k)$. Along the way, we address the diameter problem via the special case $k=1$. Our result extends to several important variants of $k$-center (with outliers, capacities, or fairness constraints), and the bound improves further with the input's doubling dimension. While our $\mathrm{poly}(\alpha)$-factor improvement in the dimension may seem small, it has significant implications for streaming algorithms: it easily yields an algorithm for $k$-center in dynamic geometric streams that achieves $O(\alpha)$-approximation using space $\mathrm{poly}(kdn^{1/\alpha^2})$. This is the first algorithm to beat $O(n)$ space in high dimension $d$, as all previous algorithms require space at least $\exp(d)$. Furthermore, it extends to the $k$-center variants mentioned above.
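
To make the reduction concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It applies a Gaussian random linear map to a target dimension of the form $c\,(\tfrac{\log n}{\alpha^2}+\log k)$ and compares $k$-center costs measured in the original space. Everything here is an assumption made for illustration: the helper names, the constant `c` hidden by the $O(\cdot)$ notation, the choice of a Gaussian matrix as the JL map, and the use of the classical greedy farthest-first heuristic as a stand-in $k$-center solver.

```python
import numpy as np

def moderate_target_dim(n, k, alpha, c=1.0):
    # Target dimension of the form c * (log n / alpha^2 + log k).
    # The constant c is hypothetical; the abstract only states the O(.) bound.
    return max(1, int(np.ceil(c * (np.log(n) / alpha**2 + np.log(k)))))

def jl_project(points, target_dim, rng):
    # Gaussian random linear map R^d -> R^target_dim, scaled by
    # 1/sqrt(target_dim) so squared norms are preserved in expectation.
    d = points.shape[1]
    G = rng.standard_normal((d, target_dim)) / np.sqrt(target_dim)
    return points @ G

def greedy_k_center(points, k):
    # Farthest-first traversal (a classical 2-approximation heuristic),
    # used here only as a stand-in solver. Returns indices of k centers.
    centers = [0]
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return centers

def k_center_cost(points, center_idx):
    # Maximum over points of the distance to the nearest chosen center.
    diffs = points[:, None, :] - points[center_idx][None, :, :]
    return float(np.linalg.norm(diffs, axis=2).min(axis=1).max())

# Small demo on synthetic data (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
n, d, k, alpha = 300, 100, 5, 2.0
X = rng.standard_normal((n, d))

m = moderate_target_dim(n, k, alpha)
Y = jl_project(X, m, rng)

cost_direct = k_center_cost(X, greedy_k_center(X, k))     # solve in R^d
cost_via_jl = k_center_cost(X, greedy_k_center(Y, k))     # solve in R^m
print(f"target dim {m}: direct cost {cost_direct:.3f}, via projection {cost_via_jl:.3f}")
```

The demo evaluates both center sets in the original space, so the two printed costs are directly comparable. It makes no claim about which ratio will be observed on a given input; the $\alpha$-approximation guarantee is the one stated in the abstract, and this sketch does not verify it.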
