The Johnson-Lindenstrauss (JL) Lemma introduced the concept of dimension reduction via a random linear map, which has become a fundamental technique in many computational settings. For a set of $n$ points in $\mathbb{R}^d$ and any fixed $\epsilon>0$, it reduces the dimension $d$ to $O(\log n)$ while preserving, with high probability, all the pairwise Euclidean distances within factor $1+\epsilon$. Perhaps surprisingly, the target dimension can be lower if one only wishes to preserve the optimal value of a certain problem on the pointset, e.g., Euclidean max-cut or $k$-means. However, for some notorious problems, like diameter (a.k.a. furthest pair), dimension reduction via the JL map to below $O(\log n)$ does not preserve the optimal value within factor $1+\epsilon$. We propose to focus on another regime, of \emph{moderate dimension reduction}, where a problem's value is preserved within factor $\alpha>1$ using target dimension $\tfrac{\log n}{poly(\alpha)}$. We establish the viability of this approach and show that the famous $k$-center problem is $\alpha$-approximated when reducing to dimension $O(\tfrac{\log n}{\alpha^2}+\log k)$. Along the way, we address the diameter problem via the special case $k=1$. Our result extends to several important variants of $k$-center (with outliers, capacities, or fairness constraints), and the bound improves further with the input's doubling dimension. While our $poly(\alpha)$-factor improvement in the dimension may seem small, it actually has significant implications for streaming algorithms, and easily yields an algorithm for $k$-center in dynamic geometric streams that achieves $O(\alpha)$-approximation using space $poly(kdn^{1/\alpha^2})$. This is the first algorithm to beat $O(n)$ space in high dimension $d$, as all previous algorithms require space at least $\exp(d)$. Furthermore, it extends to the $k$-center variants mentioned above.
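To make the dimension-reduction mechanism concrete, here is a minimal sketch of a JL-style random projection: each point is multiplied by a shared Gaussian matrix scaled by $1/\sqrt{k}$, which preserves pairwise distances up to a $1\pm\epsilon$ factor with high probability when $k = O(\epsilon^{-2}\log n)$. The parameters ($d=500$, $k=100$) and function names are illustrative, not the paper's construction.

```python
# Sketch of Johnson-Lindenstrauss dimension reduction via a random Gaussian
# projection. Parameters and names here are illustrative assumptions.
import math
import random

def jl_project(points, k, seed=0):
    """Map d-dimensional points to k dimensions using one shared random
    Gaussian matrix with entries N(0, 1/k), so that squared Euclidean
    norms (and hence pairwise distances) are preserved in expectation."""
    rng = random.Random(seed)
    d = len(points[0])
    # Shared projection matrix G of shape k x d, scaled by 1/sqrt(k).
    G = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
         for _ in range(k)]
    return [[sum(row[j] * p[j] for j in range(d)) for row in G]
            for p in points]

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Example: project a few random points from R^500 down to 100 dimensions
# and compare pairwise distances before and after.
rng = random.Random(1)
pts = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
proj = jl_project(pts, k=100)
ratios = [dist(proj[i], proj[j]) / dist(pts[i], pts[j])
          for i in range(4) for j in range(i + 1, 4)]
```

With these (generous) settings, each distance ratio concentrates around 1 with standard deviation roughly $1/\sqrt{2k}$; the paper's "moderate" regime instead takes $k \approx \tfrac{\log n}{\alpha^2}$, trading a larger distortion factor $\alpha$ for an even smaller target dimension.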