Parameterized Approximation for Robust Clustering in Discrete Geometric Spaces

We consider the well-studied Robust $(k, z)$-Clustering problem, which generalizes the classic $k$-Median, $k$-Means, and $k$-Center problems. Given a constant $z\ge 1$, the input to Robust $(k, z)$-Clustering is a set $P$ of $n$ weighted points in a metric space $(M,\delta)$ and a positive integer $k$. Further, each point belongs to one or more of $m$ groups $S_1,S_2,\ldots,S_m$. The goal is to find a set $X$ of $k$ centers that minimizes $\max_{i \in [m]} \sum_{p \in S_i} w(p) \delta(p,X)^z$, where $w(p)$ is the weight of $p$ and $\delta(p,X) = \min_{x \in X} \delta(p,x)$. This problem arises in robust optimization [Anthony, Goyal, Gupta, Nagarajan, Math. Oper. Res. 2010] and in algorithmic fairness. In polynomial time, an approximation factor of $O(\log m/\log\log m)$ is known [Makarychev, Vakilian, COLT 2021], which is tight under a plausible complexity assumption even for line metrics. In FPT time, there is a $(3^z+\epsilon)$-approximation algorithm, which is tight under GAP-ETH [Goyal, Jaiswal, Inf. Proc. Letters, 2023]. Motivated by these tight lower bounds for general discrete metrics, we focus on \emph{geometric} spaces such as the (discrete) high-dimensional Euclidean setting and metrics of low doubling dimension, which play an important role in data analysis applications. First, for a universal constant $\eta_0 >0.0006$, we devise a $3^z(1-\eta_{0})$-factor FPT approximation algorithm for discrete high-dimensional Euclidean spaces, thereby bypassing the lower bound for general metrics. We complement this result by showing that even the special case of $k$-Center in dimension $\Theta(\log n)$ is $(\sqrt{3/2}- o(1))$-hard to approximate for FPT algorithms. Finally, we complete the FPT approximation landscape by designing an FPT $(1+\epsilon)$-approximation scheme (EPAS) for metrics of sub-logarithmic doubling dimension.
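To make the objective concrete, the following minimal Python sketch evaluates $\max_{i \in [m]} \sum_{p \in S_i} w(p) \delta(p,X)^z$ for points in Euclidean space and finds an optimal solution by exhaustive search over a discrete candidate set. It is only an illustration of the problem definition, not any of the algorithms from the paper: the function names `robust_cost` and `brute_force_robust_clustering` are hypothetical, groups are assumed to be given as lists of point indices, and the search enumerates all $k$-subsets of the candidates, so it is practical only for tiny instances.

```python
from itertools import combinations
import math


def robust_cost(points, weights, groups, centers, z=1):
    """Return max over groups S_i of sum_{p in S_i} w(p) * delta(p, X)^z.

    points  : list of coordinate tuples (Euclidean metric assumed)
    weights : list of non-negative weights, one per point
    groups  : list of groups, each a list of point indices (groups may overlap)
    centers : the candidate solution X, a non-empty list of coordinate tuples
    """
    def dist_to_centers(p):
        # delta(p, X) is the distance from p to its nearest center.
        return min(math.dist(p, c) for c in centers)

    return max(
        sum(weights[j] * dist_to_centers(points[j]) ** z for j in group)
        for group in groups
    )


def brute_force_robust_clustering(points, weights, groups, candidates, k, z=1):
    """Exhaustively try every k-subset of the candidate centers.

    Illustrative baseline only: this enumerates all C(|candidates|, k)
    subsets and is not an FPT or polynomial-time algorithm.
    """
    best = min(
        combinations(candidates, k),
        key=lambda X: robust_cost(points, weights, groups, X, z),
    )
    return list(best), robust_cost(points, weights, groups, best, z)


if __name__ == "__main__":
    # Four points on a line, split into two groups; k = 2 and z = 2.
    points = [(0.0,), (1.0,), (10.0,), (11.0,)]
    weights = [1.0, 1.0, 1.0, 1.0]
    groups = [[0, 1], [2, 3]]
    centers, cost = brute_force_robust_clustering(
        points, weights, groups, candidates=points, k=2, z=2
    )
    print(centers, cost)
```

Taking every point as its own group with $z = 1$ and unit weights recovers the $k$-Center objective, while a single group containing all points gives $k$-Median ($z = 1$) or $k$-Means ($z = 2$).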
