Exact Algorithms and Lower Bounds for Stable Instances of Euclidean k-Means

We investigate the complexity of solving stable, or perturbation-resilient, instances of $k$-Means and $k$-Median clustering in fixed-dimensional Euclidean metrics (more generally, doubling metrics). The notion of stable (perturbation-resilient) instances was introduced by Bilu and Linial [2010] and Awasthi et al. [2012]. In our context, we say a $k$-Means instance is $\alpha$-stable if it has a unique optimum solution OPT that remains optimal when distances are (non-uniformly) stretched by a factor of at most $\alpha$. Stable clustering instances have been studied to explain why heuristics such as Lloyd's algorithm perform well in practice.

In this work, we show that for any fixed $\epsilon>0$, $(1+\epsilon)$-stable instances of $k$-Means in doubling metrics can be solved in polynomial time. More precisely, we show that a natural multiswap local search algorithm finds OPT for $(1+\epsilon)$-stable instances of $k$-Means and $k$-Median in a polynomial number of iterations.

We complement this result by showing that, under a new PCP hypothesis, it is essentially tight: when the dimension $d$ is part of the input, there is a fixed $\epsilon_0>0$ such that there is not even a PTAS for $(1+\epsilon_0)$-stable $k$-Means in $\mathbb{R}^d$ unless NP $=$ RP. To do this, we consider a robust property of CSPs: call an instance stable if there is a unique optimum solution $x^*$ and, for any other solution $x'$, the number of unsatisfied clauses is proportional to the Hamming distance between $x^*$ and $x'$. Dinur et al. have already shown that stable $Q$SAT is hard to approximate for some constant $Q$; our hypothesis is simply that stable $Q$SAT with bounded variable occurrence is also hard. Given this hypothesis, we design "stability-preserving" reductions to prove hardness for stable $k$-Means. Such reductions seem to be more fragile than standard L-reductions and may be of further use in demonstrating that other stable optimization problems are hard.
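To make the algorithmic claim concrete, here is a minimal Python sketch of $p$-swap local search for discrete $k$-Means. It is an illustration under stated assumptions, not the authors' implementation: centers are assumed to be drawn from a finite candidate set, the initial solution is arbitrary (the first $k$ candidates), and any swap of at most $p$ centers that strictly lowers the cost is accepted. The names `multiswap_local_search`, `candidates`, `p`, and `tol` are hypothetical. The paper's polynomial iteration bound, and the swap size needed as a function of $\epsilon$ and the doubling dimension, may rely on details not modeled here.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance, the per-point cost in k-Means."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(sq_dist(q, c) for c in centers) for q in points)


def _improving_swap(points: Sequence[Point], candidates: Sequence[Point],
                    current: List[int], p: int, cost: float,
                    tol: float) -> Optional[Tuple[List[int], float]]:
    """Return the first swap of s <= p centers that lowers the cost, else None."""
    chosen = set(current)
    outside = [i for i in range(len(candidates)) if i not in chosen]
    for s in range(1, p + 1):
        for drop in combinations(current, s):      # s centers to close
            for add in combinations(outside, s):   # s candidates to open
                trial = sorted((chosen - set(drop)) | set(add))
                trial_cost = kmeans_cost(points, [candidates[i] for i in trial])
                if trial_cost < cost - tol:        # strict improvement only
                    return trial, trial_cost
    return None


def multiswap_local_search(points: Sequence[Point], candidates: Sequence[Point],
                           k: int, p: int,
                           tol: float = 1e-9) -> Tuple[List[int], float]:
    """Hypothetical p-swap local search; returns (center indices, cost)."""
    if not 1 <= k <= len(candidates):
        raise ValueError("need 1 <= k <= number of candidate centers")
    current = list(range(k))  # assumed arbitrary start: first k candidates
    cost = kmeans_cost(points, [candidates[i] for i in current])
    while True:
        step = _improving_swap(points, candidates, current, p, cost, tol)
        if step is None:
            return current, cost  # no improving p-swap: local optimum
        current, cost = step
```

For example, with two well-separated triples `pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` used as both points and candidates, `multiswap_local_search(pts, pts, k=2, p=1)` ends with one center in each triple, at indices `[0, 3]` with cost 4. Each round examines at most $\sum_{s=1}^{p}\binom{k}{s}\binom{n-k}{s}$ swaps for $n$ candidates, which is polynomial when $p$ is a constant.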

翻译：我们研究了在固定维欧几里得度量（更一般地，倍率度量）中，求解稳定或抗扰动实例的k均值与k中位数聚类问题的复杂性。稳定（抗扰动）实例的概念由 Bilu 与 Linial [2010] 及 Awasthi 等人 [2012] 提出。在此背景下，我们称一个k均值实例是α-稳定的：当距离被最多α倍（非均匀）拉伸时，存在唯一的最优解（OPT）仍保持最优。稳定聚类实例被用于解释为何像Lloyd算法这样的启发式方法在实践中表现良好。本工作证明，对任意固定ε>0，倍率度量中(1+ε)-稳定的k均值实例可在多项式时间内求解。更精确地，我们证明一种自然的多重交换局部搜索算法，在多项式次迭代内可为(1+ε)-稳定的k均值与k中位数实例找到最优解。作为补充，我们证明在新PCP定理下，本结果本质上是紧的：当维度d作为输入的一部分时，存在固定ε0>0，使得若NP≠RP，则Rd中(1+ε0)-稳定的k均值甚至不存在多项式时间近似方案（PTAS）。为此，我们考虑约束满足问题（CSP）的一个鲁棒性质：称一个实例是稳定的，当存在唯一最优解x*，且对任意其他解x'，不满足的子句数量正比于x*与x'之间的汉明距离。Dinur等人已证明对某常数Q，稳定QSAT难以近似；我们的假设仅为：变量出现有界的稳定QSAT也是难解的。在此假设下，我们设计“稳定性保持”归约来证明稳定k均值的难解性。此类归约比标准L-归约更脆弱，但可能进一步用于论证其他稳定优化问题的难解性。