$k$-Clustering in $\mathbb{R}^d$ (e.g., $k$-median and $k$-means) is a fundamental machine learning problem. While near-linear time approximation algorithms are known in the classical setting for a dataset of cardinality $n$, finding sublinear-time quantum algorithms has remained open. We give quantum algorithms that find coresets for $k$-clustering in $\mathbb{R}^d$ with $\tilde{O}(\sqrt{nk}d^{3/2})$ query complexity. Our coreset reduces the input size from $n$ to $\mathrm{poly}(k\epsilon^{-1}d)$, so that existing $\alpha$-approximation algorithms for clustering can run on top of it and yield a $(1 + \epsilon)\alpha$-approximation. This in turn yields a quadratic speedup for various $k$-clustering approximation algorithms. We complement our algorithm with a nearly matching lower bound: any quantum algorithm must make $\Omega(\sqrt{nk})$ queries to achieve even an $O(1)$-approximation for $k$-clustering.
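To see why running an $\alpha$-approximation algorithm on a coreset loses only a $(1+\epsilon)$ factor, the following sketch uses the standard notion of an $\epsilon$-coreset; the exact definition and constants in the paper may differ. The symbols $P$ (the input dataset), $S$ (the weighted coreset), $\mathrm{cost}_P(C)$ (the clustering cost of a center set $C$ on $P$), $C^*$ (an optimal center set for $P$), and $C'$ (the output of the $\alpha$-approximation algorithm on $S$) are notation assumed here for illustration. Suppose that for every set $C$ of $k$ centers,

$$(1-\epsilon)\,\mathrm{cost}_P(C) \le \mathrm{cost}_S(C) \le (1+\epsilon)\,\mathrm{cost}_P(C).$$

Then

$$\mathrm{cost}_P(C') \le \frac{\mathrm{cost}_S(C')}{1-\epsilon} \le \frac{\alpha\,\mathrm{cost}_S(C^*)}{1-\epsilon} \le \frac{(1+\epsilon)\,\alpha}{1-\epsilon}\,\mathrm{cost}_P(C^*).$$

The middle inequality holds because $C'$ is $\alpha$-approximate on $S$ and the optimum on $S$ costs at most $\mathrm{cost}_S(C^*)$. For $\epsilon \le 1/2$ we have $\frac{1+\epsilon}{1-\epsilon} \le 1 + 4\epsilon$, so rescaling $\epsilon$ by a constant gives a $(1+\epsilon)\alpha$-approximation on the full dataset.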