We give a quantum approximation scheme (i.e., a $(1 + \varepsilon)$-approximation for every $\varepsilon > 0$) for the classical $k$-means clustering problem in the QRAM model, with a running time that depends only polylogarithmically on the number of data points. More specifically, given a dataset $V$ of $N$ points in $\mathbb{R}^d$ stored in a QRAM data structure, our quantum algorithm runs in time $\tilde{O} \left( 2^{\tilde{O}(\frac{k}{\varepsilon})} \eta^2 d\right)$ and, with high probability, outputs a set $C$ of $k$ centers such that $\mathrm{cost}(V, C) \leq (1+\varepsilon) \cdot \mathrm{cost}(V, C_{OPT})$. Here $C_{OPT}$ denotes an optimal set of $k$ centers, $\mathrm{cost}(\cdot)$ denotes the standard $k$-means cost function (i.e., the sum of the squared distances of the points to their closest centers), and $\eta$ is the aspect ratio (i.e., the ratio of the maximum distance to the minimum distance). This is the first quantum algorithm with a polylogarithmic running time that gives a provable $(1+\varepsilon)$ approximation guarantee for the $k$-means problem. Also, unlike previous works on unsupervised learning, our quantum algorithm does not require quantum linear algebra subroutines, and its running time is independent of the parameters (e.g., the condition number) that appear in such procedures.
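The two quantities that appear in the guarantee, the $k$-means cost and the aspect ratio $\eta$, can be made concrete with a small classical sketch. This is not the quantum algorithm. It only evaluates the definitions stated above, assuming Euclidean distance and assuming that $\eta$ is taken over pairs of distinct points (zero distances from duplicate points are ignored). The function names `kmeans_cost` and `aspect_ratio` are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence

Point = Sequence[float]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points in R^d."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(V: Sequence[Point], C: Sequence[Point]) -> float:
    """cost(V, C): sum over v in V of the squared distance to its closest center in C."""
    return sum(min(sq_dist(v, c) for c in C) for v in V)


def aspect_ratio(V: Sequence[Point]) -> float:
    """eta: largest pairwise distance divided by smallest nonzero pairwise distance.

    Assumption: pairs of identical points (distance 0) are skipped.
    """
    dists = [math.sqrt(sq_dist(p, q)) for p, q in combinations(V, 2)]
    positive = [x for x in dists if x > 0]
    return max(positive) / min(positive)


if __name__ == "__main__":
    V = [(0, 0), (1, 0), (10, 0), (11, 0)]
    C = [(0.5, 0), (10.5, 0)]
    print(kmeans_cost(V, C))  # 1.0
    print(aspect_ratio(V))    # 11.0
```

In this toy example, each point lies at distance $0.5$ from its nearest center, so the cost is $4 \times 0.25 = 1$. The largest pairwise distance is $11$ and the smallest is $1$. A $(1+\varepsilon)$-approximate solution is any $C$ with `kmeans_cost(V, C) <= (1 + eps) * kmeans_cost(V, C_opt)`. Note that the brute-force $O(N^2 d)$ computation of $\eta$ here is only for illustration; it has none of the polylogarithmic dependence on $N$ claimed for the quantum algorithm.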