We give a quantum approximation scheme (i.e., a $(1 + \varepsilon)$-approximation for every $\varepsilon > 0$) for the classical $k$-means clustering problem in the QRAM model, with a running time that depends only polylogarithmically on the number of data points. More specifically, given a dataset $V$ of $N$ points in $\mathbb{R}^d$ stored in a QRAM data structure, our quantum algorithm runs in time $\tilde{O} \left( 2^{\tilde{O}(\frac{k}{\varepsilon})} \eta^2 d\right)$ and, with high probability, outputs a set $C$ of $k$ centers such that $cost(V, C) \leq (1+\varepsilon) \cdot cost(V, C_{OPT})$. Here $C_{OPT}$ denotes an optimal set of $k$ centers, $cost(\cdot)$ denotes the standard $k$-means cost function (i.e., the sum of the squared distances from each point to its closest center), and $\eta$ is the aspect ratio (i.e., the ratio of the maximum distance to the minimum distance). This is the first quantum algorithm with a polylogarithmic running time that gives a provable approximation guarantee of $(1+\varepsilon)$ for the $k$-means problem. Also, unlike previous works on unsupervised learning, our quantum algorithm does not require quantum linear algebra subroutines, and its running time is independent of the parameters (e.g., the condition number) that appear in such procedures.
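To make the quantities in the guarantee concrete, the following is a minimal classical sketch in Python of the $k$-means cost, the aspect ratio, and the $(1+\varepsilon)$ check. It is not the quantum algorithm. The function names `kmeans_cost`, `aspect_ratio`, and `is_approximation` and the toy dataset are illustrative assumptions, as is the reading of "distance" in the aspect ratio as the pairwise distance between distinct data points.

```python
from itertools import combinations
from math import dist


def kmeans_cost(points, centers):
    """cost(V, C): sum over points of the squared distance to the closest center."""
    return sum(min(dist(p, c) ** 2 for c in centers) for p in points)


def aspect_ratio(points):
    """eta: maximum pairwise distance divided by minimum pairwise distance.

    Assumption: distances are taken between pairs of distinct data points.
    """
    distances = [dist(p, q) for p, q in combinations(points, 2)]
    return max(distances) / min(distances)


def is_approximation(points, centers, optimal_centers, eps):
    """Check cost(V, C) <= (1 + eps) * cost(V, C_OPT)."""
    return kmeans_cost(points, centers) <= (1 + eps) * kmeans_cost(points, optimal_centers)


# Toy dataset (hypothetical): two well-separated pairs of points in R^2, k = 2.
V = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
C_opt = [(0.0, 1.0), (10.0, 1.0)]  # centroids of the two pairs, taken as C_OPT here
C = [(0.0, 0.0), (10.0, 1.0)]      # a candidate solution to test

print(kmeans_cost(V, C_opt))               # 4.0
print(kmeans_cost(V, C))                   # 6.0
print(is_approximation(V, C, C_opt, 0.5))  # True, since 6.0 <= 1.5 * 4.0
print(aspect_ratio(V))
```

In this toy instance the candidate `C` is a $(1+\varepsilon)$-approximation for $\varepsilon = 0.5$ but not for $\varepsilon = 0.25$. Evaluating the cost this way takes time linear in $N$, which is the dependence the quantum algorithm described above avoids.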