Exponential quantum advantage in processing massive classical data

Broadly applicable quantum advantage, particularly in classical data processing and machine learning, has been a fundamental open problem. In this work, we prove that a small quantum computer of polylogarithmic size can perform large-scale classification and dimension reduction on massive classical data by processing samples on the fly, whereas any classical machine achieving the same prediction performance requires exponentially larger size. Furthermore, classical machines that are exponentially larger than the quantum computer yet still below the required size need superpolynomially more samples and time. We validate these quantum advantages in real-world applications, including single-cell RNA sequencing and movie review sentiment analysis, demonstrating a four to six orders of magnitude reduction in size with fewer than 60 logical qubits. These quantum advantages are enabled by quantum oracle sketching, an algorithm for accessing the classical world in quantum superposition using only random classical data samples. Combined with classical shadows, our algorithm circumvents the data loading and readout bottleneck to construct succinct classical models from massive classical data, a task provably impossible for any classical machine that is not exponentially larger than the quantum machine. These quantum advantages persist even if classical machines are granted unlimited time or if BPP = BQP, and rely only on the correctness of quantum mechanics. Together, our results establish machine learning on classical data as a broad and natural domain of quantum advantage and a fundamental test of quantum mechanics at the complexity frontier.
