As large language models (LLMs) demonstrate outstanding performance across a wide range of tasks, attention-based models have profoundly transformed the field of machine learning. Since attention computations account for the dominant computational cost in both inference and training, computing attention matrices efficiently has become one of the central challenges in accelerating large language models. Quantum machines are known to offer computational advantages over classical machines, yet the role of quantum computing in LLMs remains largely unexplored. In this work, we leverage the Grover search algorithm to compute a sparse attention matrix efficiently. By comparing against classical algorithms, we show that our method achieves a polynomial quantum speedup. We further observe that the resulting quantum attention matrices naturally exhibit low-rank structure, lending additional theoretical support to efficient modeling. Finally, in the specific setting of attention matrix computation, we give a systematic and detailed analysis of the error and time complexity of the proposed algorithm.
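To make the underlying idea concrete, the following is a minimal classical sketch, not the paper's actual quantum circuit: one row of a softmax attention matrix is computed, its largest entries are taken as the sparse support, and a statevector simulation of Grover search locates one of those entries using about (pi/4)·sqrt(N/m) oracle calls (N keys, m marked entries) rather than the up-to-N probes of a linear scan. The function `grover_search`, the top-k support rule, and all dimensions are illustrative assumptions.

```python
import numpy as np

def grover_search(marked, n_qubits):
    """Statevector simulation of Grover search over N = 2**n_qubits items.
    `marked` flags the entries to find; returns the most probable index
    after ~ (pi/4)*sqrt(N/m) Grover iterations, where m = marked.sum()."""
    N = 2 ** n_qubits
    state = np.full(N, 1.0 / np.sqrt(N))      # uniform superposition
    oracle = np.where(marked, -1.0, 1.0)      # phase flip on marked entries
    m = int(marked.sum())
    assert m > 0, "Grover search needs at least one marked entry"
    for _ in range(int(np.pi / 4 * np.sqrt(N / m))):
        state *= oracle                       # oracle call
        state = 2 * state.mean() - state      # diffusion about the mean
    return int(np.argmax(state ** 2))         # measurement: most likely index

rng = np.random.default_rng(0)
d, n_qubits = 8, 6                            # key dim, log2(sequence length)
N = 2 ** n_qubits

# One row of a softmax attention matrix (classical reference computation).
q, K = rng.normal(size=d), rng.normal(size=(N, d))
scores = K @ q / np.sqrt(d)
row = np.exp(scores - scores.max())
row /= row.sum()

# Sparse support of the row: here simply its k largest entries.
k = 4
marked = row >= np.partition(row, -k)[-k]

idx = grover_search(marked, n_qubits)
print(idx, bool(marked[idx]))                 # one of the large entries
```

In this toy setting (N = 64, m = 4) the simulation needs only 3 Grover iterations, whereas a classical scan may probe all 64 entries; this quadratic gap in oracle queries is the intuition behind the speedup claimed above.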