We propose a variational quantum implementation of self-attention (QSA), the core operation in transformers and large language models, which predicts future elements of a sequence by forming overlap-weighted combinations of past data. In contrast to previous approaches, our QSA realizes the required nonlinearity through interference of state overlaps and returns a Rényi-1/2 cross-entropy loss directly as the expectation value of an observable, avoiding the need to decode amplitude-encoded predictions into classical logits. Furthermore, QSA naturally accommodates a constrained, trainable data embedding that ties quantum state overlaps to data-level similarities. We find that the dominant gate complexity of QSA scales as O(T d^2), versus O(T^2 d) for classical self-attention, suggesting an advantage in the practical regime where the sequence length T dominates the embedding dimension d. In simulations, we show that our QSA-based quantum transformer learns sequence prediction on classical data and on many-body transverse-field Ising quantum trajectories, establishing trainable attention as a practical primitive for quantum dynamical modeling.
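For reference, a minimal classical sketch of the operation the abstract describes: dot-product self-attention forms overlap-weighted combinations of past elements, and the pairwise score matrix is what incurs the O(T^2 d) cost quoted above. This untrained NumPy version (no learned query/key/value projections) is illustrative only, not the paper's quantum construction.

```python
import numpy as np

def self_attention(x):
    """Classical causal dot-product self-attention over a sequence.

    x: (T, d) array of T embedded sequence elements.
    Returns: (T, d) array of overlap-weighted combinations of past data.
    Computing the score matrix below costs O(T^2 d), the term the
    quantum construction aims to avoid when T >> d.
    """
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)                 # (T, T) pairwise overlaps: O(T^2 d)
    mask = np.tril(np.ones((T, T), dtype=bool))   # causal: each element attends to its past
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax nonlinearity
    return weights @ x                            # (T, d) weighted sums: O(T^2 d)

rng = np.random.default_rng(0)
seq = rng.normal(size=(8, 4))                     # toy sequence, T=8, d=4
print(self_attention(seq).shape)                  # (8, 4)
```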
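To see why such a loss can be read off as an expectation value, a worked sketch under two assumptions not spelled out in the abstract: predictions are amplitude-encoded as |ψ_q⟩ = Σ_x √q(x) |x⟩, and "Rényi-1/2 cross-entropy" follows the order-1/2 Rényi divergence convention built on the Bhattacharyya overlap.

```latex
% Sketch (assumptions: amplitude encoding |\psi_q\rangle = \sum_x \sqrt{q(x)}\,|x\rangle;
% order-1/2 Rényi convention based on the Bhattacharyya coefficient).
\[
  H_{1/2}(p, q) \;=\; -2 \log \sum_x \sqrt{p(x)\, q(x)}
  \;=\; -2 \log \langle \psi_p | \psi_q \rangle .
\]
% For a one-hot target p(x) = \delta_{x,y}, this reduces to the ordinary log-loss:
\[
  H_{1/2}(p, q) \;=\; -\log q(y)
  \;=\; -\log \bigl|\langle y | \psi_q \rangle\bigr|^2 ,
\]
% and |\langle y|\psi_q\rangle|^2 is the expectation value of the projector
% |y\rangle\langle y| in the state |\psi_q\rangle, hence directly measurable
% without decoding the amplitudes into classical logits.
```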