QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling

Modeling long-range dependencies in sequential data remains a central challenge in machine learning. Transformers address this challenge through attention mechanisms, but their quadratic complexity with respect to sequence length limits scalability to long contexts. State-space models (SSMs) provide an efficient alternative with linear-time computation by evolving a latent state through recurrent updates, but their memory is typically formed via additive or linear transitions, which can limit their ability to capture complex global interactions across tokens. In this work, we present one of the first studies to leverage the superposition property of quantum systems to enhance state-based sequence modeling. In particular, we propose Quantum Long-Attention Memory (QLAM), a hybrid quantum-classical memory mechanism that can be viewed as a quantum extension of state-space models. Instead of maintaining a classical latent state updated through additive dynamics, QLAM represents the hidden state as a quantum state whose amplitudes encode a superposition of historical information. The state evolves through parameterized quantum circuits conditioned on the input, enabling a non-classical, global update mechanism. In this way, QLAM preserves the recurrent, linear-time structure of SSMs while fundamentally enriching the memory representation through quantum superposition. Unlike attention mechanisms, which explicitly compute pairwise interactions, QLAM implicitly captures global dependencies through the evolution of the quantum state and retrieves task-relevant information via query-dependent measurements. We evaluate QLAM on sequential variants of standard image classification benchmarks, including sMNIST, sFashion-MNIST, and sCIFAR-10, where images are flattened into token sequences. Across all tasks, QLAM consistently improves over recurrent baselines and transformer-based models.
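To make the recurrence concrete, the following is a minimal classical simulation of a quantum-state memory of the kind the abstract describes: a statevector is updated once per token by an input-conditioned circuit and then read out with a query-dependent measurement. It is only an illustrative sketch, not the authors' circuit. The gate set (RY rotations followed by a CNOT chain), the `weights` parameters, the scalar tokens, and the single-qubit Z readout are all assumptions chosen for brevity.

```python
import math

def apply_ry(state, qubit, theta):
    """Apply an RY(theta) rotation to one qubit of a real-valued statevector."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    out = list(state)
    bit = 1 << qubit
    for i in range(len(state)):
        if not i & bit:  # visit each (|0>, |1>) amplitude pair once
            a0, a1 = state[i], state[i | bit]
            out[i] = c * a0 - s * a1
            out[i | bit] = s * a0 + c * a1
    return out

def apply_cnot(state, control, target):
    """Apply CNOT: swap target-bit amplitudes wherever the control bit is 1."""
    out = list(state)
    cbit, tbit = 1 << control, 1 << target
    for i in range(len(state)):
        if i & cbit:
            out[i] = state[i ^ tbit]
    return out

def step(state, x, weights, n_qubits):
    """One recurrent update (assumed form): input-scaled rotations, then entanglement."""
    for q in range(n_qubits):
        state = apply_ry(state, q, weights[q] * x)
    for q in range(n_qubits - 1):
        state = apply_cnot(state, q, q + 1)
    return state

def readout(state, query_angle):
    """Query-dependent measurement (assumed form): rotate qubit 0, return <Z> on it."""
    rotated = apply_ry(state, 0, query_angle)
    return sum((-1 if i & 1 else 1) * a * a for i, a in enumerate(rotated))

def run(tokens, weights, query_angle, n_qubits=3):
    """Process a token sequence with one circuit application per token."""
    state = [0.0] * (2 ** n_qubits)
    state[0] = 1.0  # start in |00...0>
    for x in tokens:
        state = step(state, x, weights, n_qubits)
    return state, readout(state, query_angle)

if __name__ == "__main__":
    final_state, y = run([0.1, 0.5, 0.9], [0.3, 0.7, 1.1], query_angle=0.4)
    print("norm:", sum(a * a for a in final_state))
    print("readout:", y)
```

The circuit is applied once per token, so the number of circuit applications grows linearly with sequence length, mirroring the recurrent structure of an SSM. Every update is unitary, so the state norm is preserved and earlier tokens are mixed into all amplitudes multiplicatively rather than accumulated additively. Note that this classical simulation costs time exponential in the number of qubits per step; it illustrates the update rule only and says nothing about the model or results reported in the paper.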

