Given the query, key, and value matrices $Q, K, V\in \mathbb{R}^{n\times d}$, the attention module is defined as $\mathrm{Att}(Q, K, V)=D^{-1}AV$, where $A=\exp(QK^\top/\sqrt{d})$ with $\exp(\cdot)$ applied entrywise and $D=\mathrm{diag}(A{\bf 1}_n)$. The attention module is the backbone of modern transformers and large language models, but explicitly forming the softmax matrix $D^{-1}A$ incurs $\Omega(n^2)$ time, motivating numerous approximation schemes that reduce the runtime to $\widetilde{O}(nd)$ via sparsity or low-rank factorization. We propose a quantum data structure that approximates any row of $\mathrm{Att}(Q, K, V)$ using only row queries to $Q, K, V$. Our algorithm preprocesses these matrices in $\widetilde{O}\left( \varepsilon^{-1} n^{0.5} \left( s_\lambda^{2.5} + s_\lambda^{1.5} d + \alpha^{0.5} d \right) \right)$ time, where $\varepsilon$ is the target accuracy, $s_\lambda$ is the $\lambda$-statistical dimension of the exponential kernel defined by $Q$ and $K$, and $\alpha$ measures the row distortion of $V$, which is at most $d/{\rm srank}(V)$, where ${\rm srank}(V)$ denotes the stable rank of $V$. Each row query can then be answered in $\widetilde{O}(s_\lambda^2 + s_\lambda d)$ time. To our knowledge, this is the first quantum data structure that approximates rows of the attention matrix in time sublinear in $n$. Our approach relies on a quantum Nyström approximation of the exponential kernel, quantum multivariate mean estimation for computing $D$, and quantum leverage score sampling for the multiplication with $V$.
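For concreteness, the exact quadratic-time baseline defined above can be sketched in NumPy (a minimal illustration of the definition $\mathrm{Att}(Q, K, V)=D^{-1}AV$, not the quantum data structure itself; the function name `attention` and the dimensions are chosen for illustration):

```python
import numpy as np

def attention(Q, K, V):
    """Exact attention: Att(Q, K, V) = D^{-1} A V, where
    A = exp(Q K^T / sqrt(d)) entrywise and D = diag(A 1_n)."""
    n, d = Q.shape
    A = np.exp(Q @ K.T / np.sqrt(d))            # n x n matrix, entrywise exp
    row_sums = A.sum(axis=1, keepdims=True)     # A 1_n, the diagonal of D
    return (A / row_sums) @ V                   # D^{-1} A is row-stochastic

# Forming D^{-1} A explicitly costs Omega(n^2) time and memory,
# which is what the sublinear-in-n quantum data structure avoids.
rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = attention(Q, K, V)                        # shape (n, d)
```

Note that each output row depends on all $n$ rows of $K$ and $V$, which is why answering even a single row query exactly takes $\Omega(n)$ classical time and motivates the quantum approach.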