Attention by Synchronization in Coupled Oscillator Networks

We study how to implement transformer attention on energy-constrained physical substrates. Softmax attention requires exponentiation and global reduction, operations that are energy-intensive on von Neumann hardware and have no natural physical analog. We show that Kuramoto synchronization dynamics (which arise in electrical, mechanical, superconducting, and charge-density-wave oscillator arrays, among other physical systems) implement a well-defined attention operation without either. The resulting mechanism, fixed-query oscillator attention, replaces softmax's arithmetic with the equilibration of a gradient flow on the sphere: queries are learned anchors fixed on the sphere, and free oscillators evolve under Kuramoto-Lohe dynamics until they settle at positions that encode attention weights via cosine similarity. Because the computation is equilibration, it requires no exponentiation; the only global operation is an affine normalization at readout. The fixed point is provably unique and globally attractive from almost every initial condition, a guarantee that holds across every physical realization. Empirically, at the minimal hardware configuration (oscillator dimension $d_{\mathrm{osc}} = 2$), oscillator attention outperforms softmax on keyword spotting (+1.00 pp) and on subject-verb agreement (+5.27 pp on hard sentences, with zero training failures versus one in five for softmax). On causal language modeling, where softmax retains an advantage, oscillator attention narrows the perplexity gap as $d_{\mathrm{osc}}$ grows: from +11.09 PPL at $d_{\mathrm{osc}} = 2$ to +2.98 PPL at $d_{\mathrm{osc}} = 32$ on WikiText-2, and from +2.39 PPL at $d_{\mathrm{osc}} = 2$ to +0.57 PPL at $d_{\mathrm{osc}} = 32$ on TinyStories. The main objective of this work is not to replace softmax in software but to provide a mathematically grounded blueprint for accurate attention on physical substrates.
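To make the "attention by equilibration" idea concrete, the following is a minimal numerical sketch, not the paper's implementation. The abstract does not state the equations, so everything specific here is an assumption made for illustration: each free oscillator $x_i$ on the unit sphere follows the projected flow $\dot{x}_i = (I - x_i x_i^\top) h_i$, the drive $h_i = k_i + \kappa q$ combines a unit-norm key input $k_i$ with a pull toward a single fixed query anchor $q$, and the readout is the affine normalization $w_i = (\cos_i + 1) / \sum_j (\cos_j + 1)$. The names `equilibrate`, `oscillator_attention`, and `coupling` are hypothetical.

```python
import numpy as np

def normalize(v):
    """Project vectors onto the unit sphere along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def equilibrate(drive, x0, dt=0.05, steps=2000):
    """Projected-Euler integration of dx/dt = (I - x x^T) drive on the sphere.

    ASSUMPTION: this single-oscillator flow stands in for the Kuramoto-Lohe
    dynamics named in the abstract. Its attracting equilibrium is
    drive / |drive|, reached from every start except the exact antipode.
    """
    x = normalize(x0)
    for _ in range(steps):
        # Keep only the component of the drive tangent to the sphere at x.
        tangent = drive - np.sum(drive * x, axis=-1, keepdims=True) * x
        x = normalize(x + dt * tangent)
    return x

def oscillator_attention(keys, query, values, coupling=0.5, seed=0):
    """Attention weights read out from equilibrated oscillator positions.

    ASSUMPTIONS (not from the source): one anchor per head, drive = key +
    coupling * anchor, and the (cos + 1) / sum readout. With unit-norm keys
    and coupling < 1 the drive never vanishes.
    """
    rng = np.random.default_rng(seed)
    q = normalize(query)                        # fixed query anchor
    drive = normalize(keys) + coupling * q      # hypothetical per-token drive
    x_star = equilibrate(drive, rng.normal(size=keys.shape))
    cos = x_star @ q                            # cosine similarity in [-1, 1]
    weights = (cos + 1.0) / np.sum(cos + 1.0)   # affine normalization, no exp
    return weights, weights @ values

# Usage with d_osc = 2: six tokens, three-dimensional values.
rng = np.random.default_rng(1)
keys = rng.normal(size=(6, 2))
values = rng.normal(size=(6, 3))
query = rng.normal(size=2)
weights, output = oscillator_attention(keys, query, values)
```

Under these assumptions the weights are non-negative, sum to one, and increase monotonically with the key-anchor cosine similarity, and the only operation that touches all tokens at once is the final sum. On a physical substrate the `equilibrate` loop would be replaced by the oscillators relaxing on their own.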
