This paper introduces the Polynomial Mixer (PoM), a novel token mixing mechanism with linear complexity that serves as a drop-in replacement for self-attention. PoM aggregates input tokens into a compact representation through a learned polynomial function, from which each token retrieves contextual information. We prove that PoM satisfies the contextual mapping property, ensuring that transformers equipped with PoM remain universal sequence-to-sequence approximators. We replace standard self-attention with PoM across five diverse domains: text generation, handwritten text recognition, image generation, 3D modeling, and Earth observation. PoM matches the performance of attention-based models while drastically reducing computational cost when working with long sequences. The code is available at https://github.com/davidpicard/pom.
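To make the aggregate-then-retrieve idea concrete, the following is a minimal, dependency-free sketch of a linear-time mixer in that spirit. It is not the authors' implementation (see the repository above for that). The specific choices here are assumptions made for illustration: element-wise powers of linear projections as the "polynomial", averaging over tokens, sigmoid gating for retrieval, and all names (`poly_mixer`, `W_poly`, `W_gate`, `W_out`, `rand_matrix`).

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights standing in for learned ones."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def poly_mixer(tokens, W_poly, W_gate, W_out):
    """Illustrative linear-time token mixer (assumed form, not the paper's exact one).

    tokens: n vectors of size d
    W_poly: k matrices of shape (h, d), one per polynomial degree 1..k
    W_gate: matrix of shape (k*h, d)
    W_out:  matrix of shape (d, k*h)
    """
    k, h, n = len(W_poly), len(W_poly[0]), len(tokens)

    # Step 1 (aggregate): a single pass over the tokens builds one summary
    # vector of size k*h, independent of the sequence length.
    summary = [0.0] * (k * h)
    for x in tokens:
        for p, W in enumerate(W_poly):
            for j, v in enumerate(matvec(W, x)):
                # Assumption: degree-(p+1) element-wise power, averaged over tokens.
                summary[p * h + j] += v ** (p + 1) / n

    # Step 2 (retrieve): each token gates the shared summary with its own
    # projection, then maps the result back to the model dimension.
    out = []
    for x in tokens:
        gate = [sigmoid(g) for g in matvec(W_gate, x)]
        out.append(matvec(W_out, [g * s for g, s in zip(gate, summary)]))
    return out


rng = random.Random(0)
d, h, k, n = 4, 3, 2, 6
tokens = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
W_poly = [rand_matrix(h, d, rng) for _ in range(k)]
W_gate = rand_matrix(k * h, d, rng)
W_out = rand_matrix(d, k * h, rng)
out = poly_mixer(tokens, W_poly, W_gate, W_out)
```

In this sketch both passes touch each token once, so the cost grows linearly with the number of tokens, whereas self-attention compares every pair of tokens. The summary does not depend on token order, so positional information would have to be supplied separately, as it is for self-attention.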