Multi-head self-attention (MHSA) is a key building block in modern vision Transformers, yet its quadratic complexity in the number of tokens remains a major bottleneck for real-time and resource-constrained deployment. We present PnP-Nystra, a training-free Nyström-based linear attention module designed as a plug-and-play replacement for MHSA in pretrained image restoration Transformers, with provable kernel approximation error guarantees. PnP-Nystra integrates directly into window-based architectures such as SwinIR, Uformer, and Dehazeformer, yielding efficient inference without finetuning. Across image denoising, deblurring, dehazing, and super-resolution, PnP-Nystra delivers $1.8$--$3.6\times$ speedups on an NVIDIA RTX 4090 GPU and $1.8$--$7\times$ speedups on CPU inference. Compared with the strongest training-free linear-attention baselines we evaluate, our method incurs the smallest quality drop and stays closest to the original model's outputs.
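For intuition, the sketch below shows one common way a Nyström-style approximation can replace full softmax attention in PyTorch: landmark queries and keys are formed by mean-pooling contiguous token segments, and the $n \times n$ attention map is reconstructed from three small kernels. This is a minimal illustration under stated assumptions, not the paper's exact formulation; the function name `nystrom_attention`, the landmark count `num_landmarks`, and the use of `torch.linalg.pinv` (rather than an iterative pseudoinverse) are illustrative choices.

```python
import torch

def nystrom_attention(q, k, v, num_landmarks=16):
    """Hypothetical sketch of Nystrom-approximated softmax attention.

    q, k, v: tensors of shape (batch, heads, n_tokens, head_dim),
    with n_tokens divisible by num_landmarks.
    """
    b, h, n, d = q.shape
    q = q * (d ** -0.5)  # standard scaled dot-product scaling

    # Landmarks: mean-pool queries/keys over contiguous token segments
    q_lm = q.reshape(b, h, num_landmarks, n // num_landmarks, d).mean(dim=3)
    k_lm = k.reshape(b, h, num_landmarks, n // num_landmarks, d).mean(dim=3)

    # Three small softmax kernels replace the full (n x n) attention map
    kernel_1 = torch.softmax(q @ k_lm.transpose(-1, -2), dim=-1)     # (n, m)
    kernel_2 = torch.softmax(q_lm @ k_lm.transpose(-1, -2), dim=-1)  # (m, m)
    kernel_3 = torch.softmax(q_lm @ k.transpose(-1, -2), dim=-1)     # (m, n)

    # Nystrom reconstruction: kernel_1 @ pinv(kernel_2) @ (kernel_3 @ v),
    # costing O(n * m) instead of O(n^2) for m << n landmarks
    return kernel_1 @ (torch.linalg.pinv(kernel_2) @ (kernel_3 @ v))

# Example usage on random window tokens (e.g., an 8x8 window -> 64 tokens)
x = torch.randn(1, 4, 64, 32)
out = nystrom_attention(x, x, x)  # shape (1, 4, 64, 32)
```

Within a window-based restoration Transformer, such a drop-in would operate per attention window, so the landmark pooling only ever spans tokens of a single window; the exact landmark construction and error guarantees are as defined in the paper.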