We introduce WildCat, a high-accuracy, low-cost approach to compressing the attention mechanism in neural networks. While attention is a staple of modern network architectures, it is also notoriously expensive to deploy due to resource requirements that scale quadratically with the input sequence length $n$. WildCat avoids these quadratic costs by only attending over a small weighted coreset. Crucially, we select the coreset using a fast but spectrally-accurate subsampling algorithm -- randomly pivoted Cholesky -- and weight the elements optimally to minimise reconstruction error. Remarkably, given bounded inputs, WildCat approximates exact attention with super-polynomial $O(n^{-\sqrt{\log(\log(n))}})$ error decay while running in near-linear $O(n^{1+o(1)})$ time. In contrast, prior practical approximations either lack error guarantees or require quadratic runtime to guarantee such high fidelity. We couple this advance with a GPU-optimized PyTorch implementation and a suite of benchmark experiments demonstrating the benefits of WildCat for image generation, image classification, and language model KV cache compression.
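The coreset-selection step named above, randomly pivoted Cholesky, can be sketched in a few lines of NumPy. This is not the paper's released implementation — just a minimal, hedged sketch of the generic randomly pivoted Cholesky procedure applied to a PSD kernel matrix, where the function name `rp_cholesky` and the column-oracle interface `K_col` are our own choices for illustration. Each step samples a pivot with probability proportional to the residual diagonal, then performs a rank-one Schur-complement update, so the selected columns are spectrally representative of the full matrix.

```python
import numpy as np

def rp_cholesky(K_diag, K_col, k, seed=None):
    """Randomly pivoted Cholesky (illustrative sketch).

    Selects k landmark indices from an n x n PSD kernel matrix
    accessed only through its diagonal `K_diag` and a column oracle
    `K_col(i) -> i-th column`. Returns the pivot indices and a factor
    F such that K ~= F @ F.T (a rank-k Nystrom-type approximation).
    """
    rng = np.random.default_rng(seed)
    n = len(K_diag)
    d = np.asarray(K_diag, dtype=float).copy()  # residual diagonal
    F = np.zeros((n, k))
    pivots = []
    for t in range(k):
        # Sample the next pivot with probability proportional to the
        # residual diagonal (entries already picked have residual 0).
        p = np.clip(d, 0.0, None)
        i = rng.choice(n, p=p / p.sum())
        pivots.append(i)
        # Rank-one Schur-complement update using column i of K.
        g = K_col(i) - F[:, :t] @ F[i, :t]
        F[:, t] = g / np.sqrt(g[i])
        d -= F[:, t] ** 2
        d[i] = 0.0  # pivot i is now fully resolved
    return np.array(pivots), F
```

A useful sanity property of this scheme is that the resulting approximation `F @ F.T` reproduces the original matrix exactly on the selected pivot rows and columns, while the residual diagonal stays nonnegative; the paper's contribution is coupling such a sampler with optimal reweighting and a GPU kernel, which this sketch does not attempt.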