We introduce an Outlier-Efficient Modern Hopfield Model (termed $\mathrm{OutEffHop}$) and use it to address the outlier inefficiency problem of training gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating \textit{outlier-efficient} associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an outlier-efficient attention mechanism (${\rm Softmax}_1$): it is an approximation of the memory retrieval process of $\mathrm{OutEffHop}$. Methodologically, this allows us to introduce novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, with superior post-quantization performance. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of standard modern Hopfield models, including fixed point convergence and exponential storage capacity. Empirically, we demonstrate the efficacy of the proposed model across large-scale transformer-based and Hopfield-based models (including BERT, OPT, ViT, and STanHop-Net), benchmarking against state-of-the-art methods like $\mathtt{Clipped\_Softmax}$ and $\mathtt{Gated\_Attention}$. Notably, $\mathrm{OutEffHop}$ achieves an average reduction of 22+\% in average kurtosis and 26+\% in the maximum infinity norm of model outputs across four models. Code is available at \href{https://github.com/MAGICS-LAB/OutEffHop}{GitHub}; models are on \href{https://huggingface.co/collections/magicslabnu/outeffhop-6610fcede8d2cda23009a98f}{Hugging Face Hub}; future updates are on \href{https://arxiv.org/abs/2404.03828}{arXiv}.
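The ${\rm Softmax}_1$ mechanism referenced above can be sketched as follows: relative to standard softmax, the denominator carries an extra $+1$ term (an implicit zero logit), so attention scores may sum to less than one and a head can effectively attend to nothing rather than being forced to spread probability mass onto irrelevant tokens. This is a minimal illustrative sketch in NumPy, not the paper's reference implementation; the function name and stabilization details are our own.

```python
import numpy as np

def softmax_1(x):
    """Illustrative Softmax_1: exp(x_i) / (1 + sum_j exp(x_j)).

    The extra +1 in the denominator acts as an implicit zero logit,
    letting the output sum to less than 1 (a head can attend to
    "nothing" instead of producing outlier no-op updates).
    """
    # Shift by max(x, 0) for numerical stability; the implicit zero
    # logit must be shifted consistently, becoming exp(-m).
    m = np.maximum(np.max(x, axis=-1, keepdims=True), 0.0)
    e = np.exp(x - m)
    return e / (np.exp(-m) + e.sum(axis=-1, keepdims=True))
```

For strongly peaked logits this behaves like standard softmax, while uniformly low logits yield near-zero total attention, which is the outlier-damping behavior the abstract describes.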