The LayerNorm (LN) layer in GPT-style transformer models has long been an obstacle to mechanistic interpretability. LN is a crucial component for stabilizing the training of large language models, and LN or the similar RMSNorm is used in practically all large language models based on the transformer architecture. Because LN is non-linear, it complicates interpretation of the residual stream and makes it difficult to decompose the model into circuits. Some researchers have gone so far as to enumerate "reasons interpretability researchers hate layer norm." In this paper we show that it is possible to remove the LN layers from a pre-trained GPT2-small model by fine-tuning on a fraction (500M tokens) of the training data. We demonstrate that this LN-free model achieves performance similar to the original model on the OpenWebText and ThePile datasets (-0.05 cross-entropy loss) and on the Hellaswag benchmark (-0.5% accuracy). We provide our implementation at https://github.com/ApolloResearch/gpt2_noLN, and fine-tuned GPT2-small models at https://huggingface.co/apollo-research/gpt2_noLN. Our work not only provides a simplified model for mechanistic interpretability research, but also provides evidence that the LN layers, at inference time, do not play a crucial role in transformer models.
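The non-linearity that makes LN awkward for circuit analysis can be illustrated with a minimal sketch. This is an illustrative simplification, not the paper's implementation: the learned scale and bias parameters are omitted, and a single vector stands in for the per-position residual-stream activations.

```python
import math

def layernorm(x, eps=1e-5):
    # Normalize to zero mean and unit variance (learned scale/shift omitted).
    # The division by an input-dependent standard deviation is what makes
    # the operation non-linear.
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]

x = [1.0, 2.0, 4.0]
y = [0.5, -1.0, 3.0]

# Non-linearity: LN(x + y) differs from LN(x) + LN(y), so LN cannot be
# folded into a linear decomposition of residual-stream contributions.
lhs = layernorm([a + b for a, b in zip(x, y)])
rhs = [a + b for a, b in zip(layernorm(x), layernorm(y))]
print(max(abs(a - b) for a, b in zip(lhs, rhs)) > 0.1)  # True

# Scale invariance: LN discards the overall norm of the residual stream,
# so LN(2x) is (up to eps) identical to LN(x).
scaled = layernorm([2 * v for v in x])
print(max(abs(a - b) for a, b in zip(scaled, layernorm(x))) < 1e-4)  # True
```

Because the output of LN depends on the input's own mean and standard deviation, the contribution of any one upstream component to the final logits cannot be read off linearly, which is precisely the difficulty that removing LN sidesteps.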