While large language models (LLMs) acquire knowledge from their pre-training corpora, that knowledge may be fundamentally incorrect or become outdated over time, which necessitates rectifying the knowledge of the language model (LM) after training. A promising approach employs a hyper-network to generate parameter shifts; however, existing hyper-networks scale poorly with the number of edits applied simultaneously. To mitigate this problem, we propose the MAssive Language Model Editing Network (MALMEN), which formulates parameter shift aggregation as a least-squares problem and then updates the LM parameters using the normal equation. To accommodate editing multiple facts simultaneously under a limited memory budget, we separate the computation on the hyper-network and the LM, enabling arbitrary batch sizes on both neural networks. We evaluate our method by editing up to thousands of facts on LMs with different architectures, i.e., BERT-base, GPT-2, T5-XL (2.8B), and GPT-J (6B), across various knowledge-intensive NLP tasks, i.e., closed-book fact-checking and question answering. Remarkably, MALMEN can edit hundreds of times more facts than strong baselines with an identical hyper-network architecture, and it outperforms an editor specifically designed for GPT. Our code is available at https://github.com/ChenmienTan/malmen.
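The least-squares aggregation can be illustrated for a single linear layer. Stack the n per-edit input keys u_i as rows of U and the desired output shifts d_i as rows of D, then choose one shared shift S minimizing sum_i ||S u_i - d_i||^2 + lam * ||S||^2, whose normal equation is (U^T U + lam I) S^T = U^T D. This is a minimal sketch, not the released implementation: it assumes the keys and target shifts are already available as plain arrays (in MALMEN they would come from the LM and the hyper-network), and the function name `aggregate_shifts` and the ridge term `lam` are hypothetical additions for illustration.

```python
import numpy as np


def aggregate_shifts(keys: np.ndarray, value_shifts: np.ndarray, lam: float = 1e-2) -> np.ndarray:
    """Aggregate per-edit shifts into one weight shift via the normal equation.

    Hypothetical helper (not from the MALMEN codebase).
    keys:         (n, d_in)  one input key u_i per edit
    value_shifts: (n, d_out) desired change d_i of the layer output per edit
    lam:          assumed ridge term; lam = 0 gives plain least squares
    Returns S of shape (d_out, d_in) minimizing
        ||keys @ S.T - value_shifts||_F^2 + lam * ||S||_F^2.
    """
    d_in = keys.shape[1]
    # Left-hand side of the normal equation: U^T U + lam * I, size d_in x d_in.
    gram = keys.T @ keys + lam * np.eye(d_in)
    # Right-hand side: U^T D, size d_in x d_out.
    rhs = keys.T @ value_shifts
    # Solve (U^T U + lam I) S^T = U^T D, then transpose to get S.
    return np.linalg.solve(gram, rhs).T


# Usage: 50 consistent edits on a layer with d_in = 8, d_out = 4.
rng = np.random.default_rng(0)
U = rng.standard_normal((50, 8))
S_true = rng.standard_normal((4, 8))
D = U @ S_true.T
W = rng.standard_normal((4, 8))
W_edited = W + aggregate_shifts(U, D, lam=0.0)  # apply the aggregated shift
```

Note that the linear system is d_in × d_in no matter how many edits are stacked, so only forming U^T U and U^T D grows (linearly) with n; this sketch does not model how MALMEN decouples the hyper-network and LM computations.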