The feed-forward networks (FFNs) in transformers have been interpreted as a collection of key-value neural memories that store abstract, high-level knowledge. In this work, we conduct an empirical ablation study that compares updating the keys (the first layer of the FFN) with updating the values (the second layer of the FFN). We compare these two approaches across a range of knowledge-editing and fine-tuning tasks on large language models, aiming to gain further insight into how FFNs work. Code is available at $\href{https://github.com/qiuzh20/Tuning-keys-v.s.-values}{this\,repo}$.
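To make the key-value reading concrete, here is a minimal sketch, assuming a bias-free ReLU FFN written as FFN(x) = Σ_i relu(x · k_i) v_i, where each key k_i is a row of the first layer and each value v_i is a row of the second layer. It takes one gradient step on a hypothetical squared-error editing objective while updating only the keys or only the values. The functions `ffn`, `loss`, and `sgd_step`, the objective, and the toy numbers are illustrative assumptions, not the code or training setup from the linked repository.

```python
import copy


def relu(z: float) -> float:
    return max(0.0, z)


def ffn(x, keys, values):
    """Key-value view of a bias-free ReLU FFN: FFN(x) = sum_i relu(x . k_i) * v_i.

    keys[i]   -> i-th row of the first layer (the "key" k_i)
    values[i] -> i-th row of the second layer (the "value" v_i)
    Returns the output vector and the memory coefficients relu(x . k_i).
    """
    coeffs = [relu(sum(xj * kj for xj, kj in zip(x, k))) for k in keys]
    out = [0.0] * len(values[0])
    for c, v in zip(coeffs, values):
        for j, vj in enumerate(v):
            out[j] += c * vj
    return out, coeffs


def loss(x, target, keys, values):
    """Assumed editing objective: 0.5 * ||FFN(x) - target||^2."""
    out, _ = ffn(x, keys, values)
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))


def sgd_step(x, target, keys, values, lr, update):
    """One gradient step on `loss`, modifying ONLY keys or ONLY values in place."""
    out, coeffs = ffn(x, keys, values)
    err = [o - t for o, t in zip(out, target)]  # dL/d(out)
    if update == "values":
        # dL/dv_i = relu(x . k_i) * err: a value moves only if its key fired.
        for i, c in enumerate(coeffs):
            values[i] = [vj - lr * c * ej for vj, ej in zip(values[i], err)]
    elif update == "keys":
        # dL/dk_i = (err . v_i) * x for active keys; the ReLU gradient is 0 otherwise.
        for i, c in enumerate(coeffs):
            if c <= 0.0:
                continue
            g = sum(ej * vj for ej, vj in zip(err, values[i]))
            keys[i] = [kj - lr * g * xj for kj, xj in zip(keys[i], x)]
    else:
        raise ValueError("update must be 'keys' or 'values'")


if __name__ == "__main__":
    # Toy example (hypothetical numbers): 2-d input, 3 memories, 2-d output.
    x, target = [1.0, 0.5], [0.0, 1.0]
    keys = [[0.5, 0.2], [-0.1, 0.4], [0.1, -0.6]]
    values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(f"initial {loss(x, target, keys, values):.4f}")  # initial 0.5850
    for mode in ("values", "keys"):
        k, v = copy.deepcopy(keys), copy.deepcopy(values)
        sgd_step(x, target, k, v, lr=0.5, update=mode)
        print(mode, f"{loss(x, target, k, v):.4f}")  # values 0.3886, then keys 0.0823
```

In this toy setting both arms lower the loss, but through different mechanisms: updating values changes what the already-firing memories write to the output, while updating keys changes how strongly the input activates each stored value. The `update` flag is the only switch between the two ablation arms.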