Knowledge editing and machine unlearning are two popular approaches for keeping large language models (LLMs) up to date. However, the knowledge-updating mechanism of LLMs remains largely unexplored because existing evaluations are insufficient, isolated, and small-scale. For instance, do LLMs modify certain knowledge the way humans do? How do editing and unlearning differ as the amount of training data increases? This paper proposes KnowledgeSmith, a unified framework for systematically understanding the updating mechanism of LLMs. We first cast editing and unlearning as instances of a single constrained optimization problem. We then propose an automatic dataset generator that provides structured interventions across multiple graph levels and data scales, enabling controlled studies of how different modification strategies propagate through model knowledge. Extensive experiments yield nuanced insights into knowledge propagation, plasticity scaling, consistency, and robustness. For instance, our results show that LLMs do not update different levels of knowledge the way humans do, and that a consistency-capacity trade-off exists. We hope our findings can inform the design of more reliable and scalable knowledge-updating strategies. Code: https://github.com/AIFrontierLab/KnowledgeSmith.git
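As a rough illustration of the unified view mentioned above (the paper's exact objective and notation are not reproduced here; the symbols below are illustrative assumptions), editing and unlearning can both be sketched as a constrained parameter update:

```latex
% Hedged sketch of a generic constrained-update formulation.
% \theta: model parameters; \mathcal{D}_{\mathrm{mod}}: facts to edit or forget;
% \mathcal{D}_{\mathrm{ret}}: knowledge to retain; \epsilon: retention budget -- all illustrative.
\begin{aligned}
\theta^{*} \;=\; & \arg\min_{\theta}\; \mathcal{L}_{\mathrm{mod}}\bigl(\theta;\, \mathcal{D}_{\mathrm{mod}}\bigr) \\
\text{s.t.}\;\; & \mathcal{L}_{\mathrm{ret}}\bigl(\theta;\, \mathcal{D}_{\mathrm{ret}}\bigr) \;\le\; \epsilon ,
\end{aligned}
```

where, in this reading, editing instantiates the modification loss toward new target answers while unlearning instantiates it to suppress the original answers, with the retention constraint limiting collateral change to unrelated knowledge.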