LeKUBE: A Legal Knowledge Update BEnchmark

Recent advances in Large Language Models (LLMs) have significantly shaped the application of AI across multiple fields, including legal intelligence. Trained on extensive legal texts such as statutes and legal documents, legal LLMs can effectively capture important legal knowledge and concepts and provide essential support for downstream legal applications such as legal consultation. Yet the dynamic nature of legal statutes and their interpretations poses new challenges to the use of LLMs in legal applications. In particular, how to update the legal knowledge of LLMs effectively and efficiently has become an important research problem in practice. Existing benchmarks for evaluating knowledge update methods are mostly designed for the open domain and cannot address the specific challenges of the legal domain, such as the nuanced application of new legal knowledge, the complexity and length of legal regulations, and the intricate nature of legal reasoning. To address this gap, we introduce the Legal Knowledge Update BEnchmark (LeKUBE), which evaluates knowledge update methods for legal LLMs across five dimensions. Specifically, with the help of legal professionals, we categorize the needs for knowledge updates in the legal domain, and then hire annotators from law schools to create synthetic updates to the Chinese Criminal Law and Civil Code, together with sets of questions whose answers change after the updates. Through a comprehensive evaluation of state-of-the-art knowledge update methods, we reveal a notable gap between existing methods and the unique needs of the legal domain, underscoring the need for further research on knowledge update mechanisms tailored to legal LLMs.
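The core idea of pairing a synthetic statute update with questions whose answers flip after the update can be illustrated with a small sketch. Everything below is an assumption for illustration only: the `UpdateItem` record, its field names, the exact-match scoring, and the toy statute text are hypothetical and do not describe LeKUBE's actual data format or its five evaluation dimensions. The sketch simply measures how often an updated model gives the post-update answer versus the stale pre-update one.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class UpdateItem:
    """Hypothetical benchmark entry: a synthetic statute edit plus a probe question."""
    article_before: str  # original statute text (assumed field)
    article_after: str   # annotator-written synthetic update (assumed field)
    question: str        # question whose answer depends on the update
    answer_before: str   # correct answer under the original statute
    answer_after: str    # correct answer under the updated statute


def _match_rate(items: List[UpdateItem], model: Callable[[str], str], use_new: bool) -> float:
    """Fraction of items where the model's answer exactly matches the chosen reference."""
    if not items:
        return 0.0
    hits = sum(
        model(it.question).strip()
        == (it.answer_after if use_new else it.answer_before).strip()
        for it in items
    )
    return hits / len(items)


def update_success_rate(items: List[UpdateItem], model: Callable[[str], str]) -> float:
    """How often the model reflects the updated law (exact match, an assumed metric)."""
    return _match_rate(items, model, use_new=True)


def stale_answer_rate(items: List[UpdateItem], model: Callable[[str], str]) -> float:
    """How often the model still gives the outdated, pre-update answer."""
    return _match_rate(items, model, use_new=False)


# Toy usage with made-up statute text and stand-in "models".
items = [
    UpdateItem(
        article_before="Art. X: imprisonment of up to 3 years",
        article_after="Art. X: imprisonment of up to 5 years",
        question="What is the maximum imprisonment under Art. X?",
        answer_before="3 years",
        answer_after="5 years",
    ),
]
stale_model = lambda q: "3 years"    # a model whose knowledge was never updated
updated_model = lambda q: "5 years"  # a model after a successful knowledge update

print(update_success_rate(items, stale_model))    # 0.0
print(update_success_rate(items, updated_model))  # 1.0
```

Tracking both rates separately is a design choice in this sketch: a low success rate alone cannot distinguish a model that kept the old law from one that produced an unrelated answer, while the stale rate isolates the former case.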