LeKUBE: A Legal Knowledge Update BEnchmark

Recent advances in Large Language Models (LLMs) have significantly shaped the application of AI across many fields, including legal intelligence research. Trained on extensive legal texts such as statutes and legal documents, legal LLMs can effectively capture key legal knowledge and concepts and provide important support for downstream legal applications such as legal consultation. Yet the dynamic nature of legal statutes and their interpretations poses new challenges for using LLMs in legal applications. In particular, how to update the legal knowledge of LLMs effectively and efficiently has become an important research problem in practice. Existing benchmarks for evaluating knowledge update methods are mostly designed for the open domain and do not address challenges specific to the legal domain, such as the nuanced application of new legal knowledge, the complexity and length of legal regulations, and the intricacy of legal reasoning. To address this gap, we introduce the Legal Knowledge Update BEnchmark (LeKUBE), which evaluates knowledge update methods for legal LLMs across five dimensions. Specifically, we categorize the needs for knowledge updates in the legal domain with the help of legal professionals, and then hire annotators from law schools to create synthetic updates to the Chinese Criminal Law and Civil Code, together with sets of questions whose answers change after the updates. A comprehensive evaluation of state-of-the-art knowledge update methods reveals a notable gap between existing methods and the unique needs of the legal domain, underscoring the need for further research on knowledge update mechanisms tailored to legal LLMs.
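The benchmark construction described above pairs each synthetic statute update with questions whose correct answers change once the update is applied. The Python sketch below illustrates that pairing and one simple way to check whether a model has absorbed an update. It is a minimal sketch under stated assumptions: the record types (`StatuteUpdate`, `UpdateQuestion`), their field names, and the exact-match `score` function are all hypothetical illustrations, not LeKUBE's actual data format or any of its five evaluation dimensions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StatuteUpdate:
    # Hypothetical record: one synthetic edit to a single statute article.
    article_id: str
    old_text: str
    new_text: str


@dataclass
class UpdateQuestion:
    # Hypothetical record: a question whose correct answer differs
    # before (old_answer) and after (new_answer) the update is applied.
    update: StatuteUpdate
    question: str
    old_answer: str
    new_answer: str


def score(model: Callable[[str], str], items: List[UpdateQuestion]) -> Dict[str, float]:
    """Return the fraction of items answered with the updated answer
    and the fraction still answered with the stale (pre-update) answer.

    Uses exact string matching after stripping whitespace; this is an
    assumed, deliberately simple metric for illustration only.
    """
    if not items:
        return {"updated": 0.0, "stale": 0.0}
    updated = stale = 0
    for item in items:
        pred = model(item.question).strip()
        if pred == item.new_answer:
            updated += 1
        elif pred == item.old_answer:
            stale += 1
    n = len(items)
    return {"updated": updated / n, "stale": stale / n}
```

Tracking the stale rate separately from the updated rate distinguishes a model that still reproduces the old statute from one that answers incorrectly in some other way; real legal questions would need a more forgiving comparison than exact matching.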
