VLegal-Bench: Cognitively Grounded Benchmark for Vietnamese Legal Reasoning of Large Language Models

The rapid advancement of large language models (LLMs) has enabled new possibilities for applying artificial intelligence within the legal domain. Nonetheless, the complexity, hierarchical organization, and frequent revisions of Vietnamese legislation make it difficult to evaluate how well these models interpret and apply legal knowledge. To address this gap, we introduce the Vietnamese Legal Benchmark (VLegal-Bench), the first comprehensive benchmark designed to systematically assess LLMs on Vietnamese legal tasks. Informed by Bloom's cognitive taxonomy, VLegal-Bench covers multiple levels of legal understanding through tasks designed to reflect practical usage scenarios. The benchmark comprises 10,450 samples generated through a rigorous annotation pipeline in which legal experts label and cross-validate each instance using our annotation system. This process ensures that every sample is grounded in authoritative legal documents and mirrors real-world legal assistant workflows, including general legal question answering, retrieval-augmented generation, multi-step reasoning, and scenario-based problem solving tailored to Vietnamese law. By providing a standardized, transparent, and cognitively informed evaluation framework, VLegal-Bench establishes a solid foundation for assessing LLM performance in Vietnamese legal contexts and supports the development of more reliable, interpretable, and ethically aligned AI-assisted legal systems. To facilitate access and reproducibility, we provide a public landing page for this benchmark at https://vilegalbench.cmcai.vn/.
