While Large Language Models (LLMs) have demonstrated impressive general capabilities, their direct application in the legal domain is often hindered by a lack of precise domain knowledge and by the complexity of rigorous multi-step judicial reasoning. To address this gap, we present LegalOne, a family of foundational models specifically tailored for the Chinese legal domain. LegalOne is developed through a comprehensive three-phase pipeline designed to master legal reasoning. First, during the mid-training phase, we propose Plasticity-Adjusted Sampling (PAS) to address the challenge of domain adaptation. This perplexity-based scheduler balances the acquisition of new knowledge with the retention of original capabilities, effectively establishing a robust legal foundation. Second, during supervised fine-tuning, we employ Legal Agentic CoT Distillation (LEAD) to distill explicit reasoning from raw legal texts. Unlike naive distillation, LEAD uses an agentic workflow to convert complex judicial processes into structured reasoning trajectories, thereby enforcing factual grounding and logical rigor. Finally, we implement a Curriculum Reinforcement Learning (RL) strategy. Through a progressive reinforcement process spanning memorization, understanding, and reasoning, LegalOne evolves from simple pattern matching to autonomous and reliable legal reasoning. Experimental results demonstrate that LegalOne achieves state-of-the-art performance across a wide range of legal tasks, surpassing general-purpose LLMs with vastly larger parameter counts through enhanced knowledge density and efficiency. We publicly release the LegalOne weights and the LegalKit evaluation framework to advance the field of Legal AI, paving the way for deploying trustworthy and interpretable foundation models in high-stakes judicial applications.
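To make the intuition behind a perplexity-based sampling scheduler concrete, the sketch below shows one plausible way such a scheme could weight training documents. This is a minimal illustration, not the paper's actual PAS algorithm: the function name `pas_weights`, the perplexity band `[ppl_low, ppl_high]`, and the log-distance weighting are all assumptions introduced here for exposition. The idea it captures is that documents the model already predicts well (low perplexity) contribute little new knowledge, while documents far outside the model's competence (high perplexity) risk degrading existing capabilities, so mid-range documents are up-weighted.

```python
import math

def pas_weights(ppls, ppl_low=1.5, ppl_high=8.0, temperature=1.0):
    """Hypothetical sketch of plasticity-adjusted sampling weights.

    ppls: per-document perplexities under the current model.
    Documents outside [ppl_low, ppl_high] are nearly excluded;
    inside the band, weight peaks at the geometric midpoint and
    decays with log-distance from it, scaled by `temperature`.
    """
    mid = math.sqrt(ppl_low * ppl_high)  # geometric center of the band
    weights = []
    for ppl in ppls:
        if ppl <= ppl_low or ppl >= ppl_high:
            # Already mastered, or too disruptive to learn now.
            weights.append(1e-3)
        else:
            weights.append(math.exp(-abs(math.log(ppl / mid)) / temperature))
    return weights
```

These weights could then feed any weighted sampler (e.g. `random.choices`) to build each mid-training batch, with the band re-estimated as the model's plasticity changes over training.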