In this paper, we introduce SaulLM-7B, a large language model (LLM) tailored for the legal domain. With 7 billion parameters, SaulLM-7B is the first LLM designed explicitly for legal text comprehension and generation. Leveraging the Mistral 7B architecture as its foundation, SaulLM-7B is trained on an English legal corpus of over 30 billion tokens. SaulLM-7B exhibits state-of-the-art proficiency in understanding and processing legal documents. Additionally, we present a novel instructional fine-tuning method that leverages legal datasets to further enhance SaulLM-7B's performance in legal tasks. SaulLM-7B is released under the MIT License.