A contract is a type of legal document commonly used in organizations, and contract review is an integral, repetitive process for avoiding business risk and liability. Contract analysis requires identifying and classifying the key provisions and paragraphs within an agreement. Identifying and validating contract clauses is a time-consuming and challenging task that demands the services of trained, and expensive, lawyers, paralegals, or other legal assistants. Classifying legal provisions in contracts with artificial intelligence and natural language processing is complex because model training requires domain-specialized legal language, and labeled data is scarce in the legal domain. General-purpose models are not effective in this context because contracts use specialized legal vocabulary that such models may not recognize. To address this problem, we propose using a pre-trained large language model that is subsequently calibrated on a legal taxonomy. We propose LegalPro-BERT, a BERT transformer architecture model that we fine-tune to efficiently handle the classification task for legal provisions. We conducted experiments to measure and compare metrics against current benchmark results, and found that LegalPro-BERT outperforms the previous benchmark used for comparison in this research.
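The fine-tuning setup described above can be sketched as follows. This is a minimal illustration assuming a HuggingFace Transformers environment; the tiny configuration, the `NUM_CLAUSE_TYPES` value, and the randomly initialized weights are placeholders for exposition, not the paper's actual model or training configuration (in practice one would start from pre-trained BERT weights and fine-tune on labeled contract clauses).

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Assumption: the number of provision categories in the legal taxonomy.
NUM_CLAUSE_TYPES = 41

# A small, randomly initialized BERT classifier for illustration only.
# The paper's approach would instead load pre-trained weights, e.g.
# BertForSequenceClassification.from_pretrained("bert-base-uncased", ...)
config = BertConfig(
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=256,
    num_labels=NUM_CLAUSE_TYPES,
)
model = BertForSequenceClassification(config)

# Stand-in for a tokenized contract clause (batch of 1, sequence length 32).
input_ids = torch.randint(0, config.vocab_size, (1, 32))

# Inference: one logit per clause category; argmax gives the predicted label.
with torch.no_grad():
    logits = model(input_ids=input_ids).logits
predicted_label = logits.argmax(dim=-1).item()

# Fine-tuning step sketch: supplying gold labels yields a classification loss
# that can be backpropagated with a standard optimizer.
labels = torch.tensor([0])
loss = model(input_ids=input_ids, labels=labels).loss
```

The classification head here is the standard linear layer that `BertForSequenceClassification` places on top of the pooled `[CLS]` representation; fine-tuning updates both the head and the underlying encoder on the labeled legal provisions.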