Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.