This paper explores the automatic prediction of text spans in a legal problem description that support a legal area label. We use a corpus of problem descriptions written in English by laypeople and annotated by practising lawyers. The task is inherently subjective: legal area categorisation is complex, and lawyers often hold different views on a problem, especially when issues are described in legally imprecise terms. Experiments show that training on majority-voted spans outperforms training on disaggregated ones.
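To make the contrast between majority-voted and disaggregated spans concrete, the following is a minimal sketch, not the aggregation procedure used in this work. It assumes each annotator's spans are encoded as a binary mask over the same token sequence, and that a token is kept only when strictly more than half of the annotators marked it. The function names `majority_vote_spans` and `disaggregate` are hypothetical.

```python
from typing import List


def majority_vote_spans(annotations: List[List[int]]) -> List[int]:
    """Aggregate per-annotator binary token masks by strict majority vote.

    Assumption: annotations[a][i] is 1 if annotator `a` marked token `i`
    as supporting the legal area label, and 0 otherwise.
    """
    if not annotations:
        return []
    n_tokens = len(annotations[0])
    if any(len(mask) != n_tokens for mask in annotations):
        raise ValueError("all annotator masks must cover the same tokens")
    # A token is kept only if more than half of the annotators selected it.
    threshold = len(annotations) / 2
    return [
        1 if sum(mask[i] for mask in annotations) > threshold else 0
        for i in range(n_tokens)
    ]


def disaggregate(annotations: List[List[int]]) -> List[List[int]]:
    """Keep every annotator's mask as a separate training target."""
    return [list(mask) for mask in annotations]


# Three hypothetical annotators labelling a six-token description.
masks = [
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
]
aggregated = majority_vote_spans(masks)
separate = disaggregate(masks)
```

Under this encoding, the majority-voted setting yields one target mask per description, while the disaggregated setting yields one target per annotator and so preserves their disagreement.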