Citation recommendation is the task of finding appropriate citations for a given piece of text. Existing datasets for this task mainly cover a handful of scientific fields and lack some core ones, such as law. Furthermore, within the legal domain, citation recommendation has been used to identify supporting arguments, drawing on non-scholarly legal articles. To address these limitations of existing studies, we gather the first scholarly legal dataset for the task of citation recommendation. We also conduct experiments with state-of-the-art models and compare their performance on this dataset. The study suggests that, while BM25 is a strong baseline for the legal citation recommendation task, the most effective method is a two-step process: pre-fetching with BM25+, followed by re-ranking with SciNCL, which improves on the baseline from 0.26 to 0.30 MAP@10. Moreover, fine-tuning leads to considerable performance gains in pre-trained models, which shows the importance of including legal articles in the training data of these models.
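The two-step process and the MAP@10 metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the BM25+ parameter values (`k1`, `b`, `delta`), and all function names are assumptions chosen for illustration, and `embed` is a hypothetical bag-of-words stand-in for a SciNCL document embedding. The normalization used in `average_precision_at_k` is one common convention; the abstract does not say which one the study uses.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization on lowercased text.
    return text.lower().split()


def bm25_plus_scores(query, docs, k1=1.5, b=0.75, delta=1.0):
    """Score every document against the query with a BM25+ style formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    doc_freq = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        norm = k1 * (1 - b + b * len(d) / avg_len)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # only query terms that occur in the document contribute
            idf = math.log((n_docs + 1) / doc_freq[term])
            # delta lower-bounds the term-frequency component (the "+" in BM25+)
            score += idf * ((k1 + 1) * tf[term] / (norm + tf[term]) + delta)
        scores.append(score)
    return scores


def prefetch(query, docs, k):
    """Step 1: keep the indices of the k best lexical matches."""
    scores = bm25_plus_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def embed(text):
    # Hypothetical stand-in for a SciNCL embedding: a sparse term-count vector.
    return Counter(tokenize(text))


def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rerank(query, docs, candidates):
    """Step 2: reorder the prefetched candidates by embedding similarity."""
    q = embed(query)
    return sorted(candidates, key=lambda i: cosine(q, embed(docs[i])), reverse=True)


def average_precision_at_k(ranked, relevant, k=10):
    """AP@k for one query; MAP@k is the mean of this value over all queries."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0
```

In a real setup, `embed` would be replaced by a call to the neural encoder, and `prefetch` would typically return a few hundred to a few thousand candidates so that the more expensive re-ranker only sees a small slice of the corpus.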