Pre-trained Language Models (PLMs) have shown remarkable performance in recent years, setting a new paradigm for NLP research and industry. The legal domain has received some attention from the NLP community, partly due to its textual nature. Several tasks in this domain take the form of question answering (QA). This work explores legal-domain Multiple-Choice QA (MCQA) for a low-resource language. Our contribution is multi-fold. We first introduce JuRO, the first openly available Romanian legal MCQA dataset, comprising three different examinations and a total of 10,836 questions. Along with this dataset, we introduce CROL, an organized corpus of laws containing 93 distinct documents together with their modifications across 763 time spans, which we use in this work for Information Retrieval (IR). Moreover, we are the first to propose a Knowledge Graph (KG) for the Romanian language, Law-RoG, which is derived from the aforementioned corpus. Lastly, we propose a novel approach for MCQA, Graph Retrieval Augmented by Facts (GRAF), which achieves results competitive with generally accepted state-of-the-art (SOTA) methods and exceeds them in most settings.