For historical reasons, most legal text in the Indian judiciary is written in complex English. However, only a small fraction of the Indian population is comfortable reading English. Legal text therefore needs to be made available in various Indian languages, possibly by translating the existing English text. Although there has been a lot of research on translation to and between Indian languages, to our knowledge there has been little prior work on such translation in the legal domain. In this work, we construct the first high-quality legal parallel corpus containing aligned text units in English and nine Indian languages, including several low-resource languages. We also benchmark a wide variety of Machine Translation (MT) systems on this corpus, including commercial MT systems, open-source MT systems, and Large Language Models. Through a comprehensive survey of law practitioners, we examine how satisfied they are with the translations produced by some of these MT systems, and how well automatic MT evaluation metrics agree with their opinions.