We present a neural machine translation system that translates between Romanian, English, and Aromanian (an endangered Eastern Romance language), the first of its kind. BLEU scores range from 17 to 32 depending on the translation direction and the genre of the text. Alongside the system, we release the largest known Aromanian-Romanian bilingual corpus, consisting of 79k cleaned sentence pairs. We also present additional tools, including a language-agnostic sentence embedder (used for both text mining and automatic evaluation) and a diacritics converter. We publicly release our findings and models. Finally, we describe the deployment of our quantized model at https://arotranslate.com.