Punctuation plays a critical role in resolving semantic and structural ambiguity in written language. Machine Translation (MT) systems are now widely applied across diverse domains and languages, including many low-resource settings. In this work, we focus on Marathi, a low- to middle-resource language. We introduce Virām, the first diagnostic benchmark for assessing punctuation robustness in English-to-Marathi machine translation, consisting of 54 manually curated, punctuation-ambiguous instances. We evaluate two primary strategies for enhancing reliability: a pipeline-based restore-then-translate approach and direct fine-tuning on punctuation-varied data. Our results demonstrate that specialized fine-tuned models and pipeline systems significantly improve translation quality over standard baselines on the Virām benchmark. Qualitative analysis reveals that the baseline model can produce incorrect translations that lead to mistaken interpretations, whereas fine-tuned models markedly improve overall reliability. Furthermore, we find that current Large Language Models (LLMs) lag behind these task-specific approaches in preserving meaning for punctuation-ambiguous text, underscoring the need for further research in this area. The code and dataset are available at https://github.com/KaustubhShejole/Viram_Marathi.
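The restore-then-translate pipeline mentioned above can be sketched as two composed stages. The functions below are hypothetical stand-ins, not the authors' implementation: in practice, `restore_punctuation` would be a trained punctuation-restoration model and `translate_en_to_mr` an English-to-Marathi MT model.

```python
def restore_punctuation(text: str) -> str:
    """Hypothetical restorer stub: a toy rule standing in for a trained model."""
    # Toy heuristic: trim whitespace, add a terminal period if missing,
    # and capitalize the first character.
    text = text.strip()
    if text and text[-1] not in ".?!":
        text += "."
    return text[:1].upper() + text[1:]


def translate_en_to_mr(text: str) -> str:
    """Hypothetical MT stub; a real system would return Marathi output."""
    return f"<mr>{text}</mr>"


def restore_then_translate(text: str) -> str:
    # Stage 1: repair punctuation so the translator sees well-formed input.
    restored = restore_punctuation(text)
    # Stage 2: translate the restored sentence.
    return translate_en_to_mr(restored)


print(restore_then_translate("lets eat grandma"))  # → <mr>Lets eat grandma.</mr>
```

The key design point is that punctuation repair happens before translation, so the MT model never sees the ambiguous, unpunctuated input directly.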