Terminology correctness is important in downstream applications of machine translation, and a prevalent way to ensure it is to inject terminology constraints into a translation system. In our submission to the WMT 2023 terminology translation task, we adopt a translate-then-refine approach that can be domain-independent and requires minimal manual effort. We first train a terminology-aware model by annotating random source words with pseudo-terminology translations obtained from word alignment. We then explore two post-processing methods. First, we use an alignment process to detect whether a terminology constraint has been violated and, if so, re-decode with the violating word negatively constrained. Alternatively, we leverage a large language model to refine a hypothesis by providing it with the terminology constraints. Results show that our terminology-aware model learns to incorporate terminologies effectively, and that the large language model refinement process can further improve terminology recall.
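The two alignment-based steps described above can be illustrated with a minimal Python sketch. It is not the submission's implementation, and it rests on several assumptions. Word alignments are taken as given in the common "i-j" (source index, target index) text format from an external aligner. Only one-to-one links are used for pseudo-terminology annotation. The `<term>`/`<trans>`/`</term>` tag format and all function names (`parse_alignment`, `annotate_pseudo_terms`, `find_violations`) are hypothetical. The re-decoding step with negative constraints is not shown. The sketch only returns the hypothesis words that would be negatively constrained.

```python
import random
from typing import Dict, List, Set, Tuple

# Word alignment as (source_index, target_index) pairs, e.g. parsed from
# "i-j" format such as "0-0 1-2 2-1" (assumed aligner output format).
Alignment = Set[Tuple[int, int]]


def parse_alignment(line: str) -> Alignment:
    """Parse an 'i-j' alignment string into a set of index pairs."""
    pairs: Alignment = set()
    for link in line.split():
        s, t = link.split("-")
        pairs.add((int(s), int(t)))
    return pairs


def annotate_pseudo_terms(src: List[str], tgt: List[str], align: Alignment,
                          prob: float, rng: random.Random) -> List[str]:
    """Attach the aligned target word to randomly chosen source words.

    Only one-to-one links are annotated. The <term>/<trans>/</term> tags
    are a hypothetical format chosen for illustration.
    """
    src_links: Dict[int, List[int]] = {}
    tgt_counts: Dict[int, int] = {}
    for s, t in align:
        src_links.setdefault(s, []).append(t)
        tgt_counts[t] = tgt_counts.get(t, 0) + 1
    out: List[str] = []
    for i, word in enumerate(src):
        links = src_links.get(i, [])
        one_to_one = len(links) == 1 and tgt_counts[links[0]] == 1
        if one_to_one and rng.random() < prob:
            out += ["<term>", word, "<trans>", tgt[links[0]], "</term>"]
        else:
            out.append(word)
    return out


def find_violations(src: List[str], hyp: List[str], align: Alignment,
                    terms: Dict[str, str]) -> List[str]:
    """Return hypothesis words aligned to a constrained source word when
    the required target term is not among them (candidates for a
    negative constraint during re-decoding)."""
    violating: List[str] = []
    for i, word in enumerate(src):
        required = terms.get(word)
        if required is None:
            continue
        aligned = [hyp[t] for s, t in sorted(align) if s == i]
        if required not in aligned:
            violating.extend(aligned)
    return violating


if __name__ == "__main__":
    src, hyp = ["the", "cat"], ["le", "matou"]
    print(find_violations(src, hyp, parse_alignment("0-0 1-1"), {"cat": "chat"}))
```

Here, with the constraint `cat -> chat`, the word `matou` is aligned to `cat`, so it is reported as the violating word. In this sketch the annotation probability `prob` controls how many source words receive pseudo-terminology. Restricting annotation to one-to-one links is a simplifying choice that avoids ambiguous multi-word spans.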