Semantic prosody is a collocational meaning formed through the co-occurrence of a linguistic unit with a consistent series of collocates, and it should be treated separately from semantic meaning. Because words that are literal translations of each other may have different semantic prosodies, accurate translation requires attention to this linguistic property. However, current machine translation models cannot handle this problem. To bridge the gap, we propose an approach for teaching machine translation models the semantic prosody of a specific structure. We focus on Chinese BEI passives and create a dataset of English-Chinese sentence pairs designed to demonstrate their negative semantic prosody. We then fine-tune the OPUS-MT, NLLB-600M and mBART50 models on this dataset for the English-Chinese translation task. Our results show that the fine-tuned MT models are better at using BEI passives to translate unfavourable content and at avoiding them for neutral and favourable content. Moreover, in NLLB-600M, which is a multilingual model, this knowledge of semantic prosody can be transferred from English-Chinese translation to other language pairs, such as Spanish-Chinese.
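As an illustration of the behaviour described in the abstract (BEI passives used for unfavourable content and avoided elsewhere), the following is a minimal sketch of how BEI usage could be tallied per content category. It is not the evaluation procedure of this work: the function names, the regular-expression heuristic for spotting the passive marker 被, and the sample sentences are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Crude heuristic (an assumption, not the paper's method): treat "被" as a
# passive marker unless it belongs to a bedding noun such as 被子 or 棉被.
BEI_PATTERN = re.compile(r"(?<!棉)被(?![子褥单套])")


def uses_bei_passive(sentence: str) -> bool:
    """Return True if the sentence appears to contain a BEI passive."""
    return BEI_PATTERN.search(sentence) is not None


def bei_rate_by_label(examples):
    """Compute the share of translations using BEI for each label.

    examples: iterable of (label, chinese_translation) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for label, translation in examples:
        totals[label] += 1
        if uses_bei_passive(translation):
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical model outputs, invented for illustration only.
samples = [
    ("unfavourable", "他的钱包被偷了。"),
    ("unfavourable", "他被老板批评了。"),
    ("favourable", "她获得了一等奖。"),
    ("favourable", "她被授予一等奖。"),
    ("neutral", "这本书于去年出版。"),
    ("neutral", "我买了一床新被子。"),
]
rates = bei_rate_by_label(samples)
for label, rate in rates.items():
    print(f"{label}: {rate:.2f}")
```

A string-level heuristic like this will miss some cases and misfire on others; a parser-based check would be more reliable, at the cost of an extra dependency.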